Next: Semantic Analysis - Dictionaries Up: ho Previous: Lexical analysis - Lex Contents

Subsections

Syntactic analysis - Yacc & Parse Trees

An infix calculator written using Lex and Yacc

($CS5031/e*/infix2/*)

We normally use Lex and Yacc together; Lex for the simple parts (e.g. numbers, names, operators) and Yacc for more complex parts (e.g. expressions, statements). For this example, we will keep the same operators as the postfix calculator, but we will give them their usual associativity (left), precedence (* and / before + and -) and we will also need brackets.

Yacc part

 %{
 #include <stdio.h>     /* C declarations used in actions */
 %}

 %union {int a_number;}         /* Yacc definitions */
 %start line
 %token <a_number> number
 %type <a_number> exp term factor

 %%

 /* descriptions of expected inputs     corresponding actions (in C) */

 line   : exp ';'               {printf ("result is %d\n", $1);}
        ;
 exp    : term                  {$$ = $1;}
        | exp '+' term          {$$ = $1 + $3;}
        | exp '-' term          {$$ = $1 - $3;}
        ;
 term   : factor                {$$ = $1;}
        | term '*' factor       {$$ = $1 * $3;}
        | term '/' factor       {$$ = $1 / $3;}
        ;
 factor : number                {$$ = $1;}
        | '(' exp ')'           {$$ = $2;}
        ;

 %%                     /* C code */

 int main (void) {return yyparse ( );}

 void yyerror (char *s) {fprintf (stderr, "%s\n", s);}

Lex part

 %{
 #include "y.tab.h"
 %}
 %%

 [0-9]+                 {yylval.a_number = atoi(yytext); return number;}
 [ \t\n]                ;
 [-+*/( );]             {return yytext[0];}
 .                      {ECHO; yyerror ("unexpected character");}

 %%
 int yywrap (void) {return 1;}

input descriptions

The format of the grammar rules for Yacc is:

name    : names and 'single character's
        | alternatives
        ;

Yacc definitions

%start line means the whole input should match line

%union lists all possible types for values associated with parts of the grammar and gives each a field-name

%type gives an individual type for the values associated with each part of the grammar,

using the field-names from the %union declaration

%token declare each grammar rule used by YACC that is recognised by LEX and give type of value

Actions, C declarations & code:

$$ resulting value for any part of the grammar

$1, $2, etc. values from sub-parts of the grammar

yyparse routine created by YACC from (expected input, action) lists.

(It actually returns a value indicating if it failed to recognise the input.)

yylex routine called by yyparse for all its input.

We are using getchar, which just reads characters from the input.

yyerror routine called by yyparse whenever it detects an error in its input.

More Lex declarations and actions

y.tab.h gives LEX the names and type declarations etc. from YACC

yylval name used for values set in LEX e.g.

yylval.a_number = atoi (yytext);

yylval.a_name = findname (yytext);

How Lex and Yacc are used together:

flex : calcl.l $\rightarrow$ calcl.c
gcc : calcl.c $\rightarrow$ calcl.o
byacc : calcy.y $\rightarrow$ calcy.c
gcc : calcy.c $\rightarrow$ calcy.o
ld : calcl.o calcy.o $\rightarrow$ calc

calc : expression $\rightarrow$ result

Precedence and associativity

The example calculator above gives different precedence to the operators +, -, *, /, ( and ) by using a separate grammar rule for each precedence level. The operator associativities are also defined by the grammar rules, as you will investigate in the examples sheet.

The problem with these methods is that they tend to complicate the grammar and constrain language design. Yacc has an alternative mechanism that decouples the declarations of precedence and associativity from the grammar, using "%left", "%right" and "%nonassoc". e.g.:

  %nonassoc '='                  v
  %left '+' '-'                  v increasing
  %left '*' '/'                  v
  %right not                     v precedence
  %right '^'                     v              (i.e. exponentiation)
%%
  exp   : exp '+' exp | exp '-' exp | exp '*' exp | exp '/' exp
        | number | '(' exp ')' | exp '^' exp  | exp '=' exp | not exp
        ;

Yacc can also give different precedences to overloaded operators; e.g. if - is used both for negation and for subtraction and we want to give them different precedences (and negation the same precedence as for not):

        | '-' exp %prec not

Yacc has other facilities, but those described above are among the most important. You can refer to the Yacc manual in the departmental library or at URL
http://www.cs.man.ac.uk/~pjj/cs2111/yacc/yacc.html
if necessary.

Exercises: Grammars for Expressions

In ANSI-C, assuming a=1 and b=2, what are the values of a, b and c after each of the following, and explain why:
```
a) a++, b++, c=a+b;
b) c= a+ ++b;
c) c= a++ +b;
d) c= a+++b;
e) c= a++ + ++b;
f) c= a+++++b;
```
In ANSI-C, what is the parse tree for, and value of, each of:
```
a) 1 + (2 + 3);
b) (1 + 2) + 3;
c) 1 + 2 + 3;
```
In ANSI-C, what is the parse tree for, and value of, each of:
```
a) 1 - (2 - 3);
b) (1 - 2) - 3;
c) 1 - 2 - 3;
```
Assuming that ^ means raising to a power (e.g. 3^4 = 3*3*3*3 = 81 = 0x51), what is the parse tree for, and hexadecimal value of, each of:
```
a) 2 ^ (3 ^ 3)
b) (2 ^ 3) ^ 3
c) 2 ^ (3 * 3)
d) 2 ^ 3 ^ 3
```

In ANSI-C, what is the parse tree for, and value of, each of:

a) (1 + 2) * (3 + 4);
b) 1 + (2 * 3) + 4;
c) ((1 + 2) * 3) + 4;
d) 1 + 2 * 3 + 4;

Assuming an input expression "1-2-3", what parse tree is produced by each of the following Yacc grammars:

a) exp : number | number '-' number ;
b) exp : number | exp '-' number ;
c) exp : number | number '-' exp ;

Assuming an input expression "1+2*3+4", what parse tree is produced by each of the following Yacc grammars, assuming number is as defined in the example in the lectures:

a) exp : number | number '+' number | number '*' number ;
b) exp : term | exp '+' term ;
   term : number | term '*' number ;
c) exp : term | exp '*' term ;
   term : number | term '+' number ;
d) exp : number | exp '+' number | exp '*' number ;
e) exp : number | exp '+' exp | exp '*' exp ;

We used this lex pattern to recognise numbers: [0-9]+
What is the equivalent BNF, if we use yacc to recognise numbers?

Next: Semantic Analysis - Dictionaries Up: ho Previous: Lexical analysis - Lex Contents

Pete Jinks 2001-02-21

`%start line`	means the whole input should match `line`
`%union`	lists all possible types for values associated with parts of the grammar and gives each a field-name
`%type`	gives an individual type for the values associated with each part of the grammar,
	using the field-names from the %union declaration
`%token`	declare each grammar rule used by YACC that is recognised by LEX and give type of value

`$$`	resulting value for any part of the grammar
`$1`, `$2`, etc.	values from sub-parts of the grammar
`yyparse`	routine created by YACC from (expected input, action) lists.
	(It actually returns a value indicating if it failed to recognise the input.)
`yylex`	routine called by yyparse for all its input.
	We are using getchar, which just reads characters from the input.
`yyerror`	routine called by yyparse whenever it detects an error in its input.

`y.tab.h`	gives LEX the names and type declarations etc. from YACC
`yylval`	name used for values set in LEX e.g.
	`yylval.a_number = atoi (yytext);`
	`yylval.a_name = findname (yytext);`