Next: YACC: Further usage
Up: CS2111: Design and Implementation
Previous: LEX
Contents
Subsections
YACC
YACC program for an infix calculator
We will keep the same operators as the postfix calculator, but we will give
them their usual associativity (left), precedence (* and / before + and )
and we will also need brackets.
($CS2111/e*/infix1/*
)
%{ /* C declarations used in actions */
#include <stdio.h>
%}
/* yacc definitions */
%union {int a_number;}
%start line
%type <a_number> exp term factor number digit
%%
/*descriptions of expected inputs corresponding actions (in C)*/
line : exp ';' '\n' {printf ("result is %d\n", $1);}
;
exp : term {$$ = $1;}
 exp '+' term {$$ = $1 + $3;}
 exp '' term {$$ = $1  $3;}
;
term : factor {$$ = $1;}
 term '*' factor {$$ = $1 * $3;}
 term '/' factor {$$ = $1 / $3;}
;
factor : number {$$ = $1;}
 '(' exp ')' {$$ = $2;}
;
number : digit {$$ = $1;}
 number digit {$$ = $1*10 + $2;}
;
digit : '0' {$$ = 0;}
 '1' {$$ = 1;}
 '2' {$$ = 2;}
 '3' {$$ = 3;}
 '4' {$$ = 4;}
 '5' {$$ = 5;}
 '6' {$$ = 6;}
 '7' {$$ = 7;}
 '8' {$$ = 8;}
 '9' {$$ = 9;}
;
%% /* C code */
int main (void) {return yyparse ( );}
int yylex (void) {return getchar ( );}
void yyerror (char *s) {fprintf (stderr, "%s\n", s);}
name : names and 'single character's
 alternatives
;
%start line 
means the whole input should match line 
%union 
lists all possible types for values associated
with parts of the grammar and gives each a fieldname 
%type 
gives an individual type for the values associated
with each part of the grammar, 

using the fieldnames from the %union declaration 
<>
$$ 
resulting value for any part of the grammar 
$1 , $2 , etc. 
values from subparts of the grammar 
yyparse 
routine created by YACC from
(expected input, action) lists. 

(It actually returns
a value indicating if it failed to recognise the input.) 
yylex 
routine called by yyparse for all its input. 

We are using getchar, which just reads characters from the input. 
yyerror 
routine called by yyparse whenever it
detects an error in its input. 
<>
BYACC : calc.y calc.c
GCC : calc.c calc
calc : expression result
YACC has other facilities, some of which we will use elsewhere,
but those described above are among the most important. Further details and
examples can be found in the readings (4.10).
Using LEX and YACC together
Unfortunately, YACC cannot represent numbers as [09]+
nor
easily obtain the corresponding value, nor can it easily be used to ignore
white space and comments. Therefore, we need to use both LEX and
YACC together; LEX for the simple parts (e.g. numbers,
white space, comments) and YACC for more complex parts
(e.g. expressions).
YACC code for postfix calculator using LEX and LYACC
($CS2111/e*/infix2/*
):
%{
#include <stdio.h>
%}
%union {int a_number;}
%start line
%token <a_number> number
%type <a_number> exp term factor
%%
line : exp ';' {printf ("result is %d\n", $1);}
;
exp : term {$$ = $1;}
 exp '+' term {$$ = $1 + $3;}
 exp '' term {$$ = $1  $3;}
;
term : factor {$$ = $1;}
 term '*' factor {$$ = $1 * $3;}
 term '/' factor {$$ = $1 / $3;}
;
factor : number {$$ = $1;}
 '(' exp ')' {$$ = $2;}
;
%%
int main (void) {return yyparse ( );}
void yyerror (char *s) {fprintf (stderr, "%s\n", s);}
LEX code for postfix calculator using LEX and YACC
($CS2111/e*/infix2/*
):
%{
#include "y.tab.h"
%}
%%
[09]+ {yylval.a_number = atoi(yytext); return number;}
[ \t\n] ;
[+*/( ); {return yytext[0];}
. {ECHO; yyerror ("unexpected character");}
%%
int yywrap (void) {return 1;}
%token 
declare each grammar rule used by YACC that is
recognised by LEX and give type of value 
<>
y.tab.h 
gives LEX the names and type declarations etc. from YACC 
yylval 
name used for values set in LEX e.g. 

yylval.a_number = atoi (yytext); 

yylval.a_name = findname (yytext); 
<>
FLEX : calcl.l calcl.c
GCC : calcl.c calcl.o
BYACC : calcy.y calcy.c
GCC : calcy.c calcy.o
LD : calcl.o calcy.o calc
calc : expression result
Using a tool like YACC, infix, postfix and prefix expressions are
equally simple to implement  it automatically checks that we have the
correct number and layout of operands. We will see in the next section that
YACC can also cope with precedence and associativity.
However, we must be careful if we are evaluating expressions that contain
lazy operators (2.6) as the example calculators in
3 and 4 above are eager. To deal with lazy
operators, we have to wait until the expression has been completely
recognised before we start to evaluate it.
Typically, we generate parse trees (8) or code
(10) that can be used later to evaluate the expressions.
( indicates harder problems)
 Why do we quote e.g.
'=' '+' '*'
etc. in the descriptions of
expected inputs for YACC, when we don't have to for LEX?
 Although we can use single characters like
'+'
and '*'
in
YACC, we can not use multicharacter strings like 'mod'
.
Extend the calculator in 4.6 to include
multicharacter operators like mod
and div
.
5
 Extend the calculator in 4.6 to include some
mathematical functions e.g.
sqrt ( expression )
.
 Design a grammar to recognise a string of the form
AA...ABB...B
,
i.e. any number of A
s followed by any number of B
s. Would it
be better to use LEX or YACC to recognise it?
Change your grammar to recognise strings with equal numbers of A
s and
B
s  now which is best?
 What if we wanted equal numbers of
A
s, B
s and
C
s?
Readings
Louden: chs. 4.2, 4.6
Johnson
Levine, Mason & Brown: chs. 1, 3, 7
Capon & Jinks: chs. 8.1, 8.4, 8.5
Aho, Sethi & Ullman: chs. 4.9, 2.12.5
Next: YACC: Further usage
Up: CS2111: Design and Implementation
Previous: LEX
Contents
Pete Jinks
19990930