next up previous
Next: How lexers (e.g. from Up: CS2121: The Implementation and Previous: YACC

Subsections


YACC: Further usage

YACC: Further usage


Grammar rules

sequence


thing1 thing2 thing3 . . .

invoke


name things name : things ;
%%  
...{name}... ... name ...
LEX (may not recurse) YACC (may recurse)
..

choice


name : thing1 $\vert$ thing2 ; name : thing1 ;
  name : thing2 ;

use either form (LEX omits name : and ;)
LEX can also handle single-character alternatives using [ ]

optional


use an empty alternative (or ? with LEX) ..

repetition (1 or more)


(thing)+   name : thing $\vert$ name thing ;
  or name : thing $\vert$ thing name ;
LEX YACC  

optional (0 or more)


(thing)*   name : $\vert$ name thing ;
  or name : $\vert$ thing name ;
LEX YACC  
..

[0-9]+
LEX

number : digit | number digit ;
digit  : '0' | '1' | . . . | '9' ;
YACC

Example of equivalent LEX and YACC grammar rules


Left and right recursion


We will use two different grammars, one right recursive:  exp : id | id '+' exp ;

and one left recursive: exp : id | exp '+' id ;
to recognise a + b + c as a expression.
(Assume that we can recognise and convert a and b and c to identifiers somehow - they are only used here to make the form of the parse trees clear.)

With both grammars, the form that expression nodes of the parse tree can take is limited - either a single identifier or else a + with an identifier on one (specified) side and an expression on the other (i.e. another node). Therefore we have one node for each identifier. Of the three expression nodes this implies, one will be just an identifier, and the other two will also contain the two +s and subexpressions.

Example parse tree and equivalent fully-bracketed expression for left-recursive and right-recursive grammars:

\begin{picture}(13,13)
\put(8,12){exp}
\put(7.8,11.8){\line(-1,-1){2.2}} % /
\pu...
... \put(8,4){b}
\put(0.8,3.8){\line(0,-1){2.2}} % \vert
\put(0,0){a}
\end{picture}


(((a) + b) + c)

left-recursive


\begin{picture}(13,13)
\put(3,12){exp}
\put(2.8,11.8){\line(-1,-1){2.2}} % /
\pu...
...(10,4){exp}
\put(10.8,3.8){\line(0,-1){2.2}} % \vert
\put(10,0){c}
\end{picture}


(a + (b + (c)))

right-recursive

Therefore right recursive grammars should be used for right associative operators, and left recursive grammars for left associative operators.


Getting actions in the right order

e.g. part of a Pascal declaration:

         decl   : idlist ':' typeid ;
         idlist : id | idlist ',' id ;

Each id recognised in idlist should be declared to be of type typeid. e.g. for a , b : real we want to declare a and b to each be real:


\begin{picture}(60,13)
\put(40,12){decl}
\put(39.8,11.8){\line(-1,-1){2.2}} % /
...
...(40,4){'b'}
\put(8.8,3.8){\line(0,-1){2.2}} % \vert
\put(8,0){'a'}
\end{picture}
If we could do this using YACC, we would write something like:

 decl   : idlist ':' typeid {$1 = $3;}
        ;
 idlist : id                {declare ($1, $$);}
        | idlist ',' id     {declare ($1, $$); $3 = $$;}
        ;

BUT WE CAN NOT DO THIS WITH YACC!

However, it is always possible to rewrite the grammar so that YACC can perform the calculations, although it may not always be easy or obvious. For example, the (left-recursive) grammar used above is really something like:

 decl : id ( ',' id )* ':' typeid ;

and can be rewritten as (right-recursive):

 decl   : id tail        {declare ($1, $2);}
        ;
 tail   : ',' id tail    {$$ = $3; declare ($2, $3);}
        | ':' typeid     {$$ = $2;}
        ;

to give the following parse tree:


\begin{picture}(40,13)
\put(10,12){decl \{declare(a,real)\}}
\put(9.8,11.8){\li...
...(21.8,3.8){\line(1,-1){2.2}} %
\put(17,0){':'}
\put(23,0){'real'}
\end{picture}


Precedence and associativity

The example calculators given in $\S $3 give different precedence to the operators +, -, *, /, ( and ) by using a separate grammar rule for each precedence level.

We can use left recursion for left associative operators and right recursion for right associative operators ($\S $4.2). For non-associative operators, we just do not use recursion at all e.g. a == b == c is legal in C but not in Pascal (which also uses = instead of ==) so we could use this grammar rule for Pascal:
condition : exp '=' exp ;

The problem with these methods is that they tend to complicate the grammar and constrain language design. YACC has an alternative mechanism that decouples the declarations of precedence and associativity from the grammar, using %left, %right and %nonassoc. e.g.:

  %nonassoc '='                      |
  %left '+' '-'                      | increasing
  %left '*' '/'                      |
  %right '!'                         | precedence     (i.e. not)
  %right '^'                         v                (i.e. exponentiation)
  %%
  exp   : exp '+' exp | exp '-' exp | exp '*' exp | exp '/' exp
        | number | '(' exp ')' | exp '^' exp  | exp '=' exp | '!' exp
        ;

YACC can also give different precedences to overloaded operators; e.g. if - is used both for negation and for subtraction and we want to give them different precedences (and negation the same precedence as for not):

        | '-' exp %prec '!'

To allow for any sort of associativity, we have had to use grammar rules in the example that contain both left and right recursion - if it were not for the %left etc., this grammar would be hopelessly ambiguous and so unusable. (Disambiguation uses precedence first, then associativity, as expected.) The converse of this is, that you must be careful how you use these facilities, as they can conceal unexpected ambiguities in the grammar. ($\S $6 will explore the problems of ambiguous grammars.)

Grammatical errors: handling incorrect inputs

As it stands, parsers generated by YACC halt at the first grammar error. This is not very informative to users, as there may be more errors to be found, so how can we improve error handling?

single-symbol errors
We could explicitly modify the grammar to detect likely errors:

        idlist  : id
                | idlist , id
                | idlist   id  {yyerror ("missing ,");}
                ;

bigger errors
The parser must skip input symbols and exit parse rules until it reaches a recognizable restart point. With YACC, we can explicitly modify the grammar to indicate where to restart parsing:

        statement : variable ':=' expression
                | . . .
                | error
                ;

suggests where error recovery takes place. YACC also has actions yyerrorok, yyclearin, YYACCEPT and YYERROR to assist in this process.

Most modern parser-generators attempt, more or less successfully, to do both these levels of error recovery automatically, so all we have to write is the correct grammar. (Another reason to use a parser-generator, rather than write parsers by hand.)

Epilogue

A tool like YACC can cope with most combinations of precedence, associativity, number of operands and fix in expressions. It can even cope with most forms of overloading. However, YACC would have great difficulty in coping with an overloaded operator that had different precedence, associativity, number of operands or fix depending on its operands. Probably because most humans would also find this hard to cope with, I know of no such language.

Language designers can essentially ignore the difficulty of implementation of expressions and concentrate on usability, as the average user and the average compiler writer would probably agree about features they liked.

Exercises

($\dag $ indicates harder problems)

Readings

Louden: chs. 4.2, 4.4, 4.5
Capon & Jinks: chs. 6, 8.1, 8.4, 8.5, 8.6
Aho, Sethi & Ullman: chs. 2.2, 4.1, 4.3, 4.9, 5.1
Johnson
Levine, Mason & Brown: chs. 3, 7, 9


next up previous
Next: How lexers (e.g. from Up: CS2121: The Implementation and Previous: YACC
Pete Jinks
2004-10-26