Next: Inside LEX and YACC:
Up: CS2111: Design and Implementation
Previous: YACC
  Contents
Subsections
YACC: Further usage
- How do we know when to use LEX and when to use YACC?
- How can we translate a YACC-like grammar to a
LEX-like rule?
- How do we write grammar rules in the first place, and how do we pick the
best way of writing them?
Grammar rules
thing1 thing2 thing3 . . .
>
name things |
name : things ; |
%% |
|
...{name}... |
... name ... |
LEX (may not recurse) |
YACC (may recurse) |
<>
..
>
name : thing1 thing2 ; |
name : thing1 ; |
|
name : thing2 ; |
<>
use either form (LEX omits name :
and ;
)
LEX can also handle single-character alternatives using [ ]
use an empty alternative (or ?
with LEX)
..
>
(thing)+ |
|
name : thing name thing ; |
|
or |
name : thing thing name ; |
LEX |
YACC |
|
<>
>
(thing)* |
|
name : name thing ; |
|
or |
name : thing name ; |
LEX |
YACC |
|
<>
..
Example of equivalent LEX and YACC grammar rules
number : digit | number digit ;
digit : '0' | '1' | . . . | '9' ;
YACC
|
|
Left and right recursion
We will use two different grammars, one right recursive: exp : id | id '+' exp ;
and one left recursive: exp : id | exp '+' id ;
to recognise a + b + c
as a expression.
(Assume that we can recognise
and convert a
and b
and c
to identifiers somehow - they
are only used here to make the form of the parse trees clear.)
With both grammars, the form that expression nodes of the parse tree can
take is limited - either a single identifier or else a +
with an
identifier on one (specified) side and an expression on the other
(i.e. another node). Therefore we have one node for each identifier. Of the
three expression nodes this implies, one will be just an identifier, and the
other two will also contain the two +
s and subexpressions.
Example parse tree and equivalent fully-bracketed expression for
left-recursive and right-recursive grammars:
(((a) + b) + c)
left-recursive
|
(a + (b + (c)))
right-recursive
|
|
Therefore right recursive grammars should be used for right associative
operators, and left recursive grammars for left associative operators.
e.g. part of a Pascal declaration:
decl : idlist ':' typeid ;
idlist : id | idlist ',' id ;
Each id
recognised in idlist
should be declared to be of type
typeid
. e.g. for a , b : real
we want to declare a
and
b
to each be real
:
If we could do this using YACC, we would write something like:
decl : idlist ':' typeid {$1 = $3;}
;
idlist : id {declare ($1, $$);}
| idlist ',' id {declare ($1, $$); $3 = $$;}
;
BUT WE CAN NOT DO THIS WITH YACC!
However, it is always possible to rewrite the grammar so that YACC
can perform the calculations, although it may not always be easy or obvious.
For example, the (left-recursive) grammar used above is really something
like:
decl : id ( ',' id )* ':' typeid ;
and can be rewritten as (right-recursive):
decl : id tail {declare ($1, $2);}
;
tail : ',' id tail {$$ = $3; declare ($2, $3);}
| ':' typeid {$$ = $2;}
;
to give the following parse tree:
Precedence and associativity
The example calculators given in 4 give different precedence
to the operators +
, -
, *
, /
, (
and
)
by using a separate grammar rule for each precedence level.
We can use left recursion for left associative operators and right recursion
for right associative operators (5.2). For
non-associative operators, we just do not use recursion at all
e.g. a == b == c
is legal in C but not in Pascal (which also uses
=
instead of ==
) so we could use this grammar rule for Pascal:
condition : exp '=' exp ;
The problem with these methods is that they tend to complicate the grammar
and constrain language design. YACC has an alternative mechanism that
decouples the declarations of precedence and associativity from the grammar,
using %left
, %right
and %nonassoc
. e.g.:
%nonassoc '=' |
%left '+' '-' | increasing
%left '*' '/' |
%right not | precedence
%right '^' v (i.e. exponentiation)
%%
exp : exp '+' exp | exp '-' exp | exp '*' exp | exp '/' exp
| number | '(' exp ')' | exp '^' exp | exp '=' exp | not exp
;
YACC can also give different precedences to overloaded operators;
e.g. if - is used both for negation and for subtraction and we want to
give them different precedences (and negation the same precedence as for not):
| '-' exp %prec not
To allow for any sort of associativity, we have had to use grammar rules in
the example that contain both left and right recursion - if it were not for
the %left etc., this grammar would be hopelessly ambiguous and so unusable.
(Disambiguation uses precedence first, then associativity, as expected.) The
converse of this is, that you must be careful how you use these facilities,
as they can conceal unexpected ambiguities in the grammar.
(7 will explore the problems of ambiguous grammars.)
As it stands, parsers generated by YACC halt at the first grammar
error. This is not very informative to users, as there may be more errors to
be found, so how can we improve error handling?
- single-symbol errors
- We could explicitly modify the grammar to detect likely errors:
idlist : id
| idlist , id
| idlist id {yyerror ("missing ,");}
;
- bigger errors
- The parser must skip input symbols and exit parse rules (i.e. pop stack)
until it reaches a recognizable restart point. With YACC, we can
explicitly modify the grammar to indicate where to restart parsing:
statement : variable ':=' expression
| . . .
| error
;
suggests where error recovery takes place. YACC also has actions
yyerrorok
, yyclearin
, YYACCEPT
and YYERROR
to assist in this process.
Most modern parser-generators attempt, more or less successfully, to do both
these levels of error recovery automatically, so all we have to write is the
correct grammar. (Another reason to use a parser-generator, rather than
write parsers by hand.)
A tool like YACC can cope with most combinations of precedence,
associativity, number of operands and fix in expressions. It can even cope
with most forms of overloading.
However, YACC would have great difficulty in coping with an overloaded
operator that had different precedence, associativity, number of operands or
fix depending on its operands.
Probably because most humans would also find this hard to cope with, I know
of no such language.
Language designers can essentially ignore the difficulty of implementation
of expressions and concentrate on usability, as the average user and the
average compiler writer would probably agree about features they liked.
( indicates harder problems)
- Modify the calculator in 4.1 to accept one or more
expressions.
- Design a grammar for boolean expressions, involving operators AND, OR
and NOT, brackets ( and ), and operands TRUE, FALSE and C-like identifiers.
An example of the sort of expression allowed is:
a AND (B OR Abcd_99 AND NOT (FRED OR TRUE))
You can use any grammar notation you like initially. Split your grammar into
the portions that should be dealt with by LEX and by YACC, and write the
corresponding LEX and YACC programs. What is the parse tree you expect for
the example expression above?
- Modify the calculator in 4.6 to use
\%left
etc. You will need to merge and modify the rules for
exp
, term
and factor
.
See what happens when you alter the precedence and/or associativity of the
operators in various ways.
- Check your favourite compilers to see how they cope with simple
grammatical errors like missing or extra
{
, }
, ,
and
;
symbols. Is the error reported correctly, and is it located
accurately? Are there any subsequent spurious error messages? How
complicated a grammatical error can it cope with sensibly?
- Write LEX and YACC code to recognise grammars
written in BNF and/or EBNF, like the examples in 5.1.
(i.e. write a grammar that describes grammars!)
Louden: chs. 4.2, 4.4, 4.5
Capon & Jinks: chs. 6, 8.1, 8.4, 8.5, 8.6
Aho, Sethi & Ullman: chs. 2.2, 4.1, 4.3, 4.9, 5.1
Johnson
Levine, Mason & Brown: chs. 3, 7, 9
Next: Inside LEX and YACC:
Up: CS2111: Design and Implementation
Previous: YACC
  Contents
Pete Jinks
1999-09-30