YACC repeatedly shifts input symbols until it has a complete RHS and then reduces the RHS to its LHS. At each input, it must decide whether it has a complete RHS to reduce or not, and if so which RHS to use. If it has a complete RHS, will shifting more symbols give it a different, overlapping RHS?
YACC uses the LR(1) algorithm, which restricts the grammars it accepts so that it can always decide by looking ahead at the next input symbol.
If, at some point in a grammar, YACC can not make a decision just
using 1 symbol lookahead, it reports shift/reduce (is the RHS complete?) or
reduce/reduce (which RHS is it?) conflicts.
When this happens, we can look in the y.output
file to find the
states where something is wrong, and work out what the problem is.
list : id | list optcomma id | list ',' {yyerror ("extra comma ignored");} ; optcomma: ',' | {yyerror ("missing comma inserted");} ;the
y.output
file from YACC includes the following:
4: reduce/reduce conflict (red'ns 3 and 4) on id state 4 list : list ,_ (3) optcomma : ,_ (4) .=reduce 3(
_
indicates where we are up to in the input
- BYACC uses .
)
i.e. having read a ','
to get to state 4, if we then read an
id
we do not know which of the two rules to recognise the comma
with:
We can sometimes get more information by finding the state(s) that precede(s)
state 4 by looking for shift 4
or goto 4
in y.output
(and repeating if necessary):
state 1 $accept : list_$end list : list_optcomma id list : list_, optcomma : _ (5) $end accept , shift 4 id reduce 5 optcomma goto 3
list
followed by an id
is if we can insert
an optcomma
between them, which we can onll do by recognising an empty
optcomma
:
i.e. the parse tree for an input of id , id
can be:
|
Like LEX, YACC assumes that it should use the earlier of
the two rules, which in this case means rule 3 rather than rule 4, so we get
the second derivation. If we swap the rules for list
and
optcomma
we get the first derivation, which would be best.
However, few grammars are as simple as this, and it is rarely possible to be certain that a reduce-reduce conflict is safe, so we normally try to remove them completely.
In this case, the problem may be that we are forcing the parser to decide
what to do with the ','
too early on. We can try to delay the
decision by getting rid of the optcomma
grammar rule and putting extra
alternatives into the rule for list:
list : id | list ',' id | list id {yyerror ("missing comma inserted");} | list ',' {yyerror ("extra comma ignored");} ;Unfortunately, looking at the
y.output
file from YACC for
this grammar, we have simply replaced the reduce-reduce conflict by a
shift-reduce conflict:
3: shift/reduce conflict (shift 5, reduce 4) on id state 3 list : list ,_id list : list ,_ (4) id shift 5 . reduce 4i.e. after inputting and reducing a list, and shifting a
','
if the
parser then encounters another id
, it does not know whether to
include this id
in the current list
(shift) or leave this
id
to become part of a later list
(reduce). i.e. the parse
tree for an input of id , id
can be:
|
Looking at the possible parses, this is still pretty much the same problem as before, so unfortunately the change we made to the grammar has not made any real improvement - our grammar is fundamentally ambiguous, rather than (as we hoped after the first error) just a little too complicated for YACC to use.
Like LEX, YACC assumes that it should try and recognise
the longer of the two rules i.e it should shift the id
, which in this
case means we get the first derivation, which is what we wanted. However, as
mentioned above, it is dangerous to leave a conflict in a grammar and we
normally try to remove them completely.
The only thing to do is to take a step back and describe afresh what we are trying to recognise. I will concentrate on the identifiers, as they are what we are really interested in. We want to recognise a list of identifiers, each pair separated by zero or more commas (although we expect exactly one), and the whole list perhaps preceded and trailed by some commas (although we expect none):
list : no_commas idlist no_commas ; idlist : id | idlist one_comma id ; one_comma: commas | {yyerror ("missing comma inserted");} ; no_commas: commas {yyerror ("extra comma ignored");} | ; commas : commas ',' {yyerror ("extra comma ignored");} | ',' ;
A simple example of this is a well-known problem with conditional statements; the dangling else. Most imperative languages permit conditional statements to take two slightly different forms:
if ( ... ) ... if ( ... ) ... else ...(many languages also use the keyword
then
) so the else d
in
if (a) if (b) c else dcould be associated either with
if (a)
or with if (b)
.
Most languages attempt to fix this problem by stating that the second interpretation is more natural, and so is correct, although in Algol 60 the example would be illegal: a then-clause may not consist of an if-statement. Whatever the language definition, it is an extra rule that anyone learning the language has to remember.
Similarly, the language implementor has to deal with this special case: if
we use a tool like YACC we get a shift-reduce error, and the
generated parser would perform the shift to give us the second alternative as
expected. (To avoid the error message, if the language includes the
then
keyword, we can make then
and else
right
associative or give then
a lower precedence than else
.)
It is possible to make a simple change to the grammar which completely
removes this ambiguity; to have a terminating keyword such as end_if
or
fi
(c.f. 9, 11).
%left
s and see what error messages you
get from YACC. Do you understand how they arise? (Use the
-v
flag with YACC to obtain the file y.output
.) s = 'a' 'b' | 'a' s 'b' ;
a
s followed by an equal number of
b
s. This grammar has been extended to:
s = 'a' 'b' | 'a' s 'b' | 'a' s {error} | s 'b' {error} ;
$CS2111/e*/c_grammar/*
Run it through YACC to discover the
conflicts, and try to work out why they are there and how you could improve
the grammar to remove them.