The grammar that beats an LR(1) parser from H.2 causes YACC to report:
5: reduce/reduce conflict (reduce 3, reduce 4) on c state 5 B : x y . (3) E : x y . (4)
As we expected, when the parser sees an input of x y c
it doesn't
have enough information to be able to decide between reducing the x y
to B
or to E
.
The grammar from H.1 causes YACC to report 42
shift/reduce conflicts! (Roughly speaking, this is because each of the 6
operators causes problems with each of the 7 grammar rules that includes
exp
.)
Here is one of the groups of conflicts:
6: shift/reduce conflict (shift 7, reduce 9) on '+' 6: shift/reduce conflict (shift 8, reduce 9) on '-' 6: shift/reduce conflict (shift 9, reduce 9) on '*' 6: shift/reduce conflict (shift 10, reduce 9) on '/' 6: shift/reduce conflict (shift 11, reduce 9) on '^' 6: shift/reduce conflict (shift 12, reduce 9) on '=' state 6 exp : exp . '+' exp (1) exp : exp . '-' exp (2) exp : exp . '*' exp (3) exp : exp . '/' exp (4) exp : exp . '^' exp (7) exp : exp . '=' exp (8) exp : '!' exp . (9)
What this is trying to tell us is that
when we have an input of the form exp op exp op exp
(or ! exp op exp
), when we see
the second op
erator, we don't know if the first op
should be reduced (because it has higher precedence
i.e. (exp op exp) op exp
)
or if the second op
should be shifted
(because it has the higher precedence
i.e. exp op (exp op exp)
).
Suppose we tried to modify the left recursive grammar from 5 to detect various errors:
$CS2121/e*/idlist/idlist1_y.y
list : id | list optcomma id | list ',' {yyerror ("extra comma ignored");} ; optcomma: ',' | {yyerror ("missing comma inserted");} ;
the y.output
file from YACC includes the following:
4: reduce/reduce conflict (red'ns 3 and 4) on id state 4 list : list ,_ (3) optcomma : ,_ (4) .=reduce 3
(_
indicates where we are up to in the input
- BYACC uses .
)
i.e. having read a ','
to get to state 4, if we then read an
id
we do not know which of the two rules to recognise the comma
with:
We can sometimes get more information by finding the state(s) that precede(s)
state 4 by looking for shift 4
or goto 4
in y.output
(and repeating if necessary):
state 1 $accept : list_$end list : list_optcomma id list : list_, optcomma : _ (5) $end accept , shift 4 id reduce 5 optcomma goto 3
Finally, if we reexamine the grammar, we can see that the only way the second
parse can recognise a list
followed by an id
is if we can insert
an optcomma
between them, which we can onll do by recognising an empty
optcomma
:
so to get to state 4 we have read a list
followed by a ','
, and
we can not tell if this is an optcomma
or not just by looking at the
next symbol (id
) because the id
can either belong to this
list
(i.e. reduce rule 4) or it can belong to a surrounding
list optcomma id
with an empty optcomma (i.e. reduce rule 3 then
rule 5) - in either case, we next shift the id
and reduce rule 2.
i.e. the parse tree for an input of id , id
can be:
|
The problem may be that we are forcing the parser to decide what to do with
the ','
too early on. We can try to delay the decision by getting rid
of the optcomma
grammar rule and putting extra alternatives into the
rule for list:
$CS2121/e*/idlist/idlist2_y.y
list : id | list ',' id | list id {yyerror ("missing comma inserted");} | list ',' {yyerror ("extra comma ignored");} ;
Unfortunately, looking at the y.output
file from YACC for
this grammar, we have simply replaced the reduce-reduce conflict by a
shift-reduce conflict:
3: shift/reduce conflict (shift 5, reduce 4) on id state 3 list : list ,_id list : list ,_ (4) id shift 5 . reduce 4i.e. after inputting and reducing a list, and shifting a
','
if the
parser then encounters another id
, it does not know whether to
include this id
in the current list
(shift) or leave this
id
to become part of a later list
(reduce). i.e. the parse
tree for an input of id , id
can be:
|
Looking at the possible parses, this is still pretty much the same problem as before, so unfortunately the change we made to the grammar has not made any real improvement.
Although we can't tell just from the error messages reporting the conflicts,
because we can create two alternative parse trees that both completely match a
possible legal input (i.e. id , id
), this grammar and the previous
version are both ambiguous, and can never be recognised correctly by a parser
tool.
The only thing to do is to take a step back and describe afresh what we are trying to recognise. I will concentrate on the identifiers, as they are what we are really interested in. We want to recognise a list of identifiers, each pair separated by zero or more commas (although we expect exactly one), and the whole list perhaps preceded and trailed by some commas (although we expect none):
$CS2121/e*/idlist/idlist3_y.y
list : no_commas idlist no_commas ; idlist : id | idlist one_comma id ; one_comma: commas | {yyerror ("missing comma inserted");} ; no_commas: commas {yyerror ("extra comma ignored");} | ; commas : commas ',' {yyerror ("extra comma ignored");} | ',' ;