yacc : rule | yacc rule ; rule : rulebody | rulebody ";" ; rulebody: RULENAME ':' alt | rulebody "|" alt ; alt : | alt TOKEN | alt RULENAME ;-> [EBNF]
yacc : rule+ ; rule : rulebody ";"? ; rulebody: RULENAME ':' (alt # "|")+ ; alt : (TOKEN | RULENAME)* ;-> [substitute rulebody and alt into rule]
yacc : rule+ ; rule : RULENAME ':' ((TOKEN | RULENAME)* # "|")+ ";"? ;-> [4.1: A=RULENAME ':'; B=; C=(TOKEN|RULENAME); D="|"; E=";"?]
yacc : rule+ ; rule : RULENAME ':' (TOKEN | RULENAME | "|")* ";"? ;-> [substitute rule into yacc]
yacc : (RULENAME ':' (TOKEN | RULENAME | "|")* ";"? )+ ;-> [4.2: A=; B=RULENAME ':'; C=TOKEN|RULENAME|"|"; D=";"?; E=;]
yacc : RULENAME ':' (TOKEN | RULENAME | "|" | ";"? RULENAME ':')* ";"? ;Now, we need to convert back into BNF, and there are several possibilities:
yacc : head ";"? ; head : RULENAME ':' (TOKEN | RULENAME | "|" | ";"? RULENAME ':')* ;->
yacc : head ";"? ; head : RULENAME ':' | head (TOKEN | RULENAME | "|" | ";"? RULENAME ':') ;->
yacc : head ";" | head ; head : RULENAME ':' | head TOKEN | head RULENAME | head "|" | head RULENAME ':' | head ";" RULENAME ':' ;
yacc : RULENAME ':' middle ";"? ; middle : (TOKEN | RULENAME | "|" | ";"? RULENAME ':')* ;->
yacc : RULENAME ':' middle | RULENAME ':' middle ";" ; middle : | middle TOKEN | middle RULENAME | middle "|" | middle RULENAME ':' | middle ";" RULENAME ':' ;which is also LR(1)
yacc : RULENAME ':' tail ; tail : (TOKEN | RULENAME | "|" | ";"? RULENAME ':')* ";"? ;->
yacc : RULENAME ':' tail ; tail : TOKEN tail | RULENAME tail | "|" tail | RULENAME ':' tail | ";" RULENAME ':' tail | ";" | ;which is also LR(1).
yacc : rule+ ; rule : RULENAME ':' ((TOKEN | RULENAME)* # "|")+ ";"? ;->
yacc : (rule2 ";"?)+ ; rule2 : RULENAME ':' ((TOKEN | RULENAME)* # "|")+ ;-> [4.1: A=RULENAME ':'; B=; C=(TOKEN|RULENAME); D="|"; E=;]
yacc : (rule2 ";"?)+ ; rule2 : RULENAME ':' (TOKEN | RULENAME | "|")* ;-> back to BNF
yacc : (rule2 ";"?); rule2 : RULENAME ':' | rule2 TOKEN | rule2 RULENAME | rule2 "|" ;and if we use left-recursion for yacc this is still LR(2):
yacc : rule2 ";" | rule2 | yacc rule2 ";" | yacc rule2 ; rule2 : RULENAME ':' | rule2 TOKEN | rule2 RULENAME | rule2 "|" ;but if we use right recursion it is LR(1):
yacc : rule2 ";" | rule2 | rule2 ";" yacc | rule2 yacc ; rule2 : RULENAME ':' | rule2 TOKEN | rule2 RULENAME | rule2 "|" ;
header : type_name id '(' params ')' | type_name id '(' ')' ; type_name: id | more_complex_type_descriptions ; params : param | params ',' param ; param : type_name ids ; ids : id | ids ',' id ;Looking at the conflict reported by yacc:
12: shift/reduce conflict (shift 15, reduce 7) on ',' state 12 param : type_name ids . (7) ids : ids . ',' id (9)the problem relates to what happens when we see a ',' near or within a "param", so we will concentrate just on the rules for "params", "param" and "ids".
params : (param # ',')+ ; param : type_name ids ; ids : (id # ',')+ ;-> [merge rules]
params : (type_name (id # ',')+ # ',')+ ;-> [4.5: A=; B=type_name; C=id; D=','; F=','; G=;]
params : type_name id (',' typename id | ',' id)* ;-> [BNF]
header : type_name id '(' params ')' | type_name id '(' ')' ; type_name: id | more_complex_type_descriptions ; params : type_name id | params ',' typename id | params ',' id ;
exp : exp '-' sub_exp | sub_exp ; sub_exp : '(' type_name ')' sub_exp | id | literal | '(' exp ')' ; type_name: id | complex_type ;This is the conflict report from yacc:
6: reduce/reduce conflict (reduce 4, reduce 7) on ')' state 6 sub_exp : id . (4) type_name : id . (7)which doesn't help much. However, maybe it encorages us to expand "type_name" within "sub_exp", to see more context:
exp : exp '-' sub_exp | sub_exp ; sub_exp : '(' id ')' sub_exp | '(' complex_type ')' sub_exp | id | literal | '(' exp ')' ;giving:
6: shift/reduce conflict (shift 10, reduce 5) on ')' state 6 sub_exp : '(' id . ')' sub_exp (3) sub_exp : id . (5)which is slightly more informative.
( a ) + b
( a ) b
sub_exp : '(' exp ')'
exp : id | . . .
sub_exp : '(' exp ')'
I can't see how using EBNF will help here e.g.
exp : (sub_exp # '-')<+ ; sub_exp : ('(' (id | complex_type) ')')*> ( id | literal | '(' exp ')' ) ;Instead, we will stick with the BNF, and gradually expose the ids that represent values within an exp:
exp : exp '-' sub_exp | sub_exp ; sub_exp : '(' id ')' sub_exp | '(' complex_type ')' sub_exp | id | literal | '(' exp ')' ;-> [exposing id as a special case of sub_exp]
exp : exp '-' sub_exp | sub_exp ; sub_exp : id | sub_exp2 ; sub_exp2: '(' id ')' sub_exp | '(' complex_type ')' sub_exp | literal | '(' exp ')' ;-> [substituting into exp]
exp : exp '-' sub_exp | id | sub_exp2 ; sub_exp : id | sub_exp2 ; sub_exp2: '(' id ')' sub_exp | '(' complex_type ')' sub_exp | literal | '(' exp ')' ;-> [exposing id as a special case of exp]
exp : id | exp2 ; exp2 : exp '-' sub_exp | sub_exp2 ; sub_exp : id | sub_exp2 ; sub_exp2: '(' id ')' sub_exp | '(' complex_type ')' sub_exp | literal | '(' exp ')' ;-> [substituting back into sub_exp2]
exp : id | exp2 ; exp2 : exp '-' sub_exp | sub_exp2 ; sub_exp : id | sub_exp2 ; sub_exp2: '(' id ')' sub_exp | '(' complex_type ')' sub_exp | literal | '(' id ')' | '(' exp2 ')' ;
obj_id : id | obj_id "." id ; expr : obj_id | simple_func | obj_func ; simple_func: id "(" asgn_list ")" ; asgn_list: (obj_id "=" expr ";")+ ; obj_func: obj_id "(" expr_list? ")" ; expr_list: expr ("," expr)* ;Converting fully to EBNF may help in understanding the grammar:
expr : obj_id | obj_id "(" ( expr # ",")* ")" | id "(" (obj_id "=" expr ";")+ ")" ; obj_id : (id # ".")+ ;but doesn't really help find an equivalent LR(1) grammar.
After converting to BNF:
expr : obj_id | simple_func | obj_func ; obj_id : id | obj_id "." id ; simple_func: id "(" asgn_list ")" ; asgn_list: asgn | asgn_list asgn ; asgn : obj_id "=" expr ";" ; obj_func: obj_id "(" opt_expr_list ")" ; opt_expr_list: | expr_list ; expr_list: expr | expr_list "," expr ;yacc reports the following:
1: shift/reduce conflict (shift 6, reduce 4) on '(' state 1 obj_id : id . (4) simple_func : id . '(' asgn_list ')' (6)so we need to expand "opt_id" to expose more right-context:
expr : id | obj_id "." id | simple_func | obj_func ; obj_id : id | obj_id "." id ; simple_func: id "(" asgn_list ")" ; asgn_list: asgn | asgn_list asgn ; asgn : id "=" expr ";" | obj_id "." id "=" expr ";" ; obj_func: id "(" opt_expr_list ")" | obj_id "." id "(" opt_expr_list ")" ; opt_expr_list: | expr_list ; expr_list: expr | expr_list "," expr ;which is conflict-free. Rewriting slightly to simplify further use:
expr : id | obj_id "." id | simple_func | obj_func ; obj_id : id | obj_id "." id ; simple_func: id "(" asgn_list ")" ; asgn_list: asgn | asgn_list asgn ; asgn : id asgn_tail | obj_id "." id asgn_tail ; asgn_tail: "=" expr ";" ; obj_func: id obj_func_tail | obj_id "." id obj_func_tail ; obj_func_tail: "(" ")" | "(" expr_list ")" ; expr_list: expr | expr_list "," expr ;
%token PARA PARA_END BOLD BOLD_END TEXTBLOCK %% text : | text textElement ; textElement: inlineElement | paragraph ; inlineText: | inlineText inlineElement ; inlineElement: TEXTBLOCK | BOLD inlineText BOLD_END ; paragraph: PARA inlineText optParaEnd ; optParaEnd: | PARA_END ;-> EBNF
text : ( inlineElement | PARA inlineElement* PARA_END? )* ; inlineElement: TEXTBLOCK | BOLD inlineElement* BOLD_END ;-> [ (a | b)* -> (a* # b)* ]
text : ( inlineElement* # PARA inlineElement* PARA_END? )* ; inlineElement: TEXTBLOCK | BOLD inlineElement* BOLD_END ;-> [ (a # b)* -> (a # b)+ if a can become epsilon e.g. a=c? or a=c* ]
text : ( inlineElement* # PARA inlineElement* PARA_END? )+ ; inlineElement: TEXTBLOCK | BOLD inlineElement* BOLD_END ;-> [expand PARA_END?]
text : ( inlineElement* # ( PARA inlineElement* | PARA inlineElement* PARA_END ) )+ ; inlineElement: TEXTBLOCK | BOLD inlineElement* BOLD_END ;-> [ (a* # (b a* | c) )+ -> (a* # (b|c) )+ ]
text : ( inlineElement* # ( PARA | PARA inlineElement* PARA_END ) )+ ; inlineElement: TEXTBLOCK | BOLD inlineElement* BOLD_END ;-> BNF
text : inlineText | text PARA inlineText | text PARA inlineText PARA_END inlineText ; inlineText: | inlineText inlineElement ; inlineElement: TEXTBLOCK | BOLD inlineText BOLD_END ;I actually worked out the sequence of manipulations above from the answer, which I got to by remembering what I did for optional commas.
Alternatively, from
text : ( inlineElement* # PARA inlineElement* PARA_END? )+ ; inlineElement: TEXTBLOCK | BOLD inlineElement* BOLD_END ;->BNF
text : inlineText | text PARA inlineText PARA_END inlineText | text PARA inlineText inlineText ; inlineText: | inlineText inlineElement ; inlineElement: TEXTBLOCK | BOLD inlineText BOLD_END ;-> [explicitly remove the ambiguity]
text : inlineText | text PARA inlineText PARA_END inlineText | text PARA inlineText ; inlineText: | inlineText inlineElement ; inlineElement: TEXTBLOCK | BOLD inlineText BOLD_END ;
list : id | list optcomma id | list ',' /* extra comma */ ; optcomma: ',' | /* missing comma */ ;-> EBNF
list : id ( (',' | /* missing comma */) id | ',' /* extra comma */ )* ;-> [expand]
list : id (',' id | /* missing comma */ id | ',' /* extra comma */ )* ;-> [ a ( b a | a | b )* -> a ( b | a )* or (a # b*)+ b* ]
list : ( id # comma )* nocomma ; comma : /* missing comma */ | ',' nocomma ; nocomma : | (',' /* extra comma */)+ ;-> BNF
list : sublist nocomma ; sublist : id | sublist comma id ; comma : /* missing comma */ | ',' nocomma ; nocomma : | nocomma ',' /* extra comma */ ;but this still has a conflict (between the nocomma after the sublist in list and the comma after a sublist in sublist) so try again:
list : sublist nocomma ; sublist : id | sublist comma id ; comma : /* missing comma */ | commas ; nocomma : | commas /* extra comma */ ; commas : ',' | commas ',' /* extra comma */ ;
input : /* empty */ | input decl | input exprlist ; decl : TYPE declarator ; declarator: ID | '(' declarator ')' ; expr : ID | TYPE '(' exprlist ')' ; exprlist: expr | exprlist OP expr ;The problem is that an input of
TYPE ( ID )
matches both decl
and expr
,
and there is no obvious reason to prefer one derivation over the other.
The only thing I can see to do is to explicity expose the ambiguity, by expanding rules until it is visible e.g.:
input : /* input decl has been expanded to: */ | input TYPE declarator /* input exprlist has been expanded to: */ | input expr | input exprlist OP expr ; declarator: ID | '(' declarator ')' ; expr : ID | TYPE '(' exprlist ')' ; exprlist: expr | exprlist OP expr ;However, this can't succeed as
exprlist
is left-recursive,
so each expansion of it will leave another exprlist
on the
left. As with
converting grammars for LL parsing,
we need to get rid of the left recursion (e.g. replacing it by right
recursion, and left-factoring to reduce the amount of work later):
input : /* input decl has been expanded to: */ | input TYPE declarator /* input exprlist has been expanded to: */ | input expr tail ; declarator: ID | '(' declarator ')' ; expr : ID | TYPE '(' exprlist ')' ; exprlist: expr | exprlist OP expr ; tail : | OP exprlist ;now we can expand some more:
input : /* input decl has been expanded to: */ | input TYPE ID | input TYPE '(' declarator ')' /* input exprlist has been expanded to: */ | input ID tail | input TYPE '(' exprlist ')' tail ; declarator: ID | '(' declarator ')' ; expr : ID | TYPE '(' exprlist ')' ; exprlist: expr | exprlist OP expr ; tail : | OP exprlist ;and some more:
input : /* input decl has been expanded to: */ | input TYPE ID | input TYPE '(' ID ')' | input TYPE '(' '(' declarator ')' ')' /* input exprlist has been expanded to: */ | input ID tail | input TYPE '(' expr tail ')' tail ; declarator: ID | '(' declarator ')' ; expr : ID | TYPE '(' exprlist ')' ; exprlist: expr | exprlist OP expr ; tail : | OP exprlist ;and some more:
input : /* input decl has been expanded to: */ | input TYPE ID | input TYPE '(' ID ')' | input TYPE '(' '(' declarator ')' ')' /* input exprlist has been expanded to: */ | input ID tail | input TYPE '(' ID tail ')' tail | input TYPE '(' TYPE '(' exprlist ')' tail ')' tail ; declarator: ID | '(' declarator ')' ; expr : ID | TYPE '(' exprlist ')' ; exprlist: expr | exprlist OP expr ; tail : | OP exprlist ;and some more:
input : /* input decl has been expanded to: */ | input TYPE ID | input TYPE '(' ID ')' | input TYPE '(' '(' declarator ')' ')' /* input exprlist has been expanded to: */ | input ID tail | input TYPE '(' ID ')' tail | input TYPE '(' ID OP exprlist ')' tail | input TYPE '(' TYPE '(' exprlist ')' tail ')' tail ; declarator: ID | '(' declarator ')' ; expr : ID | TYPE '(' exprlist ')' ; exprlist: expr | exprlist OP expr ; tail : | OP exprlist ;and some more, and now the ambiguity is visible:
input : /* input decl has been expanded to: */ | input TYPE ID | input TYPE '(' ID ')' /* OOPS! */ | input TYPE '(' '(' declarator ')' ')' /* input exprlist has been expanded to: */ | input ID tail | input TYPE '(' ID ')' /* OOPS! */ | input TYPE '(' ID ')' OP exprlist | input TYPE '(' ID OP exprlist ')' tail | input TYPE '(' TYPE '(' exprlist ')' tail ')' tail ; declarator: ID | '(' declarator ')' ; expr : ID | TYPE '(' exprlist ')' ; exprlist: expr | exprlist OP expr ; tail : | OP exprlist ;
The ambiguity then has to be resolved somehow (e.g. by forcing it to be
recognised as a decl
rather than an expr
).