UP

Converting grammars to LL

Examples from 3rd year project outline

          non_expression = sub_exp | sub_exp '=' sub_exp

         left_expression = sub_exp | left_expression '+' sub_exp

        right_expression = sub_exp | sub_exp '^' right_expression
->
          non_expression = sub_exp ( '=' sub_exp )?

         left_expression = sub_exp ( '+' sub_exp )*

        right_expression = sub_exp ( '^' right_expression )?

Example from LR to LL Grammer Conversion

EXP : EXP + TERM
    | EXP - TERM
    | TERM
    ;
TERM : TERM * FACTOR
     | TERM / FACTOR
     | FACTOR
     ;
FACTOR : ( EXP )
       | identifier
       ;
-> [after left-factorisation and removing left-recursion]
EXP : TERM EXP1
    ;
EXP1: + TERM EXP1
    | - TERM EXP1
    |
    ;
TERM : FACTOR TERM1
     ;
TERM1: * FACTOR TERM1
     | / FACTOR TERM1
     |
     ;
FACTOR : ( EXP )
       | identifier
       ;

Example from How to confuse parsers

  stat   : target '=' exp ';'
         | target '(' explist ')' ';'
         ;
  target : id
         | target '.' id
         ;
-> [left factorisation]
  stat            : target assign_or_call ';'
                  ;
  assign_or_call  : '=' exp
                  | '(' explist ')'
                  ;
  target : id
         | target '.' id
         ;

Example from Converting BNF to JavaCC



        lvalue  : ident
                | '{' channels '}'
                | lvalue '.' ident
                | lvalue '[' expr ']'
                | lvalue '[' expr '..' expr ']'
                | lvalue '[' 'over' type ']'
                | lvalue '@' lvalue
                ;
('@' and '.' and '[' are all intended to be left associative, with '@' having the lowest precedence and '[' the highest precedence.)

becomes

        lvalue  : lvalue2 ( '@' lvalue2 )*
                ;
        lvalue2 : ( ident
                  | '{' channels '}'
                  )
                  ( '.' ident
                  | '[' ( expr  ( ']'
                                | '..' expr ']'
                                )
                        | 'over' type ']'
                        )
                  )*
                ;

Example from Converting LR to LL

Here is my reworking, although I suspect the original may be easier to automate. (I think there is a trivial misprint in the answer in the original)
  me : ue | be | ke ;
  ue : uod us ;
  uod: p | ue ;
  be : bod bs uod ;
  bod: uod | be ;
  ke : bod ( k bod )+ ;

-> [removing left recursion,
    using a: b | a c [or a: b | t; t: a c;] -> a: b c*
    and also using c* c -> c+ ]

  me : ue | be | ke ;
  ue : p us+ ;
  uod: p us* ;
  be : uod ( bs uod )+ ;
  bod: uod ( bs uod )* ;
  ke : bod ( k bod )+ ;

-> [substituting, using a* -> a+? and left-factoring ]

  me : p ( us+ | us+? bet | us+? bet? kp+ ) ;
  bet: ( bs p us+? )+ ;
  kp : k p us+? bet? ;

-> [more left-factoring ]

  me : p ( us+ | us+? t ) ;
  t  : bet | bet? kp+
  bet: ( bs p us+? )+ ;
  kp : k p us+? bet? ;

-> [continuing to left-factor, using a? b -> a b | b ]

  me : p ( us+ | us+ t | t ) ;
  t  : bet | bet kp+ | kp+
  bet: ( bs p us+? )+ ;
  kp : k p us+? bet? ;

-> [completing left-factoring, using ( | t ) -> t? ]

  me : p ( us+ t? | t ) ;
  t  : bet kp* | kp+
  bet: ( bs p us* )+ ;
  kp : k p us* bet? ;

An ambiguous grammar

E : id | "&" id | "(" E ")" | E "[" int "]" | E "=" E
My first attempt was:
-> left-factorise
E : id | "&" id | "(" E ")" | E ( "[" int "]" | "=" E )
-> remove left recursion
E : ( id | "&" id | "(" E ")" ) ( "[" int "]" | "=" E )*
-> BNF
E     : Ehead Etail
      ;
Ehead : id
      | "&" id
      | "(" E ")"
      ;
Etail :
      | "[" int "]" Etail
      | "=" E Etail
      ;
However, as a brighter person than me pointed out, this is not LL(1) as it is ambiguous! Expressions like "id=id=id" or "id=id[int]" could be parsed in more than one way, as there is nothing to say what the associativity of the "=" operator is, or the relative precedence of the "=" and "[ ]" operators (see Parsing Questions and Answers).

Starting again:

E : id | "&" id | "(" E ")" | E "[" int "]" | E "=" E
-> disambiguate - I have made some random but reasonable choices
E	: term
	| atom "=" E
	;
term	: atom
	| "(" E ")"
	;
atom	: id
	| "&" id
	| atom "[" int "]"
	;
(this happens to be LR(1), so it really is unambiguous!)
-> left-factor ("atom" can appear via term, or in an assignment, in E)
E	: atom ("=" E)?
	| "(" E ")"
	;
atom	: id
	| "&" id
	| atom "[" int "]"
	;
-> remove left-recursion
E	: atom ("=" E)?
	| "(" E ")"
	;
atom	: (id | "&" id) ("[" int "]")*
	;
-> BNF
E	: atom Etail
	| "(" E ")"
	;
Etail	:
	| "=" E
	;
atom	: id     atomtail
	| "&" id atomtail
	;
atomtail:
	| "[" int "]" atomtail
	;