Examples of converting BNF to JavaCC form

Convert the following yacc-like grammar to be suitable for JavaCC:

	lvalue	: ident
		| '{' channels '}'
		| lvalue '.' ident
		| lvalue '[' expr ']'
		| lvalue '[' expr '..' expr ']'
		| lvalue '[' 'over' type ']'
		| lvalue '@' lvalue
		;

('@', '.' and '[' are all intended to be left-associative, with '@' having the lowest precedence and '[' the highest precedence.)

First, we need to decide how to deal with input such as 'a.a1@b[b1]@c.c1', by building the precedence and associativity rules into the grammar itself. (With yacc, we could avoid changing the grammar by using the %left, %right and %nonassoc declarations to get the desired result, but JavaCC has no similar mechanism.)

Precedence and associativity are certainly needed for ordinary expressions such as 'a+b*c+d', as they tell us which of the following readings is intended:

	a+(b*(c+d))
	(a+b)*(c+d)
	((a+b)*c)+d
	a+(b*c)+d

(Usually, we would expect the intended reading to be the last of the four possibilities above, because '*' binds more tightly than '+'.)
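For comparison, here is a minimal hand-written recursive-descent sketch of how that last reading is usually obtained without any precedence declarations: one grammar level per precedence level, with iteration for left-associativity. It assumes single-letter operands and no whitespace; ExprDemo and its methods are hypothetical names, and the output is the input fully parenthesised.

```java
// Two precedence levels encoded in the grammar itself:
//   expr : term ( '+' term )*      lower precedence, left-associative
//   term : factor ( '*' factor )*  higher precedence, left-associative
// Assumptions: operands are single lowercase letters, no whitespace.
final class ExprDemo {
    private final String src;
    private int pos = 0;

    private ExprDemo(String src) { this.src = src; }

    // Returns the input fully parenthesised according to the grammar above.
    static String parse(String input) {
        ExprDemo p = new ExprDemo(input);
        String result = p.expr();
        if (p.pos != input.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return result;
    }

    private String expr() {
        String left = term();
        while (peek() == '+') {   // combine each new term with everything parsed so far
            pos++;
            left = "(" + left + "+" + term() + ")";
        }
        return left;
    }

    private String term() {
        String left = factor();
        while (peek() == '*') {
            pos++;
            left = "(" + left + "*" + factor() + ")";
        }
        return left;
    }

    private String factor() {
        char c = peek();
        if (c < 'a' || c > 'z') throw new IllegalArgumentException("expected a letter at " + pos);
        pos++;
        return String.valueOf(c);
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }
}
```

With this sketch, ExprDemo.parse("a+b*c+d") returns "((a+(b*c))+d)": because term sits one level below expr, b*c is grouped before either addition, and the while loops make both operators left-associative.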

However, when we look at the grammar for lvalue above, the rules for '.' and '[ ]' only have left recursion (also known as head recursion):

	lvalue	: . . .
		| lvalue '.' . . .
		| lvalue '[' . . . ']'

and so lvalues that only include '.' and '[ ]' must be read strictly left-to-right.
(I am assuming that an ident cannot contain an lvalue - if it could, we would have to investigate this problem more carefully. We don't need to worry about an expr or type containing an lvalue, as this can only happen inside '[ ]' brackets, and the brackets will override any precedence or associativity rules, just as '( )' brackets do in an ordinary expression.)

Thus, something like 'a.b[c].d' can already only be recognised one way, as:

	((a.b)[c]).d

Therefore, '[ ]' and '.' are already left-associative, and separating them into two different precedence levels is completely unnecessary, and would actually make things more complicated than they really are!

However, we do need to give '@' a lower precedence than '.' and '[ ]', and to make it left-associative, because in the grammar above its rule is both left- and right-recursive:

	lvalue	: . . .
		| lvalue '@' lvalue
		;

and is therefore just as capable of giving rise to ambiguity as '*' and '+' are in ordinary expressions. In other words, we need two precedence levels, the lower one for '@' and the higher one for both '.' and '[ ]', so we can rewrite the grammar as:

	lvalue	: lvalue2
		| lvalue '@' lvalue2
		;
	lvalue2	: ident
		| '{' channels '}'
		| lvalue2 '.' ident
		| lvalue2 '[' expr ']'
		| lvalue2 '[' expr '..' expr ']'
		| lvalue2 '[' 'over' type ']'
		;

(As '@' is intended to be left-associative, I have rewritten its grammar rule to use left recursion only: the right operand of '@' must now be an lvalue2, which cannot itself contain a top-level '@'.)
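To put a number on the ambiguity that this removes, the following sketch counts parse trees for a chain of n operands joined by '@', treating each operand as an indivisible lvalue2. It therefore counts only the different groupings of '@' itself and ignores the separate precedence ambiguity between '@' and the '.'/'[ ]' suffixes in the original grammar. AtAmbiguity and its methods are hypothetical names.

```java
// Counts parse trees for n operands joined by '@' (e.g. n = 3 for "x@y@z"),
// each operand taken as a single lvalue2. Only '@' grouping is counted.
final class AtAmbiguity {
    // Original rule  lvalue : lvalue '@' lvalue :
    // the top-level '@' can be any of the n-1 operators and each side can be
    // split again in any way, which gives the Catalan numbers.
    static long ambiguousTrees(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be at least 1");
        long[] trees = new long[n + 1];
        trees[1] = 1;
        for (int m = 2; m <= n; m++) {
            for (int k = 1; k < m; k++) {      // k operands left of the top '@'
                trees[m] += trees[k] * trees[m - k];
            }
        }
        return trees[n];
    }

    // Rewritten rule  lvalue : lvalue '@' lvalue2 :
    // the right operand must be a single lvalue2, so the only possible split
    // is at the last '@'.
    static long stratifiedTrees(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be at least 1");
        long[] trees = new long[n + 1];
        trees[1] = 1;
        for (int m = 2; m <= n; m++) {
            trees[m] = trees[m - 1];           // forced split: k = m - 1
        }
        return trees[n];
    }
}
```

For the three-operand example 'a.a1@b[b1]@c.c1', ambiguousTrees(3) is 2, and the count grows quickly (5 for four operands, 14 for five), whereas stratifiedTrees is 1 for every n.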

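This would be sufficient for yacc, which builds an LALR(1) parser (a bottom-up method closely related to LR(1)), but we need to process the grammar further to make it completely suitable for JavaCC, which uses a variant of LL(k) and so parses top-down. A top-down parser cannot handle left recursion: the method for a rule like lvalue : lvalue '@' lvalue2 would call itself again before consuming any input, so JavaCC reports left recursion as an error.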

Replacing left recursion with iteration:

	lvalue	: lvalue2 ( '@' lvalue2 )*
		;
	lvalue2	: ( ident
		  | '{' channels '}'
		  )
		  ( '.' ident
		  | '[' expr ']'
		  | '[' expr '..' expr ']'
		  | '[' 'over' type ']'
		  )*
		;
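One point worth checking is that iteration does not lose the left-associativity built in above: the loop ( '@' lvalue2 )* only recognises the operands, so whatever code builds the result must combine them from the left. A minimal sketch, assuming the operands have already been parsed into strings; LeftFold, foldLeft and foldRight are hypothetical names.

```java
import java.util.List;

// lvalue : lvalue2 ( '@' lvalue2 )*  recognises the same strings as the
// left-recursive  lvalue : lvalue2 | lvalue '@' lvalue2 ; to keep the same
// structure, each new operand is combined with everything collected so far.
final class LeftFold {
    static String foldLeft(List<String> operands, String op) {
        if (operands.isEmpty()) throw new IllegalArgumentException("need at least one operand");
        String tree = operands.get(0);
        for (int i = 1; i < operands.size(); i++) {
            tree = "(" + tree + op + operands.get(i) + ")";
        }
        return tree;
    }

    // Contrast only: combining from the right gives the right-associative
    // grouping, which is NOT what the left-recursive rule meant.
    static String foldRight(List<String> operands, String op) {
        if (operands.isEmpty()) throw new IllegalArgumentException("need at least one operand");
        String tree = operands.get(operands.size() - 1);
        for (int i = operands.size() - 2; i >= 0; i--) {
            tree = "(" + operands.get(i) + op + tree + ")";
        }
        return tree;
    }
}
```

For the operands a, b and c, foldLeft gives "((a@b)@c)" and foldRight gives "(a@(b@c))". In a JavaCC production the same left fold can be obtained with a Java action inside the ( ... )* loop that updates a local variable.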

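Left-factoring (three of the alternatives inside the loop start with '[', and two of those continue with expr, so a parser looking only one token ahead, as JavaCC does by default, could not choose between them):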

	lvalue	: lvalue2 ( '@' lvalue2 )*
		;
	lvalue2	: ( ident
		  | '{' channels '}'
		  )
		  ( '.' ident
		  | '['	( expr  ( ']'
				| '..' expr ']'
				)
		  	| 'over' type ']'
			)
		  )*
		;
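To check that the left-factored grammar really can be parsed with one token of lookahead, here is a hand-written recursive-descent sketch that follows it rule for rule, in the one-method-per-nonterminal style that JavaCC generates (it is not JavaCC output). To keep it short it assumes that expr and type are single identifiers and that channels is a comma-separated list of identifiers; none of this comes from the original grammar, and LvalueParser and its methods are hypothetical names. It returns the input with one pair of parentheses per '@', '.' or '[ ]'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Recursive-descent sketch of the left-factored grammar:
//   lvalue  : lvalue2 ( '@' lvalue2 )*
//   lvalue2 : ( ident | '{' channels '}' )
//             ( '.' ident | '[' ( expr ( ']' | '..' expr ']' ) | 'over' type ']' ) )*
// ASSUMPTIONS: expr and type are single identifiers; channels is ident (',' ident)*.
final class LvalueParser {
    // '..' is listed before the single-character tokens so it wins over '.'.
    private static final Pattern TOKEN =
            Pattern.compile("\\s*(\\.\\.|[@.\\[\\]{},]|[A-Za-z_]\\w*)");
    private final List<String> toks;
    private int pos = 0;

    private LvalueParser(List<String> toks) { this.toks = toks; }

    static String parse(String input) {
        LvalueParser p = new LvalueParser(tokenize(input.trim()));
        String result = p.lvalue();
        if (p.pos != p.toks.size()) throw new IllegalArgumentException("unexpected " + p.peek());
        return result;
    }

    // lvalue : lvalue2 ( '@' lvalue2 )*   -- folded to the left
    private String lvalue() {
        String left = lvalue2();
        while (peek().equals("@")) {
            pos++;
            left = "(" + left + "@" + lvalue2() + ")";
        }
        return left;
    }

    private String lvalue2() {
        String left;
        if (peek().equals("{")) {                  // '{' channels '}'
            pos++;
            StringBuilder channels = new StringBuilder(ident());
            while (peek().equals(",")) { pos++; channels.append(',').append(ident()); }
            expect("}");
            left = "{" + channels + "}";
        } else {
            left = ident();
        }
        while (true) {                             // ( ... )*
            if (peek().equals(".")) {              // '.' ident
                pos++;
                left = "(" + left + "." + ident() + ")";
            } else if (peek().equals("[")) {       // one token after '[' decides
                pos++;
                if (peek().equals("over")) {       // 'over' type ']'
                    pos++;
                    String type = ident();
                    expect("]");
                    left = "(" + left + "[over " + type + "])";
                } else {                           // expr ( ']' | '..' expr ']' )
                    String lo = ident();
                    if (peek().equals("]")) {
                        pos++;
                        left = "(" + left + "[" + lo + "])";
                    } else {
                        expect("..");
                        String hi = ident();
                        expect("]");
                        left = "(" + left + "[" + lo + ".." + hi + "])";
                    }
                }
            } else {
                return left;
            }
        }
    }

    // 'over' is treated as a keyword, so it is not accepted as an identifier.
    private String ident() {
        String t = peek();
        if (!t.matches("[A-Za-z_]\\w*") || t.equals("over")) throw new IllegalArgumentException("expected identifier, found " + t);
        pos++;
        return t;
    }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalArgumentException("expected " + t + ", found " + peek());
        pos++;
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : "<EOF>"; }

    private static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        for (int i = 0; i < s.length(); i = m.end()) {
            if (!m.region(i, s.length()).lookingAt()) throw new IllegalArgumentException("bad input at " + i);
            out.add(m.group(1));
        }
        return out;
    }
}
```

For example, LvalueParser.parse("a.a1@b[b1]@c.c1") returns "(((a.a1)@(b[b1]))@(c.c1))", and parse("a.b[c].d") returns "(((a.b)[c]).d)". Every branch in lvalue2 is chosen by looking at peek() alone, which is exactly what the left-factoring made possible.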

There is no need to right-factor out the ']', nor to replace

	']' | '..' expr ']'

with

	( '..' expr )? ']'

although we might want to do so to make the grammar clearer or to simplify a later step, such as building a parse tree.
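As an illustration of that last point, with the form expr ( '..' expr )? ']' a single parse-tree node type can represent both a[i] and a[i..j], the optional part simply becoming a field that may be null. The sketch assumes the tokens have already been split into a list starting just after the '[', and that each expr is a single token; IndexNode and parseAfterBracket are hypothetical names.

```java
import java.util.List;

// Hypothetical parse-tree node for an index suffix built from
//   expr ( '..' expr )? ']'
// The optional ( '..' expr ) part becomes a field that is null when absent.
final class IndexNode {
    final String base;
    final String lower;
    final String upper; // null for a plain index such as a[i]

    IndexNode(String base, String lower, String upper) {
        this.base = base;
        this.lower = lower;
        this.upper = upper;
    }

    boolean isRange() { return upper != null; }

    @Override
    public String toString() {
        return base + "[" + lower + (isRange() ? ".." + upper : "") + "]";
    }

    // Parses  expr ( '..' expr )? ']'  from tokens that start just after '['.
    // Assumption: each expr is a single, already-split token.
    static IndexNode parseAfterBracket(String base, List<String> toks) {
        int i = 0;
        String lower = tokenAt(toks, i++);
        String upper = null;
        if (tokenAt(toks, i).equals("..")) {   // the optional ( '..' expr )
            upper = tokenAt(toks, i + 1);
            i += 2;
        }
        if (!tokenAt(toks, i).equals("]")) throw new IllegalArgumentException("expected ]");
        return new IndexNode(base, lower, upper);
    }

    private static String tokenAt(List<String> toks, int i) {
        return i < toks.size() ? toks.get(i) : "<EOF>";
    }
}
```

With the unfactored form ']' | '..' expr ']', the same code would need two separate branches that each end by checking for ']'; the optional form keeps that check in one place.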