CS2112: Exercises + Answers from $4 Yacc

* Why do we quote e.g. '=', '+' '*' etc. in the descriptions of expected inputs for Yacc, when we don't have to for Lex?

In Yacc, grammar rules can refer to each other (and to words recognised by Lex), so we need to be able to distinguish between identifiers (grammar rules) and literals (i.e. actual characters like '='). Some grammar systems use:

 <identifier>      literal
but Yacc uses:
   identifier       'literal'
because identifiers are typically more common.
On the other hand, the grammar rules in Lex do not normally contain anything but literals (and metacharacters). Occasionally, we can use identifiers in Lex rules, and then we use:
 {identifier} literal

* Although we can use single characters like '+' and '*' in Yacc, we can't use multi-character strings like 'mod'. Extend the calculator in $4.6 to include multi-character operators like "mod" and "div".

We would have to extend the Lex to include:

 mod		{return mod_op;}
 div		{return div_op;}

and change the Yacc to include:

 %token mod_op, div_op;
 . . .
 term	: factor       		{$$= $1;}
 	| . . .
 	| term mod_op factor	{$$= $1 % $3;}
 	| term div_op factor	{$$= $1 / $3;}
 . . .

* Extend the calculator in $4.6 to include some mathematical functions e.g. "sqrt ( expression )".

The Lex will have to recognise the function names, as for 'mod' and 'div' above. However, we have a choice as to how we communicate between Lex and Yacc i.e. the %token. We could give each function a separate token, or we could have tokens like 'one_op_func', 'two_op_func' etc., and have an associated value to indicate which one operand function we mean. We may need to add ',' to the list of special characters passed to Yacc by Lex.

Giving each function a separate token:

Lex:

  sqrt		{return sqrt_func;}

Yacc:

  %token sqrt_func
  . . .
  factor : . . .
 	| sqrt_func '(' exp ')'	{$$= sqrt($3);}
  . . .

Giving similar functions the same token:

Lex:

  #define sqrt_func 1
  . . .
  sqrt	{yylval.a_number= sqrt_func; return one_op_func;}

Yacc:

  #define sqrt_func 1
  . . .
  %token <a_number> one_op_func
  . . .
  factor : . . .
 	| one_op_func '(' exp ')'		{
 				switch ($1)
 				case sqrt_func: $$= sqrt($3); break;
 				. . .
 				}
 	| two_of_func '(' exp ',' exp ')'	{ . . . }
  . . .

* Design a grammar to recognise a string of the form "aa...abb...b", i.e. any number of "a"s followed by any number of "b"s. Would it be better to use Lex or Yacc to recognise it?

Lex:

  a+b+

Yacc e.g.:

  a_b	: as bs ;
  as	: 'a' | as 'a' ;
  bs	: 'b' | bs 'b' ;

I think the Lex version is much simpler, so I would use Lex.

Change your grammar to recognise strings with equal numbers of "a"s and "b"s - now which is best?

The best Lex can do is as above, and then count the 'a's and 'b's in the action to make sure they match.

Yacc e.g.:

  a_b	: 'a' a_b 'b' | 'a' 'b' ;

I would use the Yacc version.

+ What if we wanted equal numbers of "a"s, "b"s and "c"s?

Yacc can't do this. Lex could use a+b+c+ and then count the actual characters it had, but this is not very impressive. If you did CS2121, we need to use a context-sensitive grammar, which is more than either Lex or Yacc is capable of dealing with, and more than most programming languages need to describe them. (see $21).