Debugging Lex, Yacc, etc. (for CS2121)


Typical compile-time errors

Here are the commoner error messages that you may see, with some hints as to what to do about them. Often, correcting one problem gets rid of several error messages, so fix as many as you can using these hints and your knowledge of C. If you still have error messages that aren't in this list, email me.

From Make

make: Fatal error in reader: makefile, line 8: Unexpected end of line seen

This is because you have spaces in your makefile where you should have tabs, on the previous line to the line number reported e.g.

	CC=gcc
	
	test: two
		two
	^^^^^^^^.................. this is a tab
	
	two: two.c
	        $(CC) -o two two.c
	^^^^^^^^.................. but these are spaces - replace them by a tab
The first character on the lines containing commands must be a tab.

`file_name' is up to date

This may be because you are editing the wrong file, or have forgotten to save it after editing, or it may be because you are asking "make" to make one of your source files (e.g. fred.l or fred.y) instead of the file generated from them (e.g. fred.c or fred).

From Flex

"test.l", line 7: warning, rule cannot be matched

This means that one of the patterns "hides" another. This can be because:
* you have put a general rule, like one for identifiers:

	[A-Za-z][A-Za-z0-9]*
before the special cases, like those for keywords:
	for
	do
	od
whereas the general case should always follow the special cases (just like the patterns in an SML function definition).
* one of your rules is wrong, and matches more inputs than you intended e.g.:
	[a-zA-Z0-9]+		{/*an identifier*/}
	[0-9]+        		[/*a number*/}
instead of:
	[a-zA-Z][a-zA-Z0-9]*	{/*an identifier*/}
	[0-9]+      	        	[/*a number*/}

"test.l", line 23: EOF encountered inside an action

This means that one of your actions is not properly terminated, but the line indicated will just be the last line of the file, so it is hard to spot where the problem is. Look for:
* A "{" without a matching "}" e.g.

	[0-9]+		{ECHO;
* An unterminated string e.g.
	[0-9]+		{MY_ECHO("number);}
* An unterminated comment e.g.
	[0-9]+		{/* a number ECHO;}

From Byacc

byacc: w - the symbol fred is undefined

This means that you have used "fred" on the right-hand-side of a grammar rule somewhere, but not defined it in your byacc file. This could be because:
* "fred" should be defined as a "%token" because it is recognised by flex. i.e.

	%token fred
* "fred" should be defined by a grammar rule but you forgot it. e.g.
	fred : number | '(' exp ')' ;
* you typed "fred" by mistake when you meant something else.
* If you also get an error message like: byacc: 2 rules never reduced this probably means that you have misspelled "fred" on the left-hand-side of the grammar rule that you intended to define it.

byacc: 4 rules never reduced

This can mean:
* that you have misspelled some rule-names (see above).
* that you have omitted some rules.
* that your "%start" declaration is wrong or missing.
* that you have mistyped one or more rules, using ":" where you should have used "|". e.g.

	term	: factor
		: term '*' factor .......... the : on this line should be a |
		| term '/' factor
		;
* if you also get messages such as byacc: 1 reduce/reduce conflict. or byacc: 1 shift/reduce conflict. then your byacc can't cope with your grammar - see the lecture handout "Ambiguous and confusing grammars".

byacc: w - line 29 of "infix_y.y", the default action assigns an undefined value to $$

If you have defined a "%type" for a grammar rule e.g.:

	%type <a_number> term
but you haven't decided on the actions yet e.g.:
	term	: factor
		| term '*' factor
		| term '/' factor
		;
byacc will automatically insert default actions, as if you had written e.g.:
	term	: factor		{$$ = $1;}
		| term '*' factor	{$$ = $1;}
		| term '/' factor	{$$ = $1;}
		;
and these may cause silly errors that may be detected by byacc or by gcc (see below). You may need to:
* insert the correct actions
* correct your "%type", "%token" and "%union" declarations - remember to use the names of the fields from your "%union" declaration, not the names of the types e.g.
	%union {int a_number;}
	%type <a_number> exp term factor
	%token <a_number> number
not
	%union {int a_number;}
	%type <int> exp term factor
	%token <int> number
* or maybe even remove the "%type" and "%union" declarations (and any types in "%token" declarations) if you really don't want any actions.

From Gcc trying to compile Flex and Byacc programs

y.tab.h:2: parse error before `257'
or infix_l.l:7: parse error before `.257'
or infix_y.y:16: parse error before `.257'

This is caused by a mistake in your byacc file - you have given the same name to a token passed from flex to byacc and to a component of your "%union" e.g.

	%union {int a_number;}
	. . .
	%token a_number
	. . .
	factor: a_number | . . .
One or the other has to be renamed wherever it is used, in both the flex and byacc files. You can spot exactly which name is the problem by looking for the number given in the error message in the file "y.tab.h" e.g.
	#define a_number 257

infix_y.y:24: warning: assignment makes integer from pointer without a cast

This, and other, similar errors, may be caused by byacc inserting default actions in your program - see section 1.3 on byacc above.

infix_y.y:27: parse error before `int'

One common cause of this error is using the names of the types from your "%union" declaration in "%token" or "%type" declarations instead of the names of the fields e.g. instead of (correct):

	%union {int a_number;}
	%type <a_number> exp term factor
	%token <a_number> number
writing something like (incorrect):
	%union {int a_number;}
	%type <int> exp term factor
	%token <int> number

hundreds of weird error messages about a lex program e.g.

You must not put any spaces or tabs before the patterns in your lex file i.e. not like:

%%
  [A-Za-z]	{printf("%s\n", yytext);}
\n|.		{/*discard everything else*/}
%%

as lex assumes that anything following blanks is a piece of c code, and that patterns start at the beginning of the line. You will not get any error message from flex, but you will get lots and lots of error messages from gcc.

If you use start states, you will get similarly meaningless error messages if you leave a space between the start state and the pattern e.g. don't do:

<STATE1>  [A-Za-z]		{action1();}



Typical run-time problems

Assuming that flex, byacc and gcc have compiled your program without producing any error messages, the most common problem when you run your program is for it to print syntax error and then stop. How can you find out what went wrong?

The most important thing is to find out where in the input the error occurred, and the following sections explain how to get flex and byacc to tell you what they are doing and what they thought was going wrong when your program stopped.

Tracing information from Flex

Make use of Lex's ECHO facility; if you insert

	ECHO;
in every action all the input will be echoed to the output. This is especially useful if your program is crashing or stopping unexpectedly, as you can at least spot where you are in the input.
	%%
	[A-Za-z0-9]+	{ECHO; return name;}
	[0-9]+	    	{ECHO; return number;}

A typical problem is not having patterns to match every single possible input. Make sure the last rule in your Lex program is something like:

	\n|.       		{ECHO; printf(" unexpected\n");}
If you are using start states, then you will also have to do this for each of your exclusive start states e.g.
	%x START
	. . .
	<START>\n|.      	{ECHO; printf(" unexpected\n");}

If Lex is telling you about the characters that don't match the rest of its patterns, you stand some chance of spotting your error - it might even be an error in the input data.

If this still isn't enough, maybe your patterns are not matching the inputs you expected them to match. This problem is trickier; it is not sufficient just to ECHO all the input, although this can often give a clue. flex has a nice facility to help - add the following to your makefile:

	LFLAGS = -d
and add the following to your C code:
	extern int yy_flex_debug;
	 . . .
	yy_flex_debug= 1;
And you will get output like this:
	% infix
	--(end of buffer or a NUL)
	1+2=
	--accepting rule at line 7 ("1")
	--accepting rule at line 9 ("+")
	--accepting rule at line 7 ("2")
	--accepting rule at line 9 ("=")
	result is 3
	--accepting rule at line 8 ("
	")
	--(end of buffer or a NUL)
	--EOF (start condition 0)
where the lines containing the flex rules are:
	 line
	number
	
	 7	[0-9]+   	{yylval.a_number = atoi(yytext); return number;}
	 8	[ \t\n]      	;
	 9	[-+*/()=]	return yytext[0];
	10	.        	{ECHO; yyerror ("unexpected character");}
(If you set yy_flex_debug=0, the extra output is suppressed again.)

Alternatively, or if you are using lex rather than flex, redefine ECHO to output each token separately (and bracketed so you notice invisible characters), so you can at least see which characters make up each token:

	%{
	#undef ECHO
	#define ECHO printf ("[%s]\n", yytext)
	%}
	%%
	
	[A-Za-z0-9]+	{ECHO; return name;}
	[0-9]+	    	{ECHO; return number;}
	
	\n|.       		{ECHO; printf(" unexpected\n");}

However, even this isn't always sufficient - in this example, the input is being split up correctly, but being recognised by the wrong patterns (e.g. numbers are being recognised as names). In this case, either you have to get byacc to tell you what is going on (see the next section) or you have to do something like:

	%{
	#define MY_ECHO(symbol) printf ("[%s=%s]\n", symbol, yytext)
	%}
	%%
	
	[A-Za-z0-9]+	{MY_ECHO("name"); return name;}
	[0-9]+	    	{MY_ECHO("number"); return number;}
	
	\n|.       		{MY_ECHO("unknown"); printf(" unexpected\n");}
(I have used the name "MY_ECHO" instead of "ECHO" because I have given it a parameter - flex assumes that "ECHO" takes no parameters. When you have finished debugging, you can simply
	#define MY_ECHO(symbol) ECHO
or even as nothing at all, to get rid of the extra output.)

The error in this example is that the usual definition of a name is something like

	[A-Za-z][A-Za-z0-9]*
i.e. at least one character (the first) must be alphabetic, which is enough to distinguish names from numbers.

In fact, flex (but not lex) will give a warning if it notices that one pattern "hides" another - see section 1.2 above.

Tracing information from Byacc

We can also get some information from byacc, to tell us what it thinks has been input and what it is doing with the input. Modify your byacc file to include:

	%{
	. . .
	extern char *yytext;
	#define YYDEBUG_LEXER_TEXT yytext
	%}
	. . .
(If you are using lex instead of flex, the extern declaration for yytext should read extern char yytext[];) and initialise yydebug before you call yyparse:
	yydebug= 1;
	. . .
	yyparse();
(If you set yydebug=0, the extra output is suppressed again.)

You must also alter your call to byacc and the C compiler to cause the debugging code to be compiled. The simplest thing to do is to include the -t flag on your call to byacc. Here is an example makefile:

	CC = gcc
	YACC = byacc
	LEX = flex
	YFLAGS = -v -d -t
	
	infix: infix_y.o infix_l.o
		gcc infix_y.o infix_l.o -o infix
	
	infix_l.o: infix_y.y

A typical trace from a running program (the line "result is 3" comes from one of the actions in the byacc program - the input tells it to calculate "1+2"):

	yydebug: state 0, reading 257 (number)
	yydebug: state 0, shifting to state 1
	yydebug: state 1, reducing by rule 8 (factor : number)
	yydebug: after reduction, shifting from state 0 to state 6
	yydebug: state 6, reducing by rule 5 (term : factor)
	yydebug: after reduction, shifting from state 0 to state 5
	yydebug: state 5, reading 43 ('+')
	yydebug: state 5, reducing by rule 2 (exp : term)
	yydebug: after reduction, shifting from state 0 to state 4
	yydebug: state 4, shifting to state 9
	yydebug: state 9, reading 257 (number)
	yydebug: state 9, shifting to state 1
	yydebug: state 1, reducing by rule 8 (factor : number)
	yydebug: after reduction, shifting from state 9 to state 6
	yydebug: state 6, reducing by rule 5 (term : factor)
	yydebug: after reduction, shifting from state 9 to state 14
	yydebug: state 14, reading 61 ('=')
	yydebug: state 14, reducing by rule 3 (exp : exp '+' term)
	yydebug: after reduction, shifting from state 0 to state 4
	yydebug: state 4, shifting to state 8
	yydebug: state 8, reducing by rule 1 (line : exp '=')
	result is 3
	yydebug: after reduction, shifting from state 0 to state 3
	yydebug: state 3, reading 0 (end-of-file)

We can ignore the lines containing "shifting".
The lines containing "reading" tell you each symbol input by byacc (usually from flex):

	(number) ('+') (number) ('=') (end-of-file)
The lines containing "reducing" tell you what byacc did with them:
	(factor : number) (term : factor) (exp : term)
	(factor : number) (term : factor) (exp : exp '+' term) (line : exp '=')
Putting it all together:
	input number -> factor -> term -> exp
	input '+'
	input number -> factor -> term
	exp '+' term -> exp
	input '='
	exp '=' -> line

Alternatively, I have set up byacc on the teaching systems, which is Berkeley yacc plus Jim Roskind's modification. If you set yydebug to 5, you will get the following output, which gives you the same information as above but in a simpler form:

		   .... look ahead at number   `1'
	number <-- `1'
	factor
	term
	|		  .... look ahead at '+'   `+'
	exp
	|	'+' <-- `+'
	|	|		  .... look ahead at number   `2'
	|	|	number <-- `2'
	|	|	factor
	|	|	term
	|	|	|		  .... look ahead at '='   `='
	+-------+-------+
	|
	exp
	|	'=' <-- `='
	+-------+
	|
	line
	|		  .... look ahead at end-of-file   `'
Here is another example, this time using byacc without any lex - note the changes necessary to yylex and yytext to enable the tracing to work as before.