.NH S 2 Lex

Lex program for a postfix calculator


($CS5031/e*/postfix/*)

 %{			C declarations used in actions
 #include <stdio.h>
 #include <stdlib.h>
 #define stack_size 100
 static int sp, stack [stack_size];

 static void push (int i)
 {
   if (++sp<stack_size) stack[sp]= i;
   else {printf ("error: stack overflow\n"); exit(1);}
 }

 static int pop (void)
 {
   if (sp>=0) return stack[sp--];
   else {printf ("error: stack underflow\n"); exit(1);}
 }
 %}
 		Lex definitions - not used in this example
 %%

 descriptions of 	corresponding actions
 expected inputs	(C statements or blocks)

 [0-9]+  		{push(atoi(yytext));}
 "+"    		{push(pop()+pop());}
 "*"    		{push(pop()*pop());}
 -       		{int rhs=pop(); push(pop()-rhs);}
 "/"    		{int rhs=pop(); push(pop()/rhs);}
 ;      		{printf("%d\n", pop());}
 [ \t\n]		;
 [^-0-9+*/; \t\n]+	{ECHO; printf(" unexpected\n");}

 %%			C code

 int main (void)
 {
   sp=-1; yylex(); return 0;
 }

 int yywrap (void) {return 1;}
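
To see what the rules do without running Lex, the same stack discipline can be hand-coded in plain C. This is only a sketch: eval_postfix is a hypothetical helper, not part of the program above; it returns the final value at the end of the string instead of printing on ';', and it silently skips unexpected characters.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

#define STACK_SIZE 100

static int sp = -1, stack[STACK_SIZE];

static void push(int i)
{
    if (sp + 1 < STACK_SIZE) stack[++sp] = i;
    else { fprintf(stderr, "error: stack overflow\n"); exit(1); }
}

static int pop(void)
{
    if (sp >= 0) return stack[sp--];
    fprintf(stderr, "error: stack underflow\n");
    exit(1);
}

/* Evaluate one postfix expression such as "3 4 + 2 *" and return its value.
   Hypothetical helper: it does by hand what the Lex rules do token by token. */
int eval_postfix(const char *s)
{
    sp = -1;                                   /* start with an empty stack */
    while (*s != '\0') {
        if (isdigit((unsigned char)*s)) {      /* corresponds to [0-9]+ */
            char *end;
            push((int)strtol(s, &end, 10));
            s = end;
            continue;
        }
        switch (*s) {
        case '+': push(pop() + pop()); break;
        case '*': push(pop() * pop()); break;
        case '-': { int rhs = pop(); push(pop() - rhs); break; }
        case '/': { int rhs = pop(); push(pop() / rhs); break; }
        default:  break;                       /* white space etc. is skipped */
        }
        s++;
    }
    return pop();
}
```

For - and / the right-hand operand is popped first, so eval_postfix("10 2 -") computes 10-2 and not 2-10.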

descriptions of expected inputs

any string of characters (special characters should be quoted by "" or \) means recognise that string in the input
while or "while" means the keyword 'while'
"+" or \+ means the operator '+'
as for C, \n = newline, \t = tab, \b = backspace, \\ = \, etc.

^ represents the start of line and $ represents the end of line
^aardvark$ means a line just containing 'aardvark'

[characters] means any one character in the set of characters
[AEIOUaeiou] means recognise any one vowel
[A-Za-z] means recognise any one letter

[^characters] means any one character not in the set of characters
[^\n] means any character except a newline

. means any character except a newline

+ means 1 or more times, * means 0 or more times, {n,m} means repeat n to m times
[0-9]+ means an unsigned integer
[A-Z][a-z]* means a capitalised name
[A-Za-z_][A-Za-z0-9_]* means a C identifier
a{2,3} means aa or aaa

alternatives separated by |
WHILE|while means either WHILE or while (no spaces around the |, because an unquoted space ends the pattern)

( ) group part of a pattern, so that an operator applies to the whole group
([A-Z][A-Z])+ means an even number (2, 4, 6, ...) of upper-case letters

ab?c means ac or abc, i.e. the b is optional

ab/c means ab, but only if followed by c
ab/\n means the same as ab$
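
Most of the notation above (sets, repetition, alternatives, grouping, ?) is shared with POSIX extended regular expressions, so the examples can be tried from C without running Lex. A minimal sketch, assuming a POSIX system with regex.h: matches_whole is a hypothetical helper, not part of Lex, and Lex-only features such as "quoted strings" and the trailing-context operator / are not covered.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole of text matches the extended regular expression
   pattern, 0 if it does not, and -1 if the pattern cannot be compiled.
   Hypothetical helper built on POSIX regcomp/regexec. */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int result;

    /* Wrap the pattern as ^(...)$ so that it must cover the entire string. */
    if (snprintf(anchored, sizeof anchored, "^(%s)$", pattern)
            >= (int)sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    result = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return result;
}
```

For example, matches_whole("a{2,3}", "aa") gives 1 while matches_whole("a{2,3}", "aaaa") gives 0, and matches_whole("ab?c", "ac") gives 1.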

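Rule Priority: Lex picks the rule that matches the longest possible string; if several rules match the same longest string, the first of them in the file wins.
e.g. put specific rules (such as keywords) before general ones, and error/default rules at end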
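
The priority rule can be sketched in C as a loop over the rules that keeps the longest match and, on a tie, the earliest rule. Everything here is hypothetical illustration (the matcher type, the three rules and choose_rule); a scanner generated by Lex uses a table-driven automaton, not a loop like this.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each hypothetical rule reports how many leading characters of s it
   matches; 0 means no match. */
typedef size_t (*matcher)(const char *s);

static size_t match_while(const char *s)        /* rule 0: while */
{
    return strncmp(s, "while", 5) == 0 ? 5 : 0;
}

static size_t match_identifier(const char *s)   /* rule 1: C identifier */
{
    size_t n = 0;
    if (!isalpha((unsigned char)s[0]) && s[0] != '_') return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
    return n;
}

static size_t match_any(const char *s)          /* rule 2: . (catch-all) */
{
    return (s[0] != '\0' && s[0] != '\n') ? 1 : 0;
}

static const matcher rules[] = { match_while, match_identifier, match_any };

/* Return the index of the chosen rule, or -1 if no rule matches.
   The length of the match is stored in *len. */
int choose_rule(const char *s, size_t *len)
{
    int best = -1;
    size_t i;

    *len = 0;
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i](s);
        if (n > *len) {        /* strictly longer only: a tie keeps the earlier rule */
            *len = n;
            best = (int)i;
        }
    }
    return best;
}
```

On the input "while (x)" rules 0 and 1 both match five characters and rule 0 wins because it comes first; on "whilex = 1" rule 1 matches six characters and wins because its match is longer.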

example patterns from the program:

-, ; the character itself
"+", "*", "/" special characters should be quoted
(or we could have used \+, \*, \/)
[ \t\n] white space is ignored
[0-9]+ a number is one or more numeric characters
[^-0-9+*/; \t\n]+ any other characters

Lex definitions

e.g.

 ALPHANUMERIC [A-Za-z0-9]
 ALPHABETIC [A-Za-z]
 %%
 {ALPHABETIC}{ALPHANUMERIC}*
recognises identifiers
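
A definition works by substitution: each {NAME} in a rule is replaced by the pattern it names. A rough C analogue, offered only as an illustration, is string-literal concatenation with macros. The sketch assumes a POSIX system with regex.h; IDENTIFIER and is_identifier are hypothetical names, and the ^ and $ are added so that the whole string must match.

```c
#include <regex.h>
#include <stddef.h>

#define ALPHANUMERIC "[A-Za-z0-9]"
#define ALPHABETIC   "[A-Za-z]"

/* Adjacent string literals are joined by the compiler, so this expands to
   "^[A-Za-z][A-Za-z0-9]*$", mirroring {ALPHABETIC}{ALPHANUMERIC}* */
#define IDENTIFIER "^" ALPHABETIC ALPHANUMERIC "*$"

/* Return 1 if text is a letter followed by letters or digits, else 0. */
int is_identifier(const char *text)
{
    regex_t re;
    int ok;

    if (regcomp(&re, IDENTIFIER, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With these two definitions an underscore is not accepted, so is_identifier("a_b") gives 0 while is_identifier("x1") gives 1.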

Actions, C declarations & code

yytext: contains the actual characters recognised from input

yyleng: the number of characters in yytext

ECHO = printf ("%s", yytext): also the default action for input that matches no rule

yylex: routine created by Lex from (expected input, action) lists.

yywrap: called at EOF; its (boolean) result says whether to terminate (non-zero) or to carry on with more input (zero)
Lex deals with EOF (i.e. End-Of-File, e.g. control-D at a terminal) as if with the following rule:
EOF if (yywrap( )) return 0;
yywrap is useful during file inclusion, to return to the original file at the end of the included file.
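
A simpler relative of file inclusion is reading several input files one after another. The sketch below assumes the behaviour documented for flex: when yywrap returns 0, the scanner carries on reading from whatever yyin now points to. The queue and queue_file are hypothetical, and yyin is declared here only as a stand-in so that the fragment compiles on its own; returning to the middle of an including file needs more bookkeeping than this.

```c
#include <stdio.h>

/* Stand-in for the scanner's input stream. In a real Lex program the
   generated code declares yyin; it is defined here only so that the
   sketch is self-contained. */
FILE *yyin;

#define MAX_PENDING 8
static const char *pending[MAX_PENDING];    /* hypothetical queue of file names */
static int n_pending = 0, next_pending = 0;

/* Add a file name to the queue; return 1 on success, 0 if the queue is full. */
int queue_file(const char *name)
{
    if (n_pending >= MAX_PENDING) return 0;
    pending[n_pending++] = name;
    return 1;
}

/* Called at EOF: switch to the next file that can be opened and return 0
   (keep scanning), or return 1 (terminate) when the queue is exhausted. */
int yywrap(void)
{
    while (next_pending < n_pending) {
        FILE *f = fopen(pending[next_pending++], "r");
        if (f != NULL) {
            yyin = f;
            return 0;
        }
    }
    return 1;
}
```

A main function would call queue_file for each extra file before calling yylex; files that cannot be opened are skipped.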

How Lex is used


flex : postfix.l -> postfix.c

gcc : postfix.c -> postfix

postfix : equations -> answers

Lex has other facilities, but those described above are the most important. Refer to the Lex manual in the departmental library or at URL
http://www.cs.man.ac.uk/~pjj/cs2111/lex/lex.html
if necessary.