%{
/* C declarations used in actions */
#include <stdio.h>
#include <stdlib.h>
#define stack_size 100
static int sp, stack [stack_size];
static void push (int i)
{
if (++sp<stack_size) stack[sp]= i;
else {printf ("error: stack overflow\n"); exit(1);}
}
static int pop (void)
{
if (sp>=0) return stack[sp--];
else {printf ("error: stack underflow\n"); exit(1);}
}
%}
 /* Lex definitions - not used in this example */
%%
 /* descriptions of expected inputs    corresponding actions (C statements or blocks) */
[0-9]+ {push(atoi(yytext));}
"+" {push(pop()+pop());}
"*" {push(pop()*pop());}
- {int rhs=pop(); push(pop()-rhs);}
"/" {int rhs=pop(); push(pop()/rhs);}
; {printf("%d\n", pop());}
[ \t\n] ;
[^-0-9+*/; \t\n]+ {ECHO; printf(" unexpected\n");}
%%
/* C code */
int main (void)
{
sp=-1; yylex(); return 0;
}
int yywrap (void) {return 1;} /* not static: the scanner generated by flex declares yywrap extern */
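As written, each operator rule works on values that are already on the stack, so the operands have to come before the operator in the input (e.g. 3 4 + 2 * ;). The sketch below replays the same stack operations in plain C, without Lex, to show the order of pushes and pops. The function eval_postfix is a hypothetical helper, not part of the example program, and unlike the Lex rules it silently skips unexpected characters.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

#define STACK_SIZE 100

static int sp = -1, stack[STACK_SIZE];

static void push(int i)
{
    if (++sp < STACK_SIZE) stack[sp] = i;
    else { printf("error: stack overflow\n"); exit(1); }
}

static int pop(void)
{
    if (sp >= 0) return stack[sp--];
    printf("error: stack underflow\n");
    exit(1);
}

/* Hypothetical helper: evaluate one expression held in a string, applying
   the same stack operations as the Lex actions. Stops at ';' or end of string. */
static int eval_postfix(const char *s)
{
    sp = -1;
    while (*s != '\0' && *s != ';') {
        if (isdigit((unsigned char)*s)) {
            char *end;
            push((int)strtol(s, &end, 10));   /* like [0-9]+ with atoi(yytext) */
            s = end;
            continue;
        }
        switch (*s) {
        /* the right-hand operand is on top of the stack, so pop it first */
        case '+': { int rhs = pop(); push(pop() + rhs); break; }
        case '*': { int rhs = pop(); push(pop() * rhs); break; }
        case '-': { int rhs = pop(); push(pop() - rhs); break; }
        case '/': { int rhs = pop(); push(pop() / rhs); break; }
        default: break;   /* white space and anything else is skipped here */
        }
        s++;
    }
    return pop();
}
```

For example, eval_postfix("7 2 - ;") pushes 7 and 2, pops 2 as the right-hand side, then pops 7 and pushes 7-2.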
any string of characters means recognise that string in the input
(special characters should be quoted with "" or \)
while or "while" means the keyword 'while'
"+" or \+ means the operator '+'
as for C, \n = newline, \t = tab, \b = backspace, \\ = \, etc.
^ represents the start of line and $ represents the end of line
^aardvark$ means a line just containing 'aardvark'
[characters] means any one character in the set of characters
[AEIOUaeiou] means recognise any one vowel
[A-Za-z] means recognise any one letter
[^characters] means any one character not in the set of characters
[^\n] means any character except a newline
. means any character except a newline
+ means 1 or more times, * means 0 or more times, {n,m} means repeat n
to m times
[0-9]+ means an unsigned integer
[A-Z][a-z]* means a capitalised name
[A-Za-z_][A-Za-z0-9_]* means a C identifier
a{2,3} means aa or aaa
alternatives separated by |
WHILE|while means either WHILE or while (no spaces around the |, as white space ends a pattern)
(any substring) groups the substring, so that a following +, * or ? applies to all of it
([A-Z][A-Z])+ means an even number of upper-case letters
ab?c means ac or abc, i.e. the b is optional
ab/c means ab, but only if followed by c
ab/\n means the same as ab$
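Most of the patterns above are also valid POSIX extended regular expressions, so they can be tried out without Lex. The sketch below assumes a POSIX system with regex.h; matches_whole is a hypothetical helper that checks whether a whole string matches a pattern. The Lex-only notations (quoting with "" and the trailing context operator /) are not covered by it.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the whole of text matches pattern
   (a POSIX extended regular expression), 0 if it does not, and -1 if
   the pattern does not compile. */
static int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;
    /* regexec reports the leftmost, longest match; accept it only if it
       starts at the first character and ends at the terminating '\0' */
    ok = regexec(&re, text, 1, &m, 0) == 0
         && m.rm_so == 0 && text[m.rm_eo] == '\0';
    regfree(&re);
    return ok;
}
```

For example, matches_whole("a{2,3}", "aaa") gives 1, while matches_whole("[A-Za-z_][A-Za-z0-9_]*", "9lives") gives 0 because an identifier cannot start with a digit.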
Rule Priority: the rule matching the longest possible string is chosen; if several
rules match a string of the same length, the first of them is chosen.
e.g. put error/default rules at end
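The priority rule can be sketched in C as a loop over the rules that keeps the longest match and only replaces it when a later rule matches strictly more. This is an illustration only, assuming POSIX regex.h; it is not how Lex works internally (Lex builds a single automaton), and pick_rule and demo_rules are hypothetical names. Each pattern starts with ^ so that it can only match at the start of the text.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical rule list: a keyword first, then a more general pattern. */
static const char *const demo_rules[] = { "^while", "^[a-z]+" };

/* Returns the index of the winning rule for the start of text,
   or -1 if no rule matches there. */
static int pick_rule(const char *const rules[], int nrules, const char *text)
{
    int best = -1;
    regoff_t best_len = 0;

    for (int i = 0; i < nrules; i++) {
        regex_t re;
        regmatch_t m;

        if (regcomp(&re, rules[i], REG_EXTENDED) != 0) continue;
        /* strictly longer only: on a tie the earlier rule is kept */
        if (regexec(&re, text, 1, &m, 0) == 0 && m.rm_so == 0
            && m.rm_eo > best_len) {
            best = i;
            best_len = m.rm_eo;
        }
        regfree(&re);
    }
    return best;
}
```

With these two rules, "while" is matched by both with the same length, so rule 0 (the keyword) wins; "whiley" is matched for longer by rule 1, so it is treated as a name rather than a keyword followed by y.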
-, ; the character itself
"+", "*", "/" special characters should be quoted
(or we could have used \+, \*, \/)
[ \t\n] white space is ignored
[0-9]+ a number is one or more numeric characters
[^-0-9+*/; \t\n]+ any other characters
e.g.
ALPHANUMERIC [A-Za-z0-9]
ALPHABETIC [A-Za-z]
%%
{ALPHABETIC}{ALPHANUMERIC}*
recognises identifiers
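A definition is a named piece of pattern that is substituted wherever {NAME} appears in a rule. A rough analogy in C, assuming POSIX regex.h, is to build the pattern from preprocessor macros and string literal concatenation; is_identifier and IDENTIFIER are hypothetical names used only for this sketch.

```c
#include <regex.h>
#include <stddef.h>

#define ALPHANUMERIC "[A-Za-z0-9]"
#define ALPHABETIC   "[A-Za-z]"
/* Adjacent string literals are joined, giving ^[A-Za-z][A-Za-z0-9]*$ */
#define IDENTIFIER   "^" ALPHABETIC ALPHANUMERIC "*$"

/* Returns 1 if text is a letter followed by letters or digits, else 0. */
static int is_identifier(const char *text)
{
    regex_t re;
    int ok;

    if (regcomp(&re, IDENTIFIER, REG_EXTENDED | REG_NOSUB) != 0) return 0;
    ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

Note that these definitions do not include the underscore, so stack_size is not accepted by this pattern.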
yytext: contains the actual characters recognised from input
yyleng: the number of characters in yytext
ECHO: copies yytext to the output, like printf ("%s", yytext); this is the default action for input that no rule matches
yylex: routine created by Lex from (expected input, action) lists.
yywrap: called at EOF; (boolean) result: non-zero = terminate, 0 = carry on with more input
Lex deals with EOF (End-Of-File, e.g. control-D at the terminal) as if by the
following rule:
EOF if (yywrap( )) return 0;
yywrap is useful during file inclusion, to return to the original file at
the end of the included file.
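The effect of the EOF rule can be simulated in plain C. In the sketch below, strings stand in for input files, and my_wrap plays the part of yywrap: it returns 0 after switching to the next source, or 1 when there is nothing left. All of the names are hypothetical, and a real scanner would switch its input file rather than a string pointer.

```c
#include <stddef.h>

/* Stand-ins for the scanner's input: each string plays the role of a file. */
static const char *const sources[] = { "1 2 +", " 3 * ;" };
static size_t current = 0;

/* Hypothetical yywrap-like function: move to the next source if there is
   one and return 0 (carry on), otherwise return 1 (terminate). */
static int my_wrap(void)
{
    if (current + 1 < sizeof sources / sizeof sources[0]) {
        current++;
        return 0;
    }
    return 1;
}

/* Simulated scanning loop: counts the characters seen across all sources. */
static int scan_all(void)
{
    int seen = 0;
    const char *p;

    current = 0;
    p = sources[current];
    for (;;) {
        if (*p == '\0') {               /* the EOF rule */
            if (my_wrap()) return seen;
            p = sources[current];       /* carry on with the next source */
            continue;
        }
        seen++;
        p++;
    }
}
```

The loop only stops when my_wrap returns non-zero, which is the same decision the EOF rule above makes with yywrap.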
flex : infix.l -> infix.c
gcc : infix.c -> infix
infix : equations -> answers
Lex has other facilities, but those described above are the most important.
Refer to the Lex manual in the departmental library or at URL
http://www.cs.man.ac.uk/~pjj/cs2111/lex/lex.html
if necessary.