%{
/* C declarations used in actions */
#include <stdio.h>
#include <stdlib.h>

#define stack_size 100
static int sp, stack [stack_size];

static void push (int i) {
  if (++sp<stack_size) stack[sp]= i;
  else {printf ("error: stack overflow\n"); exit(1);}
}

static int pop (void) {
  if (sp>=0) return stack[sp--];
  else {printf ("error: stack underflow\n"); exit(1);}
}
%}
        /* Lex definitions - not used in this example */
%%
        /* descriptions of       corresponding actions     */
        /* expected inputs       (C statements or blocks)  */
[0-9]+              {push(atoi(yytext));}
"+"                 {push(pop()+pop());}
"*"                 {push(pop()*pop());}
-                   {int rhs=pop(); push(pop()-rhs);}
"/"                 {int rhs=pop(); push(pop()/rhs);}
;                   {printf("%d\n", pop());}
[ \t\n]             ;
[^-0-9+*/; \t\n]+   {ECHO; printf(" unexpected\n");}
%%
        /* C code */
int main (void) {
  sp=-1; yylex(); return 0;
}
        /* not static: flex declares yywrap with external linkage */
int yywrap (void) {return 1;}
any string of characters (special characters should be quoted by "" or \)
means recognise that string in the input
while or "while" means the keyword 'while'
"+" or \+ means the operator '+'
as for C, \n = newline, \t = tab, \b = backspace, \\ = \, etc.
^ represents the start of line and $ represents the end of line
^aardvark$ means a line just containing 'aardvark'
[characters] means any one character in the set of characters
[AEIOUaeiou] means recognise any one vowel
[A-Za-z] means recognise any one letter
[^characters] means any one character not in the set of characters
[^\n] means any character except a newline
. means any character except a newline
+ means 1 or more times, * means 0 or more times, {n,m} means repeat n
to m times
[0-9]+ means an unsigned integer
[A-Z][a-z]* means a capitalised name
[A-Za-z_][A-Za-z0-9_]* means a C identifier
a{2,3} means aa or aaa
alternatives separated by | (with no spaces around it, as a space ends the pattern)
WHILE|while means either WHILE or while
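(sub-pattern) groups part of a pattern, so that it can be repeated as a unit
([A-Z][A-Z])+ means an even (non-zero) number of upper-case letters
? means 0 or 1 times, i.e. optional
ab?c means ac or abc, i.e. the b is optional
/ means the left-hand pattern, but only if followed by the right-hand pattern
(which is not itself consumed)
ab/c means ab, but only if followed by c
ab/\n means the same as ab$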
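Most of this notation is shared with POSIX extended regular expressions, so the examples above can be tried without running Lex. The sketch below is only an approximation under that assumption: matches_whole is a hypothetical helper (not part of Lex), and POSIX regcomp does not understand Lex's "" quoting or its / trailing context.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the whole of `text` matches `pattern`,
   0 if it does not, and -1 if the pattern is too long or fails to compile. */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, result;

    assert(pattern != NULL && text != NULL);

    /* Wrap the pattern as ^(pattern)$ so that all of the text must match. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    result = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return result;
}
```

For instance, matches_whole("a{2,3}", "aaa") gives 1 but matches_whole("a{2,3}", "aaaa") gives 0, and matches_whole("([A-Z][A-Z])+", "ABC") gives 0 because ABC has an odd number of letters.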
Rule Priority: the rule matching the longest possible string is used; if
several rules match a string of that length, the first of them is used.
e.g. put error/default rules at end
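The priority rule can be imitated outside Lex to see which rule would fire. This is a sketch only, assuming POSIX regcomp/regexec as a stand-in for Lex's own matcher; pick_rule is a hypothetical name, not a Lex routine.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns the index of the rule that would be chosen
   at the start of `input` (longest match, earliest rule on a tie), or -1
   if no rule matches or a pattern fails to compile.  The length of the
   winning match is stored in *match_len. */
int pick_rule(const char *const patterns[], int n_rules,
              const char *input, int *match_len)
{
    int best_rule = -1, best_len = 0, i;

    assert(patterns != NULL && input != NULL && match_len != NULL);

    for (i = 0; i < n_rules; i++) {
        char anchored[256];
        regex_t re;
        regmatch_t m[1];
        /* ^(pattern) only allows a match at the start of the input. */
        int n = snprintf(anchored, sizeof anchored, "^(%s)", patterns[i]);

        if (n < 0 || (size_t)n >= sizeof anchored)
            return -1;
        if (regcomp(&re, anchored, REG_EXTENDED) != 0)
            return -1;
        if (regexec(&re, input, 1, m, 0) == 0) {
            int len = (int)(m[0].rm_eo - m[0].rm_so);
            /* Only a strictly longer match replaces the current best,
               so on a tie the earlier rule is kept. */
            if (len > best_len) {
                best_len = len;
                best_rule = i;
            }
        }
        regfree(&re);
    }
    *match_len = best_len;
    return best_rule;
}
```

With the rules while, [a-z]+ and . in that order, the input "while (" selects the first rule (both of the first two rules match 5 characters, and the first wins), whereas "whiles" selects the second rule, because it matches 6 characters.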
-, ;                  the character itself
"+", "*", "/"         special characters should be quoted
                      (or we could have used \+, \*, \/)
[ \t\n]               white space is ignored
[0-9]+                a number is one or more numeric characters
[^-0-9+*/; \t\n]+     any other characters
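To see what these rules do to an input such as 3 4 + 2 * ; the scanning and the actions can be written out by hand. The following is a rough hand-written equivalent, not what Lex generates: eval_postfix is a hypothetical name, it stops at the first ; instead of printing and carrying on, and it reports errors through its return value rather than calling exit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define STACK_SIZE 100

/* Hypothetical helper: evaluates a postfix expression and stores the value
   that the first ';' would print in *result.  Returns 1 on success, 0 on
   stack overflow or underflow, division by zero, an unexpected character,
   or a missing ';'. */
int eval_postfix(const char *input, int *result)
{
    int stack[STACK_SIZE];
    int sp = -1;                 /* index of the top element; -1 = empty */
    const char *p = input;

    assert(input != NULL && result != NULL);

    while (*p != '\0') {
        if (isdigit((unsigned char)*p)) {            /* [0-9]+ */
            char *end;
            long value = strtol(p, &end, 10);
            if (sp + 1 >= STACK_SIZE)
                return 0;
            stack[++sp] = (int)value;
            p = end;             /* skip the whole number (longest match) */
        } else if (*p == '+' || *p == '*' || *p == '-' || *p == '/') {
            int rhs, lhs;
            if (sp < 1)
                return 0;
            rhs = stack[sp--];   /* the right operand was pushed last */
            lhs = stack[sp--];
            switch (*p) {
            case '+': stack[++sp] = lhs + rhs; break;
            case '*': stack[++sp] = lhs * rhs; break;
            case '-': stack[++sp] = lhs - rhs; break;
            default:
                if (rhs == 0)
                    return 0;
                stack[++sp] = lhs / rhs;
                break;
            }
            p++;
        } else if (*p == ';') {                      /* ; */
            if (sp < 0)
                return 0;
            *result = stack[sp--];
            return 1;
        } else if (*p == ' ' || *p == '\t' || *p == '\n') {
            p++;                                     /* [ \t\n] is ignored */
        } else {
            return 0;            /* the Lex version echoes these instead */
        }
    }
    return 0;                    /* no ';' seen */
}
```

Popping the right-hand operand first matters for - and /: the input 10 4 - ; gives 6, not -6.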
e.g.
ALPHANUMERIC [A-Za-z0-9]
ALPHABETIC   [A-Za-z]
%%
{ALPHABETIC}{ALPHANUMERIC}*     recognises identifiers
yytext: contains the actual characters recognised from input
yyleng: the number of characters in yytext
ECHO = printf ("%s", yytext): the default action
yylex: routine created by Lex from (expected input, action) lists.
yywrap: called at EOF; (boolean) result = terminate (non-zero) or carry on scanning (0)
Lex deals with EOF (i.e. End-Of-File e.g. control-D) as if with the following
rule:
EOF if (yywrap( )) return 0;
yywrap is useful during file inclusion, to return to the original file at
the end of the included file.
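The file-inclusion use of yywrap can be sketched as a small stack of suspended input files. Everything here apart from the names yyin and yywrap is hypothetical (enter_include, suspended, MAX_INCLUDE_DEPTH), and yyin is declared locally as a stand-in so that the fragment compiles without Lex. Because flex buffers its input, a real flex scanner would normally switch files with its buffer functions (such as yypush_buffer_state and yypop_buffer_state) rather than by assigning yyin alone, so treat this purely as an illustration of the idea.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_INCLUDE_DEPTH 8

/* Stand-in for the input stream that a Lex-generated scanner reads from. */
static FILE *yyin;

/* Hypothetical stack of files suspended while an included file is read. */
static FILE *suspended[MAX_INCLUDE_DEPTH];
static int depth = 0;

/* Hypothetical helper: start reading an included file.
   Returns 1 on success, 0 if includes are nested too deeply. */
int enter_include(FILE *included)
{
    assert(included != NULL);
    if (depth >= MAX_INCLUDE_DEPTH)
        return 0;
    suspended[depth++] = yyin;   /* remember where we came from */
    yyin = included;
    return 1;
}

/* Called at EOF: 0 = carry on scanning, from the file that did the
   including; 1 = no more input, so yylex returns 0. */
int yywrap(void)
{
    if (depth == 0)
        return 1;                /* EOF of the original file: terminate */
    fclose(yyin);                /* finished with the included file */
    yyin = suspended[--depth];   /* return to the including file */
    return 0;
}
```

An action that recognises an include directive would open the named file and pass it to enter_include; at the end of that file yywrap returns 0 and scanning resumes in the original file.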
flex : infix.l -> infix.c
gcc : infix.c -> infix
infix : equations -> answers
Lex has other facilities, but those described above are the most important.
Refer to the Lex manual in the departmental library or at URL
http://www.cs.man.ac.uk/~pjj/cs2111/lex/lex.html
if necessary.