next up previous
Next: YACC Up: CS2121: The Implementation and Previous: Representation and Meaning

Subsections


LEX

LEX


Postfix calculator in LEX

($CS2121/e*/postfix/*)
 %{ /* C declarations used in actions */
 #define stack_size 100
 static int sp, stack [stack_size];
 static void push (int i) {
   if (++sp<stack_size) stack[sp]= i;
   else {printf ("error: stack overflow\n"); exit(1);}
 }
 static int pop (void) {
   if (sp>=0) return stack[sp--];
   else {printf ("error: stack underflow\n"); exit(1);}
 }
 %}

 /*lex definitions - not used in this example*/

 %%

 /* descriptions of       corresponding actions
    expected inputs       (C statements or blocks) */

 [0-9]+                   {push (atoi(yytext));}
 "+"                      {push (pop() + pop());}
 "*"                      {push (pop() * pop());}
 -                        {int rhs= pop(); push (pop() - rhs);}
 "/"                      {int rhs= pop(); push (pop() / rhs);}
 ;                        {printf ("%d\n", pop());}
 [ \t\n]                  ;
 [^-0-9+*/; \t\n]+        {ECHO; printf (" unexpected\n");}

 %%                        /* C code */

 int main (void) {sp= -1; yylex(); return 0;}

 static int yywrap (void) {return 1;}

Descriptions of expected inputs

characters those characters (special characters must be in " ")
or while or "while" (the keyword) while
"characters" "+" (the operator) +
\character \n \t \\ newline tab $\backslash$ etc. as for C
  otherwise, $\backslash$character is an alternative to "character"
  \+ +
^ and $ the start and the end of line respectively
  ^aardvark$ a line just containing aardvark
[characters] any one character in the set of characters
  [AEIOUaeiou] any one vowel
  [A-Za-z] any one letter
[^characters] any one character not in the set of characters
  [^\n] any character except a newline
. any one character except a newline
+ 1 or more times
* 0 or 1 or more times
{n,m} n to m times
  [0-9]+ an unsigned integer
  [A-Z][a-z]* a capitalised name
  [A-Za-z][A-Za-z0-9_]* a C identifier
  a{2,3} aa or aaa
| separates alternatives
  WHILE|while either WHILE or while
(any sub-pattern) that sub-pattern
  ([A-Z][A-Z])+ an even number of upper-case letters
pattern? optional (but only the immediately preceding pattern!)
  ab?c ac or abc
/pattern only if followed by the pattern
  ab/c ab, but only if followed by c
  ab/\n the same as ab$

Rule Priority: the earliest rule matching the longest possible string.

grammar rules from $\S $2.1

-, ; the character itself
"+", "*", "/" special characters have to be quoted to just recognise them
[ \t\n] white space (ignored)
[0-9]+ one or more numeric characters (a number)
[^-0-9+*/; \t\n]+ any other characters

LEX definitions

Not normally needed, but e.g.

 ALPHANUMERIC   [A-Za-z0-9]
 ALPHABETIC     [A-Za-z]
 %%
 {ALPHABETIC}{ALPHANUMERIC}*

recognises identifiers

Actions, C declarations and code

yytext contains the actual characters recognised from input
yyleng the number of characters in yytext
ECHO = printf ("%s", yytext), the default action
yylex routine created by LEX from (expected input, action) lists.
yywrap called at EOF; (boolean) result = terminate or not
  LEX deals with EOF (e.g. control-D) as if with the following rule:
  EOF if (yywrap( )) return 0;
  e.g. use in file inclusion, to return to the original file at the end of the included file.

How LEX is used

FLEX : infix.l $\rightarrow$ infix.c
GCC : infix.c $\rightarrow$ infix
infix : equations $\rightarrow$ answers

LEX has other facilities, but those described above are the most important. Refer to the LEX manual in the departmental library or online if you need to.

Epilogue

We met regular expressions before, with EGREP, and here they are again in LEX. This isn't laziness or lack of imagination on the part of the designers of these programs - regular expressions are well suited to the job we are asking them to perform, finding simple patterns (such as words or numbers) in text.

Furthermore, although there are more facilites in LEX's version of regular expressions than in EGREP's, including some we haven't looked at such as start states, it turns out that these do not help us to recognise more complex patterns, but just help us to describe the same patterns in simpler ways or write the C actions more easily. In fact, you will see in the other half of this course that regular expressions are already as complicated as we can make them without having to fundamentally change (i.e. slow down) the way in which programs like EGREP and LEX work.

However, we do need to describe and search for more complex patterns, even though we have reached a fundamental limit with regular expressions. We can see these limits if we try to use LEX to deal with some other forms of expressions:

As regular expressions have reached their limit, we are going to have to use a new notation (BNF) to describe these sorts of patterns, and a new tool (YACC) to deal with the BNF.

Exercises

($\dag $ indicates harder problems)


Readings

Louden: ch. 4.1
Lesk & Schmidt
Levine, Mason & Brown: chs. 1, 2, 6
Capon & Jinks: ch. 7.2
Aho, Sethi & Ullman: ch. 3.5


next up previous
Next: YACC Up: CS2121: The Implementation and Previous: Representation and Meaning
Pete Jinks
2004-10-26