$CS2111/e*/postfix/*
)
%{ /* C declarations used in actions */ #define stack_size 100 static int sp, stack [stack_size]; static void push (int i) { if (++sp<stack_size) stack[sp]= i; else {printf ("error: stack overflow\n"); exit(1);} } static int pop (void) { if (sp>=0) return stack[sp - -]; else {printf ("error: stack underflow\n"); exit(1);} } %} /*lex definitions - not used in this example*/ %% /* descriptions of corresponding actions expected inputs (C statements or blocks) */ [0-9]+ {push (atoi(yytext));} "+" {push (pop() + pop());} "*" {push (pop() * pop());} - {int rhs= pop(); push (pop() - rhs);} "/" {int rhs= pop(); push (pop() / rhs);} ; {printf ("%d\n", pop());} [ \t\n] ; [^-0-9+*/; \t\n]+ {ECHO; printf (" unexpected\n");} %% /* C code */ int main (void) {sp= -1; yylex(); return 0;} static int yywrap (void) {return 1;}
characters | those characters (special characters must be in " ") | |
or | while or "while" |
(the keyword) while |
" characters" |
"+" |
(the operator) + |
\ character |
\n \t \\ |
newline tab etc. as for C |
otherwise, character is an alternative to "character" | ||
\+ |
+ | |
^ and $ |
the start and the end of line respectively | |
^aardvark$ |
a line just containing aardvark |
|
[ characters] |
any one character in the set of characters | |
[AEIOUaeiou] |
any one vowel | |
[A-Za-z] |
any one letter | |
[^ characters] |
any one character not in the set of characters | |
[^\n] |
any character except a newline | |
. |
any one character except a newline |
>
+ |
1 or more times | |
* |
0 or 1 or more times | |
{ n, m} |
n to m times | |
[0-9]+ |
an unsigned integer | |
[A-Z][a-z]* |
a capitalised name | |
[A-Za-z][A-Za-z0-9_]* |
a C identifier | |
a{2,3} |
aa or aaa | |
| |
separates alternatives | |
WHILE|while |
either WHILE or while | |
( any sub-pattern) |
that sub-pattern | |
([A-Z][A-Z])+ |
an even number of upper-case letters | |
pattern? |
optional (but only the immediately preceding pattern!) | |
ab?c |
ac or abc | |
/ pattern |
only if followed by the pattern | |
ab/c |
ab, but only if followed by c | |
ab/\n |
the same as ab$ |
Rule Priority: the earliest rule matching the longest possible string.
- , ; |
the character itself |
"+" , "*" , "/" |
special characters have to be quoted to just recognise them |
[ \t\n] |
white space (ignored) |
[0-9]+ |
one or more numeric characters (a number) |
[^-0-9+*/; \t\n]+ |
any other characters |
ALPHANUMERIC [A-Za-z0-9] ALPHABETIC [A-Za-z] %% {ALPHABETIC}{ALPHANUMERIC}*recognises identifiers
yytext |
contains the actual characters recognised from input |
yyleng |
the number of characters in yytext |
ECHO |
= printf ("%s", yytext) , the default action |
yylex |
routine created by LEX from (expected input, action) lists. |
yywrap |
called at EOF; (boolean) result = terminate or not |
LEX deals with EOF (e.g. control-D) as if with the following rule: | |
EOF if (yywrap( )) return 0; |
|
e.g. use in file inclusion, to return to the original file at the end of the included file. |
LEX has other facilities, but those described above are the most important. Refer to the LEX manual in the departmental library if necessary.
Thus the stack must discriminate between numbers (from sub-expressions) and operators so we can stack operators for later. When we see a number we must check to see if it follows an operator (i.e. there is an operator on top of the stack), in which case it is the end of an expression which we must evaluate and stack the result of, or else it is the start of an expression and we must stack it:
[-+*/] {push_operator (yytext[0]);} [0-9]+ {rhs= atoi(yytext); if (stack_top_is_operator()) { op= pop(); lhs= pop(); rhs= do_operation (lhs, op, rhs); } push_value (rhs); /* lhs of next expression */ }However, the situation is usually more complex than this, as we need to deal with operator precedence and associativity, and brackets, all of which are unnecessary with postfix. For example, with an input of
1 + 2 * 3
, the
code above would calculate the value of 1 + 2
when the 2
was
input, but we should really wait until we have input the next operator to
see if it is of higher precedence (or of equal precedence and right
associativity), as in this case the *
is. (We will need some marker
at the end of the expression, such as ;
, that we can treat as if it
has a lower precedence than any operator.)
Therefore, the characteristics of a more general grammar recogniser than LEX must be:
Another simple design is to use fully bracketed prefix or postfix, which may be slightly easier for the user, as we can check that each operator has the right number of operands.
-
in:
- 1 (signed literal) - B (negation) A - 1 or A - B (subtraction)Therefore (unless someone invents a language which uses 3 different characters) languages do not allow signed literals in expressions but use negation and unsigned literals instead. However, even this is beyond LEX when used by itself, as
-
would have different numbers of operands,
unless (as in SML) we use e.g. ^
for negation.
We would have similar problems with any other overloaded operator.
[^']
(b) ^'
(c) [^'].*
(d) '[^']'
(e) '[^']+'
(f) "[^']+"
1234
, 1234L
, 1234U
, 1234UL
, 123.4
,
1234E-1
, 12.34E+1
, 123F
, 123.4L
, 12E+2F
,
12E-2L
, 0X1ABC
, 0X1ABCU
, 0X1ABCL
, 0X1ABCUL
(any of the alphabetic characters can be lower case as well).'a'
, '\n'
, '\027'
, '\0'
, '\''
,
'\x0f'
, '\\'
Fred
, fred
and FRED
legal,
and if so are they the same identifier or different ones? How about keywords
- are IF
, if
and If
the same? Do you prefer
IF fred
or if FRED
or some other combination (aim for maximum
clarity, not ease of typing)?]
it has to come first in the set, and a -
character
has to come first or last. Why?. {ECHO; printf(" unexpected\n");}
.+ {ECHO; printf(" unexpected\n");}
#
to the end of line#
to #
(including multiple lines)/*
to */
(including multiple lines)/*
to */
(including multiple lines and nested comments)input
and unput
in an action or else use start states, which
you can find out about via the readings (3.9).(
and )
.