
Left Context Sensitivity.

Sometimes it is desirable to have several sets of lexical rules that apply at different times in the input. For example, a compiler preprocessor might distinguish preprocessor statements and analyze them differently from ordinary statements. This requires sensitivity to prior context, and there are several ways of handling such problems. The ^ operator, for example, is a prior context operator: it recognizes immediately preceding left context, just as $ recognizes immediately following right context. Adjacent left context could be extended to produce a facility similar to that for adjacent right context, but it is unlikely to be as useful, since the relevant left context often appeared some time earlier, such as at the beginning of a line.
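
To make the limitation concrete, here is a minimal C sketch of the test that a ^-anchored pattern implies. It is an illustration, not Lex's actual implementation, and the helper name at_line_start is hypothetical. A match may begin at position pos only at the start of the input or immediately after a newline. Because the check inspects just one preceding character, a rule in the middle of a line cannot use it to learn which letter that line began with. Remembering that requires saved state.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if a ^-anchored pattern could match at
   buf[pos], i.e. pos is the start of the input or directly follows a
   newline. Only the single preceding character is examined. */
int at_line_start(const char *buf, size_t pos)
{
    return pos == 0 || buf[pos - 1] == '\n';
}
```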

This section describes three means of dealing with different environments: a simple use of flags, when only a few rules change from one environment to another; the use of start conditions on rules; and the possibility of making multiple lexical analyzers all run together. In each case, there are rules that recognize the need to change the environment in which the following input text is analyzed, and that set some parameter to reflect the change. This may be a flag explicitly tested by the user's action code; such a flag is the simplest way of dealing with the problem, since Lex is not involved at all. It may be more convenient, however, to have Lex remember the flags as start conditions on the rules. Any rule may be associated with a start condition, and it will then be recognized only when Lex is in that start condition. The current start condition may be changed at any time. Finally, if the sets of rules for the different environments are very dissimilar, clarity may be best achieved by writing several distinct lexical analyzers and switching from one to another as desired.

Consider the following problem: copy the input to the output, changing the word magic to first on every line that begins with the letter a, changing magic to second on every line that begins with the letter b, and changing magic to third on every line that begins with the letter c. All other words and all other lines are left unchanged.

These rules are so simple that the easiest way to do this job is with a flag:

	int flag;
%%
^a	{flag = 'a'; ECHO;}
^b	{flag = 'b'; ECHO;}
^c	{flag = 'c'; ECHO;}
\n	{flag =  0 ; ECHO;}
magic	{
	switch (flag)
	{
	case 'a': printf("first"); break;
	case 'b': printf("second"); break;
	case 'c': printf("third"); break;
	default: ECHO; break;
	}
	}
should be adequate.

To handle the same problem with start conditions, each start condition must be introduced to Lex in the definitions section with a line reading

%Start	name1 name2 ...
where the conditions may be named in any order. The word Start may be abbreviated to s or S. The conditions may be referenced at the head of a rule with the <> brackets:
<name1>expression
is a rule which is only recognized when Lex is in the start condition name1. To enter a start condition, execute the action statement
BEGIN name1;
which changes the start condition to name1. To resume the normal state, execute
BEGIN 0;
which resets the Lex automaton interpreter to its initial condition. A rule may be active in several start conditions:
<name1,name2,name3>
is a legal prefix. Any rule not beginning with the <> prefix operator is always active.

The same example as before can be written:

%START AA BB CC
%%
^a	{ECHO; BEGIN AA;}
^b	{ECHO; BEGIN BB;}
^c	{ECHO; BEGIN CC;}
\n	{ECHO; BEGIN 0;}
<AA>magic	printf("first");
<BB>magic	printf("second");
<CC>magic	printf("third");
where the logic is exactly the same as in the previous method of handling the problem, but Lex does the work rather than the user's code.
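
One way to picture what Lex does here is a rule table in which each rule carries the set of start conditions where it is active. An empty set means that the rule has no <> prefix and is always active. The C sketch below emulates the %START example this way. It is a hypothetical illustration, not Lex's internal representation: the bit encoding of AA, BB and CC, the struct rule layout, and the function name scan_with_start_conditions are all assumptions. It handles only literal patterns, but it follows the Lex conventions of preferring the longest match, breaking ties in favor of the earlier rule, and copying unmatched characters to the output.

```c
#include <stddef.h>
#include <string.h>

/* Assumed encoding: each start condition is one bit; state 0 is the
   initial condition entered by BEGIN 0. Names mirror %START AA BB CC. */
enum { AA = 1, BB = 2, CC = 4 };

struct rule {
    unsigned conds;    /* conditions where active; 0 = no <> prefix, always active */
    int bol;           /* 1 if the pattern is anchored with ^ */
    const char *pat;   /* literal text to match */
    const char *out;   /* replacement text; NULL means ECHO the match */
    int next;          /* state to BEGIN after matching, or -1 for none */
};

static const struct rule rules[] = {
    {0,  1, "a",     NULL,     AA},   /* ^a  {ECHO; BEGIN AA;} */
    {0,  1, "b",     NULL,     BB},   /* ^b  {ECHO; BEGIN BB;} */
    {0,  1, "c",     NULL,     CC},   /* ^c  {ECHO; BEGIN CC;} */
    {0,  0, "\n",    NULL,     0},    /* \n  {ECHO; BEGIN 0;}  */
    {AA, 0, "magic", "first",  -1},   /* <AA>magic */
    {BB, 0, "magic", "second", -1},   /* <BB>magic */
    {CC, 0, "magic", "third",  -1},   /* <CC>magic */
};

/* dst must hold at least 2 * strlen(src) + 1 bytes. */
void scan_with_start_conditions(const char *src, char *dst)
{
    int state = 0;                     /* the initial condition */
    size_t i = 0, o = 0;
    size_t nrules = sizeof rules / sizeof rules[0];

    while (src[i] != '\0') {
        int bol = (i == 0 || src[i - 1] == '\n');
        const struct rule *best = NULL;
        size_t best_len = 0;

        for (size_t r = 0; r < nrules; r++) {
            const struct rule *ru = &rules[r];
            size_t len = strlen(ru->pat);
            if (ru->conds != 0 && !(ru->conds & (unsigned)state))
                continue;              /* inactive in the current start condition */
            if (ru->bol && !bol)
                continue;              /* ^ rule but not at start of line */
            if (strncmp(src + i, ru->pat, len) != 0)
                continue;
            if (len > best_len) {      /* longest match; earlier rule wins ties */
                best = ru;
                best_len = len;
            }
        }

        if (best == NULL) {            /* no rule matched: copy one character */
            dst[o++] = src[i++];
            continue;
        }
        const char *text = best->out ? best->out : best->pat;
        size_t n = strlen(text);
        memcpy(dst + o, text, n);
        o += n;
        i += best_len;
        if (best->next >= 0)
            state = best->next;        /* BEGIN */
    }
    dst[o] = '\0';
}
```

In this sketch the scanning loop never mentions magic or the letters a, b and c; all of that lives in the table, which loosely parallels the point that Lex, rather than the user's code, does the work.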

