e.g. a simple fast compiler generating poor code v. a complex slow compiler generating better code.
e.g. interpreting a command by calling an existing general routine with a particular set of parameters v. compiling and executing a command by translating (binding) it into a specific sequence of instructions, which runs many times faster. (e.g. a sh script v. a C program)
We will now look in outline at how we would build a complete compiler for a high-level language like C. To start with, however, it will help if we briefly look again at the subject of grammars.
More importantly, he showed that Turing Machines are equivalent to Unrestricted grammars, and Finite Turing Machines to Context Sensitive grammars. We also know that programming languages are equivalent in power to (finite?) Turing Machines, so we can deduce that a context-free grammar is unlikely to be sufficiently powerful to describe a programming language. However, it is also true that using an unrestricted or context-sensitive grammar to describe a programming language would be pointless, as such descriptions are too complicated to be useful.
What we can usefully conclude is that the levels in Chomsky's hierarchy are equivalent to components of the specification of any programming language (and are of fundamentally different complexities). Therefore, to process any such language efficiently we must separate out these components and process them differently. Further, we must be careful when separating the components to maximise the simpler tasks and minimise the more complex tasks. (Thus programming languages should be designed so that this is straightforward, or else users may face an unnecessary penalty.)
The part of a programming language that should be described by a context-sensitive grammar is the context i.e. the identifiers, which we have seen we can process using a dictionary. This processing (of identifiers, types etc.) is normally called semantic analysis. Similarly, that part that corresponds to a context-free grammar can be handled by syntactic analysers such as those generated by YACC, and the part that corresponds to a regular grammar can be handled by lexical analysers such as those generated by LEX.
Note that although we do not normally use a grammar to define the semantics of a programming language, we really need some formal notation (e.g. VDM): automatically generating semantic analysers is difficult even with such a notation, and impossible without one.
fred in freda, but has no concept of scope except the end of the text (including any #included files).
Recognises simplest pieces (i.e. regular part) of grammar:
discard blanks and comments
identify literals, names, keywords, operators, punctuation
as lexemes = (syntactic kind, semantic value)
lexeme kinds include literal, name, keyword, operator and punctuation.
Example fragment of C:
int i; float j; float f (int i) {int k; float r; k= i*j; r= k; return r;}

and corresponding lexemes:

(int, -) (id, *) -> i in namelist (;, -) (float, -) (id, *) -> j in namelist (;, -)

and namelist:

int float f i j k r

The need for error messages means that we may need a line buffer, so we can output complete lines containing an error and a pointer to where-abouts on the line the error is. (We can do this using some of the more advanced facilities of LEX). However, more sophisticated programming environments can read the error messages and enter an editor, setting the cursor to the position of each error in turn, and so making a line buffer superfluous.
Recognises context-free grammar.
Builds structure from string of symbols.
Detects grammatical errors, but usually cannot correct them.
Good parsers will usually try to continue after finding one error, in case there are more errors.
Possible parse tree for example C program:
program
  decl--var [int, *->i]
  decl--var [float, *->j]
  decl--func [float, *->f]
    decl--param [int, *->i]
    decl--var [int, *->k]
    decl--var [float, *->r]
    exp--[=]
      [*->k]
      [*]
        [*->i]
        [*->j]
    exp--[=]
      [*->r]
      [*->k]
    return--[*->r]

(as in lexemes, identifiers are pointers to entries in the namelist.)
Semantic actions can be directly embedded in the parser (1 pass), or the semantic analyser can walk over parse trees created earlier (multi-pass).
The semantic analyser recognises and checks the declarations and use of identifiers, using the dictionary.
Unlike lexical and syntactic analysis, there are no widely available toolkits for semantic analysis, although there are packages that help with maintaining a dictionary.
detail depends on eagerness (cf. 19.5)
detail depends on strength of type checking (cf. 15.2), which may vary
Possible enhanced parse tree for example C program:
program [2 vars]
  decl--func [*->f]
    exp--[I,=]
      [I,*->k]
      [F->I]
        [F,*]
          [I->F]
            [I,*->i]
          [F,*->j]
    exp--[F,=]
      [F,*->r]
      [I->F]
        [I,*->k]
    return--[F,*->r]

(identifiers have become pointers to property entries.) (F short for float, I for int)
Possible dictionary for example C program:
int   [type | int] <--I
float [type | float] <--F
f     [func | global | ? | 1 param | 2 vars | *->F]
i     [param | local | 1 | *->I]
      [var | global | 1 | *->I]
j     [var | global | 2 | *->F]
k     [var | local | 1 | *->I]
r     [var | local | 2 | *->F]