next up previous contents
Next: Phases of Compilation Up: CS2111: Design and Implementation Previous: Building a Compiler: generating   Contents


Syntax-directed Translation

Syntax-directed translation is essentially a formal treatment of two simple concepts:

Attributes are essentially equivalent to the %union value that yacc can calculate in the action associated with each grammar rule - we can use struct values if there is more than one attribute. However, the actions themselves are generalised - yacc values can only move from leaves of the parse tree towards the root, but in syntax-directed translation the values can move around the tree in any way the user requires. This is achieved by automatically creating a parse tree and traversing it as required to move the values around.

The automatic creation and use of parse trees also simplifies one more step in the translation process, as we often need to do this one way or another.

Systems like yacc and lex exist that can generate syntax-directed translators, but they are inevitably harder to learn than yacc and lex, although like any other high-level language, experienced users can achieve a lot more. However, just as with low-level languages, with sufficient ingenuity, yacc and lex do anything that the more complex systems can do. (Note that some of the tools being created today, such as JavaCC, if anything have less functionality than lex and yacc.) More importantly, by becoming aware of these more advanced concepts, we can make better use of lex and yacc. I will introduce a few such ideas below.

We have looked at various tools and techniques used in compilers, but:

When to use a tool or technique

The questions relevant to the use of lex, yacc, and dictionaries are all about the form of the input text. By contrast, the most important questions relevant to the use of parse trees are about what you want to do with the input text once you have analysed it.

How to combine tools and techniques - Passes

A pass is an actual scan of the whole program being translated, that must be completed before the next phase/pass can start.

A typical translator program will usually combine several phases into each pass. The first pass in any translator is (by definition) over the actual input text. If there are any subsequent passes, they may also be over the input text, but more usually they are over some internal form of the text, such as parse trees.

Even complex languages like C can be compiled in one pass, and one pass translators are usually simplest to write. However, if we have decided above that we have to use parse trees, then by definition we are using more than one pass. Thus, most compilers nowadays are multi-pass as they include significant code optimisation.

A typical 1-pass organisation

The heart of the translator will be the syntactic analyser, written using yacc or something equivalent. This will make calls on the lexical analyser, written using lex or some equivalent. The actions associated with each grammar rule in yacc (and/or in lex) will make calls on routines that do all the rest of the translation, such as build and access the dictionary, perform semantic analysis, and generate any output (assuming these are all relevant).

A typical multi-pass organisation

The first pass will be similar to the 1-pass organisation, except that the generated output will be the parse tree (so the actions will need to call tree-building routines), and by definition not all of the routines in the rest of the translator will have been called.

The tree will then be handed on to the second pass, which will be organised around a tree-walking routine. Typically, this will be a depth-first, left-to-right traversal over the whole tree, but circumstances may require the nodes of the tree to be visited in a different order, and some of them may even be visited more than once or not at all. The tree-walking routine will make calls on some more of the routines from the rest of the translator as it visits each node.

If this is a 2-pass program, we are done. Otherwise, the output from the second pass, which may be a modified tree or may be some other data structure altogether, is handed on to the third pass, which is again organised around a routine that walks over the data-structure (e.g. modified parse tree), calling some more of the routines in the rest of the translator.

If we have completed all the passes, we are done. Otherwise, the next pass will be organised in a similar way to the previous pass, although the data structure handed on may be changed yet again. This process continues until all the passes have been performed.

The net effect is, that during each pass, we calculate more of the attributes, until by end of the last pass we have completed all of them and the final result has been output. Note, however, that the ``parse tree'' may be completely unrecognisable as such by the time that we have finished!

next up previous contents
Next: Phases of Compilation Up: CS2111: Design and Implementation Previous: Building a Compiler: generating   Contents
Pete Jinks