Next: Syntax-directed Translation
Up: CS2111: Design and Implementation
Previous: Building a Compiler: recognising
  Contents
Building a Compiler: generating the output
input:
enhanced parse trees or equivalent for executable code
output:
object code
Actions can be embedded directly in YACC, or we can make
passes over parse trees created earlier.
There are no tools as ubiquitous and as useful as LEX and
YACC, but various possibilities may help:
- Use an existing code generator, or create a simple code generator
and write a simple executor program to obey it:
easy to write, but slow at run-time.
- Use a simple code generator in the compiler, or even just create parse
trees, and run the output through a second translator that creates real
machine code:
compilers are easier to write, but we need to create an extra (back-end)
translator. We can then make one compiler per language and one back-end
translator per computer, and pair them to make all possible combinations,
i.e. O(n+m) components instead of O(n*m).
- Given the formal semantics of the input and output, create the back-end
translator automatically: current research, not completely successful yet.
There are various sorts of code optimisation. Some (global)
normally take place before code generation, during repeated scans of the
entire program (or large parts of it) using e.g. parse trees; others
(peephole) take place during a single scan of the code, looking at short
sequences while or after it is generated.
Both global and peephole optimisations can be further classified into
machine dependent (e.g. looking at register usage) and
machine independent (e.g. looking at the order of evaluation of
expressions) optimisations.
- machine dependent peephole
- e.g. optimise register loading: keep track of what is in registers and avoid
reloading them, e.g. for
A= B; C= B; D= A*E;
acc= B          [acc = B
acc=> A         [acc = B,A
acc= B          unnecessary: B is already in acc
acc=> C         [acc = B,A,C
acc= A          unnecessary: A is already in acc
acc* E          [acc = ?
acc=> D         [acc = D
- machine independent peephole
- e.g. constant folding: expressions containing constants can be partially
evaluated at compile time. This is especially true of address calculations,
e.g. for
A[B+1]
we can add 1 (or whatever, scaled by the element size) to the address of A at
compile time.
- machine dependent global
- e.g. optimising variable storage: find the most frequently used variables
(per program or routine or loop or whatever) and keep them in spare
registers.
- machine independent global
- e.g. loop optimisations:
- loop invariants
- remove as much calculation as possible from the body of the loop, to be
performed once and for all before the loop starts.
- induction variables
- e.g.
for (i= 0; i<10; i++) a[i]= 0;
becomes (with i scaled by the element size, assumed here to be 2 bytes):
for (i= 0; i<20; i+=2) *(a+i) /* but i already scaled! */ = 0;
or:
for (?= 0; ?<10; ?++) *(a++)= 0;
- loop unrolling
- e.g.
for (?= 0; ?<5; ?++) {*(a++)= 0; *(a++)= 0;}
Optimisation is expensive (in machine time), so make efficient use of it, e.g.:
- only optimise if the program will be run more than once or twice, or for a long time
- optimise the most heavily used parts of the program, found statically
(inner loops etc.) or dynamically (by profiling)
Although the simple methods described in section 10 generate
the best possible code for simple computers, they do not make the best use of
sets of registers, as more complex access patterns are possible with a set of
registers than with a stack. Also, if different registers are being used,
modern computers can usually execute more than one instruction at once.
e.g. suppose we are using
S+= U*V + X*Y;
to sum values in a loop, and Y is invariant but U, V, and X are changing. We
would want to at least store Y and S in registers:
reg1 <- U           }  these pairs
reg2 <- regY * X    }  of instructions
reg1 <- reg1 * V    }  could be
regS <- reg2 + regS }  overlapped
regS <- reg1 + regS
Outline algorithm:
- build parse tree
- tree-walk(s) to allocate registers, labelling nodes of the tree with
register numbers, possibly changing the tree to reorder code
- tree-walk as in section 10 to generate the code, but using the labels to
choose registers instead of a single accumulator
It is normally best to go for the fewest passes, but extra passes may be
needed because the analysis is more complex (e.g. not everything is declared
before use), because the later phases, particularly code generation and
optimisation, require them, or because some tools we are using (e.g. back-end
translators) require them. We might even use a pipeline if we do not have
enough memory to fit the whole compiler in at once.
(c.f. section 8.1)
e.g. 1-pass compiler: use LEX and YACC, write a set of
routines to create and use a dictionary, and to generate (and optimise) code,
and call the routines from the YACC actions (as we have seen in
examples).
e.g. multi-pass compiler:
use LEX and YACC and write semantic and code-generation
routines as above, but also write tree-creation and tree-walking routines.
Call the tree-creation routines from the YACC actions, then after yyparse()
returns, call the tree-walking routines from the main program to perform
semantic analysis and code generation in one or more traversals of the tree.
e.g. multi-pass compiler: create a pipeline, with a separate program for
each of the analysis and coding phases.
- A common way of optimising a for statement is to keep the value of the
control variable in a register throughout the loop. What restrictions does
the ANSI C standard impose on the control variable to make this possible,
and how would you implement it?
- Investigate ways in which spare data registers could be allocated to
frequently used operands. A simple technique might perform a static analysis
of the parse tree to find which operands occur most frequently. A better
technique might attempt to identify frequently executed parts of the program
and concentrate on those.
Capon & Jinks: chs. 3.4-6, 12.1, 10, 14
Aho, Sethi & Ullman: chs. 2.8-9, 8, 9.9-10, 10.1-2, 11
Pete Jinks
1999-09-30