Next: Syntax-directed Translation
Up: CS2111: Design and Implementation
Previous: Building a Compiler: recognising
  Contents
Building a Compiler: generating the output
input:
enhanced parse trees or equivalent for executable code
output:
object code
Actions can be embedded directly in YACC, or we can make
passes over parse trees created earlier.
There are no tools as ubiquitous and as useful as LEX and
YACC, but various possibilities may help:
- Use an existing code generator, or create a simple code generator
and write a simple executor program to obey it:
easy to write, but slow at run-time.
- Use a simple code generator in the compiler, or even just create parse
trees, and run the output through a second translator that creates real
machine code:
compilers are easier to write, but we need to create an extra (back-end)
translator. We can then make one compiler per language and one back-end
translator per computer, and pair them to make all possible combinations,
i.e. O(n+m) components instead of O(n*m).
- Given the formal semantics of the input and output, create the back-end
translator automatically: current research, not completely successful yet.
There are various sorts of code optimisation. Some (global)
normally take place before code generation, during repeated scans of the
entire program (or large parts of it) using e.g. parse trees; others
(peephole) take place during a single scan of the code, looking at short
sequences while or after it is generated.
Both global and peephole optimisations can be further classified into
machine dependent (e.g. looking at register usage) and
machine independent (e.g. looking at the order of evaluation of
expressions) optimisations.
- machine dependent peephole
- e.g. optimise register loading: keep track of what is in registers and avoid
reloading them, e.g. for
A= B; C= B; D= A*E;
acc= B          [acc = B
acc=> A         [acc = B,A
acc= B          unnecessary: B is already in acc
acc=> C         [acc = B,A,C
acc= A          unnecessary: A is already in acc
acc* E          [acc = ?
acc=> D         [acc = D
- machine independent peephole
- e.g. constant folding: expressions containing constants can be partially
evaluated at compile time. This is especially true of address calculations,
e.g. for
A[B+1]
we can add 1 (or whatever, scaled by the element size) to the address of A at
compile time.
- machine dependent global
- e.g. optimising variable storage: find the most frequently used variables
(per program or routine or loop or whatever) and keep them in spare
registers.
- machine independent global
- e.g. loop optimisations:
- loop invariants
- remove as much calculation as possible from the body of the loop, to be
performed once and for all before the loop starts.
- induction variables
- e.g.
for (i= 0; i<10; i++) a[i]= 0;
becomes (with i scaled by the element size, assumed here to be 2 bytes):
for (i= 0; i<20; i+=2) *(a+i) /* but i already scaled! */ = 0;
or:
for (?= 0; ?<10; ?++) *(a++)= 0;
- loop unrolling
- e.g.
for (?= 0; ?<5; ?++) {*(a++)= 0; *(a++)= 0;}
Optimisation is expensive (in machine time), so make efficient use of it, e.g.:
- only optimise if the program will be run more than once or twice, or for a long time
- optimise the most heavily used parts of the program, found statically
(inner loops etc.) or dynamically (by profiling)
Although the simple methods described in section 10 generate
the best possible code for simple computers, they do not make the best use of
sets of registers, as more complex access patterns are possible with a set of
registers than with a stack. Also, if different registers are being used,
modern computers can usually execute more than one instruction at once.
e.g. suppose we are using
S+= U*V + X*Y;
to sum values in a loop, and Y is invariant but U, V, and X are changing. We
would want to at least store Y and S in registers:
reg1 <- U           }  these pairs
reg2 <- regY * X    }  of instructions
reg1 <- reg1 * V    }  could be
regS <- reg2 + regS }  overlapped
regS <- reg1 + regS
Outline algorithm:
- build parse tree
- tree-walk(s) to allocate registers, labelling nodes of the tree with
register numbers, possibly changing the tree to reorder code
- tree-walk as in section 10 to generate the code, but using the labels to
choose registers instead of a single accumulator
It is normally best to go for the fewest passes, but extra passes may be
needed because the analysis is more complex (e.g. not everything is declared
before use), because the later phases, particularly code generation and
optimisation, require them, or because some tools we are using (e.g. back-end
translators) require them. We might even use a pipeline if we do not have
enough memory to fit the whole compiler in at once.
(c.f. section 8.1)
e.g. 1-pass compiler: use LEX and YACC, write a set of
routines to create and use a dictionary, and to generate (and optimise) code,
and call the routines from the YACC actions (as we have seen in
examples).
e.g. multi-pass compiler:
use LEX and YACC and write semantic and code-generation
routines as above, but also write tree-creation and tree-walking routines.
Call the tree-creation routines from the YACC actions, then after yyparse()
returns, call the tree-walking routines from the main program to perform
semantic analysis and code generation in one or more traversals of the tree.
e.g. multi-pass compiler: create a pipeline, with a separate program for
each of the analysis and coding phases.
- A common way of optimising a for statement is to keep the value of the
control variable in a register throughout the loop. What restrictions does
the ANSI C standard impose on the control variable to make this possible,
and how would you implement it?
- Investigate ways in which spare data registers could be allocated to
frequently used operands. A simple technique might perform a static analysis
of the parse tree to find which operands occur most frequently. A better
technique might attempt to identify frequently executed parts of the program
and concentrate on those.
Capon & Jinks: chs. 3.4-6, 12.1, 10, 14
Aho, Sethi & Ullman: chs. 2.8-9, 8, 9.9-10, 10.1-2, 11
Pete Jinks
1999-09-30