Pete Jinks pjj@cs.man.ac.uk
This course introduces two major components of system software. Peter Capon will discuss operating systems, and I will deal with compilers. In the limited time available, neither of us will be going into very much detail. I will assume some familiarity with ANSI C and the ARM instruction set.
There are several compiler books on the reading list, and many more available in libraries and bookshops, ranging widely in difficulty and depth. If you want to buy a book, I suggest you look at several and pick the one that you find most helpful. Here are some other references that you may find useful:
ARM
* ARM System Architecture
Steve Furber; Addison-Wesley 1996
(mainly chapters 5 & 6)
Lex & Yacc
* Lex and Yacc (2nd ed.)
J.Levine, T.Mason & D.Brown; O'Reilly 1992
* Lex - A Lexical Analyzer Generator
ME Lesk, E Schmidt;
http://www.cs.man.ac.uk/~pjj/cs2111/lex/lex.html
* Yacc: Yet Another Compiler-Compiler
SC Johnson;
http://www.cs.man.ac.uk/~pjj/cs2111/yacc/yacc.html
comp.compilers FAQ includes a reading list about lex, yacc & compilers
lecture  Introduction to assemblers & compilers

lecture  Lexical analysis & Lex
lab      Lex practical

lecture  Syntactic analysis & Yacc
lab      Yacc practical

lecture  Semantic analysis & Dictionaries
lab      Code Generation practical

lecture  Code Generation 1
lecture  Code Generation 2
lecture  Phases & Passes, Review

Each Compiler session will consist of a lab, or a lecture followed by paper-and-pencil exercises. The exercises and labs will be used both for reinforcement and for further exploration.
The description of the practicals & exercises, a list of common
flex/byacc/make errors and how to correct them, plus extra hints and any
corrections, can be found via URL
http://www.cs.man.ac.uk/~pjj/cs5031/index.html
Assemblers & compilers translate programs for later execution by real
hardware or by software interpreters. They are application-specific programs
just like any other, best written in high-level languages (HLLs), especially
those specific to the application area.
An assembler simplifies the task of writing machine code programs.
LLL program -> [ assembler ] -> binary program
                             -> error messages, listings etc.
1) build words from characters, discard unimportant spaces & comments
2) check that each statement is legal
3) check user-defined names (e.g. labels),
keep list of names & addresses
4) translate [one-to-one]
Formally:
1) Lexical (word) analysis
2) Syntactic (sentence structure) analysis
3) Semantic (meaning) analysis
4) Code generation
Grammar (Syntax + Lex): Representation
Semantics: meaning
In natural languages, sentences/statements only have meaning in a context.
Computers have no common sense or understanding as to what is going on,
so context has to be carefully defined:
* definitions for single words (identifiers/names)
dictionary: the words of a language alphabetically arranged, with
their meanings (type, location etc.)
* meanings for statements & structures
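As a rough illustration of the dictionary idea, here is a minimal sketch in C. The names (struct entry, dict_declare, dict_lookup) and the fixed sizes are invented for this example, and the entries are kept in declaration order with a linear search rather than alphabetically, purely to keep the code short.

```c
#include <assert.h>
#include <string.h>

#define MAX_NAMES    64
#define MAX_NAME_LEN 31

/* One dictionary entry: a name plus its "meaning" (hypothetical layout). */
struct entry {
    char name[MAX_NAME_LEN + 1];
    char type[8];       /* e.g. "int" or "label" */
    int  location;      /* e.g. a register number or an address */
};

struct dictionary {
    struct entry entries[MAX_NAMES];
    int count;          /* number of entries in use; start at 0 */
};

/* Return the entry for name, or NULL if it has not been declared. */
struct entry *dict_lookup(struct dictionary *d, const char *name)
{
    int i;
    for (i = 0; i < d->count; i++)
        if (strcmp(d->entries[i].name, name) == 0)
            return &d->entries[i];
    return NULL;
}

/* Declare a new name. Returns 0 on success, or -1 if the name is
   already declared, a string is too long, or the dictionary is full. */
int dict_declare(struct dictionary *d, const char *name,
                 const char *type, int location)
{
    struct entry *e;
    assert(d != NULL && name != NULL && type != NULL);
    if (strlen(name) > MAX_NAME_LEN || strlen(type) >= sizeof e->type)
        return -1;
    if (d->count == MAX_NAMES || dict_lookup(d, name) != NULL)
        return -1;
    e = &d->entries[d->count++];
    strcpy(e->name, name);
    strcpy(e->type, type);
    e->location = location;
    return 0;
}
```

A translator would call dict_declare when it meets a declaration or label, and dict_lookup at every use of a name; a NULL result corresponds to an "undeclared name" error message.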
A compiler makes it seem as if the high-level language is the machine language.
HLL program -> [ compiler ] -> LLL or binary program
                            -> error messages, listings etc.
e.g.
/* division by repeated subtraction */
int main (void)
{
  int ans=-1, a=99, b=6;
  do {ans=ans+1; a=a-b;} while (a>=0);
  return ans;
}
The first 4 steps are the same as for an assembler:
1) identify words etc.
e.g.
[ int ]
[ main ]
[ ( ]
[ void ]
[ ) ]
[ { ]
[ int ]
[ ans ]
[ = ]
[ - ]
[ 1 ]
[ , ]
[ a ]
[ = ]
[ 99 ]
[ , ]
[ b ]
[ = ]
[ 6 ]
[ ; ]
[ do ]
[ { ]
[ ans ]
[ = ]
[ ans ]
[ + ]
[ 1 ]
[ ; ]
[ a ]
[ = ]
[ a ]
[ - ]
[ b ]
[ ; ]
[ } ]
[ while ]
[ ( ]
[ a ]
[ >= ]
[ 0 ]
[ ) ]
[ ; ]
[ return ]
[ ans ]
[ ; ]
[ } ]
2) check against grammar
3) check user-defined names
(declared, var/label, int/real etc.)
keep list of names, addresses, types etc.
4) translate [many-to-many?]
e.g.
main    MVN   a1, #0        ; a1 = ans
        MOV   a2, #&63      ; a2 = a
        MOV   a3, #6        ; a3 = b
L       ADD   a1, a1, #1
        SUBS  a2, a2, a3
        BPL   L
        MOV   pc, lr

00  0xe3e00000  mvn   r0, #0
04  0xe3a01063  mov   r1, #0x63
08  0xe3a02006  mov   r2, #6
0c  0xe2800001  add   r0, r0, #1
10  0xe0511002  subs  r1, r1, r2
14  0x5afffffc  bpl   0xc
18  0xe1a0f00e  mov   pc, r14
5) Code optimisation - before, during & after (4)
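The binary words in the step 4 example can be reproduced by packing fields into the ARM data-processing and branch instruction formats. The sketch below is illustrative only: the function and constant names (dp_reg, dp_imm, branch, COND_AL and so on) are invented here, and only unrotated 8-bit immediates and unshifted register operands are handled.

```c
#include <assert.h>

typedef unsigned long word;     /* at least 32 bits in ANSI C */

enum { COND_PL = 0x5, COND_AL = 0xE };                          /* condition codes */
enum { OP_SUB = 0x2, OP_ADD = 0x4, OP_MOV = 0xD, OP_MVN = 0xF }; /* opcodes */

/* Data-processing instruction whose second operand is register rm.
   s is 1 to set the condition flags (e.g. SUBS), 0 otherwise. */
word dp_reg(int cond, int op, int s, int rn, int rd, int rm)
{
    return ((word)cond << 28) | ((word)op << 21) | ((word)s << 20)
         | ((word)rn << 16) | ((word)rd << 12) | (word)rm;
}

/* Data-processing instruction whose second operand is a small
   immediate (0..255, no rotation); bit 25 marks an immediate. */
word dp_imm(int cond, int op, int s, int rn, int rd, int imm8)
{
    assert(imm8 >= 0 && imm8 <= 255);
    return ((word)cond << 28) | ((word)1 << 25) | ((word)op << 21)
         | ((word)s << 20) | ((word)rn << 16) | ((word)rd << 12)
         | (word)imm8;
}

/* Branch from the instruction at address from to address to (both
   multiples of 4). The 24-bit offset is counted in words relative to
   from + 8, because the pc runs two instructions ahead. */
word branch(int cond, long from, long to)
{
    long offset = (to - (from + 8)) / 4;
    return ((word)cond << 28) | ((word)0xA << 24)
         | ((word)offset & 0xFFFFFFUL);
}
```

For instance, dp_imm(COND_AL, OP_MOV, 0, 0, 1, 0x63) gives 0xe3a01063, the word shown for mov r1, #0x63, and branch(COND_PL, 0x14, 0x0c) gives 0x5afffffc. The register names a1, a2, a3 and lr in the source are the procedure-call names for r0, r1, r2 and r14.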
* phase - logical subdivision of the compiling task.
* pass - actual scan of the whole program being translated, that must be completed before the next phase/pass can start.
* Often combine several phases into one pass.
* Phases are usually in linear order:
lexical -> syntax -> semantic -> generate
* Significant optimisation usually requires extra passes.
* Some language features force more than one pass - using names
before definition, if definition can change meaning of use.
* C can be compiled in one pass.
* C uses preprocessing to deal with #defines etc.
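To see why using a name before its definition can force a second pass, consider branch labels in an assembler. The sketch below is a simplified model, not a real assembler: struct line, resolve and the assumption of 4 bytes per line are all invented for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_LINES 32

/* One source line: an optional label defined on it and an optional
   label used by it (NULL when absent). */
struct line {
    const char *label;      /* name defined on this line */
    const char *target;     /* name used on this line */
};

/* Replace every used name by an address, assuming 4 bytes per line.
   resolved[i] is -1 when line i uses no name.
   Returns 0 on success, -1 if a used name is never defined. */
int resolve(const struct line *prog, int n, long *resolved)
{
    const char *names[MAX_LINES];
    long addrs[MAX_LINES];
    int count = 0, i, j;

    assert(n >= 0 && n <= MAX_LINES);

    /* Pass 1: scan the whole program, recording each label's address. */
    for (i = 0; i < n; i++) {
        if (prog[i].label != NULL) {
            names[count] = prog[i].label;
            addrs[count] = 4L * i;
            count++;
        }
    }

    /* Pass 2: every definition is now known, so even a use that
       comes before its definition can be filled in. */
    for (i = 0; i < n; i++) {
        resolved[i] = -1;
        if (prog[i].target == NULL)
            continue;
        for (j = 0; j < count; j++)
            if (strcmp(names[j], prog[i].target) == 0)
                break;
        if (j == count)
            return -1;      /* undefined name */
        resolved[i] = addrs[j];
    }
    return 0;
}
```

A single pass that tried to fill in addresses as it went would fail on the first forward branch, because the label's address is not yet in the list. The backward branch in the earlier division example needs no such treatment, since L is defined before BPL L uses it.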
Library: increases set of operations available to programmer.
Translate these operations once, separately from user programs.
Linker searches list(s) of names & addresses to locate required operations and combines operations with user programs.
When translating a library, the translator doesn't know where the code will end up in memory, so the linker also has to relocate the code by changing addresses in it.
library =
exported names & addresses
imported names & where used
relocation information
code
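Relocation can be sketched as follows, under strong simplifying assumptions: the module is taken to have been translated as if loaded at address 0, each relocatable address occupies a whole code word, and the relocation information is just a list of word indices. The names struct module and relocate are invented for this example; real object formats patch fields inside instructions and also carry the exported and imported names.

```c
#include <assert.h>
#include <stddef.h>

/* A translated library module (hypothetical layout). */
struct module {
    unsigned long *code;    /* code words, translated for base address 0 */
    size_t         ncode;   /* number of code words */
    const size_t  *reloc;   /* indices of the words that hold addresses */
    size_t         nreloc;  /* number of relocation entries */
};

/* Relocate the module to start at address base: add base to every
   word named in the relocation information, leave the rest alone. */
void relocate(struct module *m, unsigned long base)
{
    size_t i;
    for (i = 0; i < m->nreloc; i++) {
        assert(m->reloc[i] < m->ncode);
        m->code[m->reloc[i]] += base;
    }
}
```

For example, if word 1 of a three-word module holds the address 0x8 and the linker places the module at 0x8000, relocate changes that word to 0x8008 and leaves words 0 and 2 untouched.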
Why interpret?
resources e.g. cost,
size
only execute once (e.g. ksh etc.)
very high level languages - but we get better at compiling & debugging them
interpreters are slow, simple & small
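How simple and small an interpreter can be is suggested by the sketch below, which executes the earlier division-by-repeated-subtraction example directly instead of translating it. The five-instruction machine (SET, ADDI, SUB, JGE, RET) is made up for this illustration and is not taken from any real system.

```c
#include <assert.h>

enum opcode { OP_SET, OP_ADDI, OP_SUB, OP_JGE, OP_RET };

struct instr {
    enum opcode op;
    int a, b;       /* operands; their meaning depends on op */
};

/* Interpret prog until a RET instruction is reached.
     SET  a b : r[a] = b
     ADDI a b : r[a] = r[a] + b
     SUB  a b : r[a] = r[a] - r[b]
     JGE  a b : if r[a] >= 0, continue at instruction b
     RET  a   : stop and return r[a]                      */
int run(const struct instr *prog)
{
    int r[4] = { 0, 0, 0, 0 };
    int pc = 0;

    for (;;) {
        const struct instr *i = &prog[pc++];    /* fetch */
        assert(i->a >= 0 && i->a < 4);
        switch (i->op) {                        /* decode and execute */
        case OP_SET:
            r[i->a] = i->b;
            break;
        case OP_ADDI:
            r[i->a] += i->b;
            break;
        case OP_SUB:
            assert(i->b >= 0 && i->b < 4);
            r[i->a] -= r[i->b];
            break;
        case OP_JGE:
            if (r[i->a] >= 0)
                pc = i->b;
            break;
        case OP_RET:
            return r[i->a];
        }
    }
}

/* The division example: r0 = ans, r1 = a, r2 = b. */
const struct instr divide[] = {
    { OP_SET,  0, -1 },     /* ans = -1 */
    { OP_SET,  1, 99 },     /* a = 99 */
    { OP_SET,  2, 6 },      /* b = 6 */
    { OP_ADDI, 0, 1 },      /* ans = ans + 1 */
    { OP_SUB,  1, 2 },      /* a = a - b */
    { OP_JGE,  1, 3 },      /* while (a >= 0) */
    { OP_RET,  0, 0 }       /* return ans */
};
```

Calling run(divide) returns 16, the integer part of 99/6. Every instruction costs a fetch, a switch and several array accesses in the interpreter, where the compiled version costs one machine instruction, which is the sense in which interpreters are slow.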
You should bring any reference material you have about ANSI C to the next
CS5031 Compilers lecture. Here are some examples. You do not need to buy
a book if you don't already own one, as the last book is available on-line at
URL
http://www.cs.man.ac.uk/applhax/C/standard_c
* Expert C programming: deep C secrets
P van der Linden; Prentice Hall, 1994
* C: A reference manual (4th ed.)
SP Harbison, GL Steele Jr.; Prentice Hall, 1995
* Standard C - A Reference
PJ Plauger, J Brodie; Prentice Hall, 1996