CS5031 - System Software - Compilers

Pete Jinks pjj@cs.man.ac.uk

This course introduces two major components of system software. Peter Capon will discuss operating systems, and I will deal with compilers. In the limited time available, neither of us will be going into very much detail. I will assume some familiarity with ANSI C and the ARM instruction set.

Reading List

There are several compiler books on the reading list, and many more available in libraries and bookshops, ranging widely in difficulty and depth. If you want to buy a book, I suggest you look at several and pick the one that you find most helpful. Here are some other references that you may find useful:

ARM
* ARM System Architecture
Steve Furber; Addison-Wesley 1996
(mainly chapters 5 & 6)

Lex & Yacc
* Lex and Yacc (2nd ed.)
J Levine, T Mason & D Brown; O'Reilly 1992
* Lex - A Lexical Analyzer Generator
ME Lesk, E Schmidt;
http://www.cs.man.ac.uk/~pjj/cs2111/lex/lex.html

* Yacc: Yet Another Compiler-Compiler
SC Johnson;
http://www.cs.man.ac.uk/~pjj/cs2111/yacc/yacc.html

The comp.compilers FAQ includes a reading list about lex, yacc & compilers.

Compiler Sessions

lecture	Introduction to assemblers & compilers

lecture	Lexical analysis & Lex
lab	Lex practical

lecture	Syntactic analysis & Yacc
lab	Yacc practical

lecture	Semantic analysis & Dictionaries
lab	Code Generation practical

lecture	Code Generation 1
lecture	Code Generation 2
lecture	Phases & Passes, Review

Each Compiler session will consist of a lab, or a lecture followed by paper-and-pencil exercises. The exercises and labs will be used both for reinforcement and for further exploration.

The description of the practicals & exercises, a list of common flex/byacc/make errors and how to correct them, plus extra hints and any corrections, can be found via URL
http://www.cs.man.ac.uk/~pjj/cs5031/index.html

CS5031 - Introduction to Assemblers & Compilers


Assemblers & compilers translate programs for later execution by real hardware or by software interpreters. They are application-specific programs just like any other, best written in high-level languages (HLLs), especially those specific to the application area.

Translation

Assembler

Simplifies the task of writing machine code programs by translating a low-level language (LLL) program into binary.

                             -> error messages, listings etc.
LLL program -> [ assembler ]
                             -> binary program

1) build words from characters, discard unimportant spaces & comments
2) check each statement is legal
3) check user-defined names (e.g. labels),
keep list of names & addresses
4) translate [one-to-one]

Formally:
1) Lexical (word) analysis
2) Syntactic (sentence structure) analysis
3) Semantic (meaning) analysis
4) Code generation
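
Step 1 can be sketched in a few lines of C. This is only an illustration, not how Lex does it: the function name split_line and the fixed limits are invented here, and it assumes that words are separated by spaces, tabs or commas and that a comment runs from ';' to the end of the line.

```c
#include <assert.h>
#include <ctype.h>

#define MAX_WORDS 8    /* most words expected on one line (assumed limit) */
#define MAX_LEN   16   /* longest word, including the final '\0' (assumed) */

/* Does c end a word: end of line, start of comment, or a separator? */
static int ends_word(char c)
{
    return c == '\0' || c == ';' || c == ',' || isspace((unsigned char)c);
}

/* Build words from the characters of one line of assembler source,
   discarding separators and the comment. Returns the number of words. */
int split_line(const char *line, char words[MAX_WORDS][MAX_LEN])
{
    int n = 0, len;

    while (*line != '\0' && *line != ';') {
        if (ends_word(*line)) {          /* space, tab or comma: skip it */
            line++;
            continue;
        }
        assert(n < MAX_WORDS);
        len = 0;
        while (!ends_word(*line)) {      /* copy one word */
            assert(len < MAX_LEN - 1);
            words[n][len++] = *line++;
        }
        words[n][len] = '\0';
        n++;
    }
    return n;
}
```

Given a line holding a label, ADD, three operands and a comment, split_line stores five words and ignores the comment. Steps 2 to 4 then work on words rather than on characters.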

Language definitions

Grammar (syntax + lexical rules): representation

Semantics: meaning
In natural languages, sentences/statements only have meaning in a context.
Computers have no common sense or understanding as to what is going on, so context has to be carefully defined:
* definitions for single words (identifiers/names)
dictionary: the words of a language alphabetically arranged, with their meanings (type, location etc.)
* meanings for statements & structures
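
A dictionary of this kind might look something like the sketch below. The names (struct entry, dict_declare, dict_lookup) and the two kinds of name are invented for illustration, and a simple linear search is used instead of alphabetical ordering to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kind { KIND_INT, KIND_LABEL };   /* illustrative kinds of name */

struct entry {
    const char *name;     /* the word being defined */
    enum kind   kind;     /* its meaning: what sort of thing it names */
    int         address;  /* where it lives */
};

#define DICT_MAX 32       /* assumed limit on the number of names */
static struct entry dict[DICT_MAX];
static int dict_size = 0;

/* Return the entry for name, or NULL if it has not been declared. */
struct entry *dict_lookup(const char *name)
{
    int i;

    for (i = 0; i < dict_size; i++)
        if (strcmp(dict[i].name, name) == 0)
            return &dict[i];
    return NULL;
}

/* Add a new definition; return NULL if name is already declared. */
struct entry *dict_declare(const char *name, enum kind kind, int address)
{
    struct entry *e;

    if (dict_lookup(name) != NULL)
        return NULL;
    assert(dict_size < DICT_MAX);
    e = &dict[dict_size++];
    e->name = name;
    e->kind = kind;
    e->address = address;
    return e;
}
```

A NULL from dict_lookup corresponds to an "undeclared name" error, and a NULL from dict_declare to a "declared twice" error.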

Compiler

Makes it seem as if the high-level language is the machine language.

                            -> error messages, listings etc.
HLL program -> [ compiler ]
                            -> LLL or binary program

e.g.

	/* division by repeated subtraction */
	int main (void)
	{
	  int ans=-1, a=99, b=6;
	  do
	    {ans=ans+1; a=a-b;}
	  while (a>=0);
	  return ans;
	}

The first 4 steps are the same as for an assembler:
1) identify words etc. e.g.
[ int ] [ main ] [ ( ] [ void ] [ ) ] [ { ] [ int ] [ ans ] [ = ] [ - ]
[ 1 ] [ , ] [ a ] [ = ] [ 99 ] [ , ] [ b ] [ = ] [ 6 ] [ ; ] [ do ] [ { ]
[ ans ] [ = ] [ ans ] [ + ] [ 1 ] [ ; ] [ a ] [ = ] [ a ] [ - ] [ b ] [ ; ]
[ } ] [ while ] [ ( ] [ a ] [ >= ] [ 0 ] [ ) ] [ ; ] [ return ] [ ans ]
[ ; ] [ } ]
2) check against grammar
3) check user-defined names
(declared, var/label, int/real etc.)
keep list of names, addresses, types etc.
4) translate [many-to-many?]
e.g.

		main	MVN	a1, #0       	; a1 = ans
			MOV	a2, #&63     	; a2 = a
			MOV	a3, #6       	; a3 = b
		L	ADD	a1, a1, #1
			SUBS	a2, a2, a3
			BPL	L
			MOV	pc, lr

00	0xe3e00000	mvn	r0, #0
04	0xe3a01063	mov	r1, #0x63
08	0xe3a02006	mov	r2, #6
0c	0xe2800001	add	r0, r0, #1
10	0xe0511002	subs	r1, r1, r2
14	0x5afffffc	bpl	0xc
18	0xe1a0f00e	mov	pc, r14

5) Code optimisation - before, during & after (4)
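
The last step, from the assembler statements to the hexadecimal words in the listing above, is one-to-one and can be checked with a small C sketch. The bit positions follow the usual ARM data-processing and branch instruction layouts; the function names dp_imm, dp_reg and branch are invented for this illustration, and only unrotated 8-bit immediates and unshifted register operands are handled.

```c
#include <assert.h>

typedef unsigned long word;     /* at least 32 bits in ANSI C */

/* Opcode and condition fields needed for the example program. */
enum { OP_SUB = 0x2, OP_ADD = 0x4, OP_MOV = 0xD, OP_MVN = 0xF };
enum { COND_PL = 0x5, COND_AL = 0xE };

/* Data-processing instruction, always executed, 8-bit immediate operand. */
word dp_imm(int op, int s, int rn, int rd, int imm8)
{
    assert(imm8 >= 0 && imm8 <= 0xFF);
    return ((word)COND_AL << 28) | (1UL << 25) | ((word)op << 21) |
           ((word)s << 20) | ((word)rn << 16) | ((word)rd << 12) |
           (word)imm8;
}

/* Data-processing instruction, always executed, register operand. */
word dp_reg(int op, int s, int rn, int rd, int rm)
{
    return ((word)COND_AL << 28) | ((word)op << 21) | ((word)s << 20) |
           ((word)rn << 16) | ((word)rd << 12) | (word)rm;
}

/* Branch from address 'from' to address 'to'. The 24-bit offset is
   counted in words, relative to the branch address plus 8. */
word branch(int cond, long from, long to)
{
    long offset = (to - (from + 8)) / 4;

    return ((word)cond << 28) | (0xAUL << 24) |
           ((word)offset & 0xFFFFFFUL);
}
```

For example, dp_imm(OP_MOV, 0, 0, 1, 0x63) gives 0xe3a01063 and branch(COND_PL, 0x14, 0xc) gives 0x5afffffc, as in the listing. The branch is the only statement here whose encoding depends on an address, which is why the assembler needs its list of names & addresses.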

Phases & Passes

* phase - logical subdivision of the compiling task.

* pass - actual scan of the whole program being translated, that must be completed before the next phase/pass can start.


* Often combine several phases into one pass.
* Phases are usually in linear order:
lexical -> syntax -> semantic -> generate
* Significant optimisation usually requires extra passes.
* Some language features force more than one pass, e.g. using names before their definition, if the definition can change the meaning of the use.
* C can be compiled in one pass.
* C uses preprocessing to deal with #defines etc.
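
As an illustration of why using a name before its definition can force a second pass, here is a minimal sketch of resolving branch targets in an assembler-like program. The representation (struct line, one 4-byte instruction per line) and the function name resolve are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINES 16          /* assumed limit on program length */

struct line {
    const char *label;        /* name defined on this line, or NULL */
    const char *target;       /* name used by this line, or NULL */
};

struct symbol {
    const char *name;
    long address;
};

/* Set resolved[i] to the address of line i's target (-1 if it has none).
   Returns 0 on success, -1 if some target is never defined. */
int resolve(const struct line *prog, int n, long resolved[])
{
    struct symbol table[MAX_LINES];
    int nsyms = 0, i, j;

    assert(n <= MAX_LINES);

    /* Pass 1: scan the whole program, noting the address of each label. */
    for (i = 0; i < n; i++)
        if (prog[i].label != NULL) {
            table[nsyms].name = prog[i].label;
            table[nsyms].address = 4L * i;
            nsyms++;
        }

    /* Pass 2: every definition is now known, so each use can be looked
       up, including uses that come before the definition. */
    for (i = 0; i < n; i++) {
        resolved[i] = -1;
        if (prog[i].target == NULL)
            continue;
        for (j = 0; j < nsyms; j++)
            if (strcmp(table[j].name, prog[i].target) == 0)
                break;
        if (j == nsyms)
            return -1;        /* used but never defined */
        resolved[i] = table[j].address;
    }
    return 0;
}
```

If the two loops were merged into a single pass, a branch on the first line to a label on the last line would be looked up before the label had been entered in the table.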

Libraries & Linker

Library: increases set of operations available to programmer.


Translate these operations once, separately from user programs.

Include list of operation names & addresses.

Linker searches list(s) of names & addresses to locate required operations and combines operations with user programs.


When translating a library, we don't know where in memory it will end up, so the linker also has to relocate the code by changing addresses in it.

library =
exported names & addresses
imported names & where used
relocation information
code
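
Relocation can be sketched as follows. This is a deliberately simplified picture, not a real object-file format: struct module and relocate are invented names, there is a single exported name, imported names are left out, and the code is assumed to have been translated as if it were loaded at address 0.

```c
#include <assert.h>

#define MAX_WORDS 8                /* assumed limit on module size */

struct module {
    long code[MAX_WORDS];          /* code, translated for address 0 */
    int  ncode;                    /* number of words of code */
    int  reloc[MAX_WORDS];         /* which words of code hold an address */
    int  nreloc;                   /* number of relocation entries */
    const char *export_name;       /* one exported operation name */
    long export_offset;            /* its offset from the start of code */
};

/* Place the module at address base: adjust every word named in the
   relocation information, and return the exported name's final address. */
long relocate(struct module *m, long base)
{
    int i;

    for (i = 0; i < m->nreloc; i++) {
        assert(m->reloc[i] >= 0 && m->reloc[i] < m->ncode);
        m->code[m->reloc[i]] += base;
    }
    return base + m->export_offset;
}
```

Only the words listed in the relocation information change; the others are ordinary instructions or data and must be left alone, which is why the library has to carry that information with its code.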

Execution: Interpreter

Why interpret?
resources e.g. cost, size
only execute once (e.g. ksh and other shells)
very high level languages - but we get better at compiling & debugging them

interpreters are slow, simple & small
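
To give a feel for "simple & small", here is a sketch of an interpreter for a made-up instruction set with just enough operations to run the division example. The opcodes, struct instr and the function run are all invented for this illustration.

```c
enum vm_op { VM_ADDI, VM_SUB, VM_JGE, VM_HALT };

struct instr {
    enum vm_op op;
    int a, b;          /* meaning depends on op, see run() */
};

/* Execute prog using regs[] as the machine's registers.
   Returns the number of instructions executed. */
long run(const struct instr *prog, long regs[])
{
    long steps = 0;
    int pc = 0;

    for (;;) {
        const struct instr *i = &prog[pc++];   /* fetch */

        steps++;
        switch (i->op) {                       /* decode & execute */
        case VM_ADDI:                          /* reg a += constant b */
            regs[i->a] += i->b;
            break;
        case VM_SUB:                           /* reg a -= reg b */
            regs[i->a] -= regs[i->b];
            break;
        case VM_JGE:                           /* if reg a >= 0 goto b */
            if (regs[i->a] >= 0)
                pc = i->b;
            break;
        case VM_HALT:
            return steps;
        }
    }
}
```

With registers holding -1, 99 and 6, the four-instruction program ADDI 0,1; SUB 1,2; JGE 1,0; HALT leaves 16 in register 0, the same answer as the C program. Every interpreted instruction costs a fetch and a switch on top of the real work, which is one reason interpreters are slow.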

For next lecture

You should bring any reference material you have about ANSI C to the next CS5031 Compilers lecture. Here are some examples. You do not need to buy a book if you don't already own one, as the last book is available on-line at URL
http://www.cs.man.ac.uk/applhax/C/standard_c


* Expert C programming: deep C secrets
P van der Linden; Prentice Hall, 1994
* C: A reference manual (4th ed.)
SP Harbison, GL Steele Jr.; Prentice Hall, 1995
* Standard C - A Reference
PJ Plauger, J Brodie; Prentice Hall, 1996