.NH S 5 Code Generation

General target architecture:

CPU = Small fast memory (registers) + ALU
+
Main Memory (RAM) used to hold Code, Data, Stack, Heap etc.

Compiled code depends on
- kind & size of fast memory
- number of main memory accesses per operation

e.g. A * B + C * D

dc	ARM
_
A	ADD    	a2,sp,#8
B	LDMIA	a2,{a1,a2}
*	MUL    	a2,a1,a2	; a*b
C	LDMIA	sp,{a1,a3}
D
*	MLA    	v1,a3,a1,a2	; + c*d
+
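
The dc column above is the postfix form of the expression, evaluated on a stack. A minimal sketch of that evaluation in C, assuming a small fixed-size operand stack; the names push, pop and eval_ab_cd are hypothetical and do not come from dc itself:

```c
#include <assert.h>

/* Hypothetical operand stack, standing in for dc's stack. */
static int stack[8];
static int top = 0;               /* number of values currently stacked */

static void push(int v) { assert(top < 8); stack[top++] = v; }
static int  pop(void)   { assert(top > 0); return stack[--top]; }

/* Evaluate A * B + C * D in postfix order: A B * C D * + */
int eval_ab_cd(int a, int b, int c, int d)
{
    int rhs;
    push(a);
    push(b);
    push(pop() * pop());          /* A B *  */
    push(c);
    push(d);
    push(pop() * pop());          /* C D *  */
    rhs = pop();
    push(pop() + rhs);            /* +      */
    return pop();                 /* stack is empty again afterwards */
}
```

Every push and pop here is a memory access, which is what the register-based ARM sequence avoids.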

Use of Registers and Main Memory

A simple way of compiling code is to hold all variables in memory at run-time and only load them into the CPU when they are used in an expression, as in the examples above. However, modern high-performance computers and their compilers make extensive use of registers, as they are usually much faster than memory.

e.g. with A, B, C and D already held in a1-a4:

MUL	v2,a1,a2	; a*b
MLA	v1,a3,a4,v2	; + c*d

Essentially, any non-constant values are kept in registers until they are all in use, whereupon the least-used values will be discarded (and saved to memory if changed) to free up as many registers as are required for the next few actions. In particular, if a function is very simple and only contains a few variables then these will be held in registers all the time and never need to be stored to memory. However, if the function calls another function, some register contents may have to be stacked by the called function so that it can make use of the registers it needs.

To make good use of the available registers on a particular computer, there will normally be some compiler protocol that outlines the use to be made of the registers. This is particularly important if we want to be able to call functions written in one language from another language. For example, the simplest version of the ARM Procedure Call Standard (APCS) is:

R0	a1	parameter 1, result, or temporary
R1-R3	a2-a4	parameters 2-4, or temporaries
R4-R10	v1-v7	variables 1-7
R11	fp	frame pointer
R12	ip	temporary
R13	sp	stack pointer (lower end of current stack frame)
R14	lr	link register, or temporary
R15	pc	program counter

When we come to consider data-structures, which require more space than is normally available as registers, we will see compiled code that makes extensive use of memory.

Expressions

General strategy: use registers to hold partial results. When the registers run out, use the stack.

e.g. i= x*y - j*k + a*b

ADD    	a2,sp,#8	;	LDR	a1,[sp,#&C]
LDMIA	a2,{a1,a2}	;	LDR	a2,[sp,#&8]
MUL    	a2,a1,a2
LDMIA	sp,{a1,a3}	;	LDR	a3,[sp,#4]
	        	;	LDR	a1,[sp,#0]
MUL    	a3,a1,a3
SUB    	a2,a2,a3
ADD    	a3,sp,#&10	;	LDR	a1,[sp,#&14]
LDMIA	a3,{a1,a3}	;	LDR	a3,[sp,#&10]
MLA    	v1,a3,a1,a2
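
The same strategy can be written in C as one operation per statement, with each temporary standing for a register that holds a partial result. This is only a sketch: the names t1, t2 and eval_expr are hypothetical, and a real compiler chooses its own registers.

```c
/* i = x*y - j*k + a*b, one operation per statement. */
int eval_expr(int x, int y, int j, int k, int a, int b)
{
    int t1, t2;          /* partial results, one register each */
    t1 = x * y;          /* MUL: first product                 */
    t2 = j * k;          /* MUL: second product                */
    t1 = t1 - t2;        /* SUB: partial result so far         */
    t1 = a * b + t1;     /* MLA: multiply and accumulate       */
    return t1;
}
```

Only two temporaries are live at any point, so this expression never needs to spill to the stack.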

Booleans | & ! ^ etc.

int a, b, c;
c= a | b;

ORR Rc, Ra, Rb

if ((a<b)||c) statement1; else statement2;

	CMP	Ra, Rb
	BLT	then
	TEQ	Rc, #0
	BEQ	else
then	statement1
	B	end
else	statement2
end	. . .
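
The branch structure above can be mirrored in C with gotos, which makes the short-circuit visible: c is only tested when a<b is false. A sketch, assuming a hypothetical function select_branch that reports which statement would run:

```c
/* Returns 1 if statement1 would run, 2 if statement2 would run. */
int select_branch(int a, int b, int c)
{
    int taken;
    if (a < b) goto then_part;    /* CMP Ra,Rb ; BLT then */
    if (c == 0) goto else_part;   /* TEQ Rc,#0 ; BEQ else */
then_part:
    taken = 1;                    /* statement1 */
    goto end;
else_part:
    taken = 2;                    /* statement2 */
end:
    return taken;
}
```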

Storage Types (Extent)

There are 4 types of variable:

* Global
- declared outside functions.
- exist once in store, access from anywhere

* Static
- declared with the keyword static
- just one copy exists through life of program
- but only accessible locally

* Local
- declared inside a function.
- allocated every time the function is entered and released when it exits.
- exists on the runtime stack.

* Dynamic
- allocated through library calls malloc() and free().
- only accessed through pointers.
- program responsible for freeing space.
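
The four kinds can be seen side by side in a short C sketch. All names here (calls_total, count_calls, heap_roundtrip) are hypothetical and chosen only for illustration.

```c
#include <stdlib.h>

int calls_total = 0;              /* global: one copy, visible everywhere */

/* Returns how many times it has been called so far. */
int count_calls(void)
{
    static int count = 0;         /* static: one copy, visible only here */
    int step = 1;                 /* local: fresh on every call          */
    count += step;
    calls_total = count;
    return count;
}

/* Stores v in heap memory, reads it back and frees the memory.
   Returns the value read, or -1 if malloc failed. */
int heap_roundtrip(int v)
{
    int result;
    int *p = malloc(sizeof *p);   /* dynamic: reached only via a pointer */
    if (p == NULL)
        return -1;
    *p = v;
    result = *p;
    free(p);                      /* the program must release it itself  */
    return result;
}
```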

Stack Frame

Local variables are allocated on the stack to allow for recursion.

	void test(int a)
	{
	  static int i;
	  int j,k;
	  . . .
	}
	_
sp ->	k	stack frame
	j	
	old pc	
	a	
	_	
		previous stack frame
* Each time test is called, a fresh stack frame is created and placed onto the runtime stack. The stack pointer sp points at the top of the current frame. All the variables are accessed by relative addressing.
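
A short sketch of why this matters for recursion: each activation below keeps its own copy of the local, while the static counter is shared by all of them. The names sum_to, calls and calls_made are hypothetical.

```c
static int calls;                 /* one copy, like 'static int i' in test() */

/* Sums 1..n recursively; each activation keeps its own 'mine'. */
int sum_to(int n)
{
    int mine = n;                 /* local: lives in this call's frame */
    int rest;
    calls++;
    if (n <= 0)
        return 0;
    rest = sum_to(n - 1);         /* deeper frames cannot overwrite 'mine' */
    return mine + rest;
}

/* Number of activations of sum_to so far. */
int calls_made(void)
{
    return calls;
}
```

If mine were held in a single fixed memory location instead of the frame, the recursive call would overwrite it before the addition.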

main() { int i=10; f(); }

void f() { int a, b; g(); }

void g() { int x, y; f(); }

_		
b	 frame for f()
a	
old pc	
_	
y	 frame for g()
x	
old pc	
_	
b	 frame for f()
a	
old pc	
_	
i	 frame for main
old pc	
_	

stack after 3 calls

Heap

* access through library functions malloc and free

ptr= malloc(size)	/* sets ptr to a chunk of memory of size 'size' */
free(ptr)		/* frees memory allocated by malloc */

* Allocated in an area called the heap.

* Repeated malloc/free cycles can cause fragmentation.

 p1= malloc(50);
 p2= malloc(100);
 free(p1);
 p3= malloc(60); 	/* cannot reuse the freed 50-byte slot: it is too small */
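
The effect can be reproduced with a toy first-fit allocator. This is a simplified model, not how any real malloc is implemented: the 256-byte arena, the block table and the names toy_alloc, toy_free and replay_notes_sequence are all assumptions made for illustration.

```c
#define ARENA_SIZE 256
#define MAX_BLOCKS 16

struct block { int offset, size, used; };

/* The arena starts as one big free block. */
static struct block blocks[MAX_BLOCKS] = { {0, ARENA_SIZE, 0} };
static int nblocks = 1;

/* First fit: take the first free block that is large enough, splitting
   off any unused remainder as a new free block. Returns offset or -1. */
int toy_alloc(int size)
{
    int i, j;
    for (i = 0; i < nblocks; i++) {
        if (blocks[i].used || blocks[i].size < size)
            continue;
        if (blocks[i].size > size && nblocks < MAX_BLOCKS) {
            /* shift later entries up to make room for the remainder */
            for (j = nblocks; j > i + 1; j--)
                blocks[j] = blocks[j - 1];
            blocks[i + 1].offset = blocks[i].offset + size;
            blocks[i + 1].size = blocks[i].size - size;
            blocks[i + 1].used = 0;
            blocks[i].size = size;
            nblocks++;
        }
        blocks[i].used = 1;
        return blocks[i].offset;
    }
    return -1;
}

/* Mark the block starting at 'offset' free (no coalescing in this toy). */
void toy_free(int offset)
{
    int i;
    for (i = 0; i < nblocks; i++)
        if (blocks[i].offset == offset)
            blocks[i].used = 0;
}

/* Replays the sequence from the notes; returns the offset given to p3. */
int replay_notes_sequence(void)
{
    int p1 = toy_alloc(50);
    int p2 = toy_alloc(100);
    (void)p2;                     /* p2 stays allocated */
    toy_free(p1);
    return toy_alloc(60);         /* skips the 50-byte hole at offset 0 */
}
```

In this model p3 lands beyond p2, and the 50-byte hole at the start stays unused until a request of 50 bytes or less arrives.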

Data structures - Arrays

C arrays go from a[0] to a[length-1]

e.g. int a[10]; . . . a[i]= 3;

	a[9]
	. . .	Address of a[i] = sp + a_offset + i*4
	a[1]
	a[0]	<- sp + a_offset
size = 10 * sizeof(int) = 40

	SUB	sp,sp,#&28		; reserve 40 extra bytes of memory
	. . .
	MOV	a2,#3
	ADD	a3,sp,a_offset
	STR	a2,[a3,a1,LSL #2]	; i in a1
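
The address calculation can be checked from C itself. The sketch below uses sizeof(int) rather than the constant 4 assumed in these notes, and the names elem_offset, store_elem and check_store are hypothetical.

```c
#include <stddef.h>

/* Byte offset of a[i] from the start of an int array: i * sizeof(int). */
size_t elem_offset(size_t i)
{
    return i * sizeof(int);
}

/* Stores v in a[i] using explicit address arithmetic, as the STR does. */
void store_elem(int *a, size_t i, int v)
{
    char *base = (char *)a;                     /* sp + a_offset */
    int *addr = (int *)(base + elem_offset(i)); /* + i*4         */
    *addr = v;
}

/* Returns 1 if the hand-computed address is the one a[7] = 3 would use. */
int check_store(void)
{
    int a[10] = {0};
    store_elem(a, 7, 3);
    return a[7] == 3 &&
           (char *)&a[7] - (char *)a == (ptrdiff_t)elem_offset(7);
}
```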

Two-dimensional Arrays

Organised as array of arrays. e.g. int a[10][4]:
a[0], a[1], . . ., a[9] are each int[4] arrays, and laid out in memory as before:

a[0] a[1] . . . a[i] . . . a[9]

a[i][0] a[i][1] a[i][2] a[i][3]

row size = 4 * 4, total size = 10 * row size = 160

address of a[i][j] is sp + a_offset + (i*4 + j) * size_of_int

e.g. a[i][j]= 20;

	SUB	sp,sp,#&a0
	. . .
	MOV	a2,#&14		; i in a3, j in a1
	ADD	a4,sp,a_offset
	ADD	a4,a4,a3,LSL #4
	STR	a2,[a4,a1,LSL #2]
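
The row-major formula can be compared against the layout the C compiler actually uses. A sketch, with hypothetical names elem_offset_2d and check_layout, and with sizeof(int) in place of the 4 assumed above:

```c
#include <stddef.h>

#define ROWS 10
#define COLS 4

/* Byte offset of a[i][j] in int a[ROWS][COLS]: (i*COLS + j) * sizeof(int). */
size_t elem_offset_2d(size_t i, size_t j)
{
    return (i * COLS + j) * sizeof(int);
}

/* Returns 1 if the formula agrees with the compiler's own layout.
   The caller must keep i < ROWS and j < COLS; C will not check. */
int check_layout(size_t i, size_t j)
{
    static int a[ROWS][COLS];
    ptrdiff_t actual = (char *)&a[i][j] - (char *)a;
    return actual == (ptrdiff_t)elem_offset_2d(i, j);
}
```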

* C does NOT check indices against limits - a common cause of errors.

Pointers

e.g. int i; int *pi; int **ppi;
i= 3;
	MOV	a1,#3
	STR	a1,[sp,#4]
pi= &i;
	ADD	a1,sp,#4
	STR	a1,[sp,#0]
ppi= &pi;
	MOV	a2,sp
**ppi= 5;
	MOV	a1,#5
	LDR	a3,[a2,#0]
	STR	a1,[a3,#0]
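
Collected into one function, the four statements show that the store through ppi changes i. The wrapper name pointer_chain is hypothetical.

```c
/* Runs the four statements from the notes and returns the final i. */
int pointer_chain(void)
{
    int i;
    int *pi;
    int **ppi;

    i = 3;          /* store 3 in i's stack slot                */
    pi = &i;        /* pi holds the address of i                */
    ppi = &pi;      /* ppi holds the address of pi              */
    **ppi = 5;      /* load pi through ppi, then store 5 into i */
    return i;
}
```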

Control Structures

switch (i) {
  case 1: statement1; break;
  . . .
  case n: statementn; break;
  default: statementd;
}

	CMP	v1,#n		;i in v1
	ADDLS	pc,pc,v1,LSL #2
	B	default		;i>n(unsigned)
	B	default		;i==0
	B	case1
	. . .
	B	casen
case1	statement 1
	B	end
	. . .
casen	statement n
	B	end
default	default statement
end
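
The branch table can be imitated in C with an array of function pointers. This is a sketch for n = 3; the handler names, their return values and the name dispatch are hypothetical.

```c
#define N 3                        /* hypothetical number of cases */

static int case1(void)        { return 10; }
static int case2(void)        { return 20; }
static int case3(void)        { return 30; }
static int default_case(void) { return -1; }

/* Entry 0 is the "i == 0" slot, which also goes to default. */
static int (*const table[N + 1])(void) = {
    default_case, case1, case2, case3
};

int dispatch(int i)
{
    /* The unsigned compare catches both i > N and negative i,
       as CMP followed by the LS condition does. */
    if ((unsigned)i > N)
        return default_case();
    return table[i]();             /* indexed jump through the table */
}
```

The cost is one comparison and one indexed jump however many cases there are, which is why compilers tend to prefer this form when the case values are dense.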

Loops

while (a<b) {statement}

test	CMP	a1,a2		;a in a1, b in a2
	BGE	end
	statement
	B	test
end
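
Written with gotos, the C version has the same shape as the compiled code: the test sits at the top and is repeated after every pass through the body. A sketch, assuming a hypothetical loop body a++ and a counter added only so the result can be observed:

```c
/* Counts iterations of: while (a < b) { a++; } using the branch layout. */
int count_iterations(int a, int b)
{
    int n = 0;
test:
    if (a >= b) goto end;    /* CMP a1,a2 ; BGE end           */
    a++;                     /* statement (hypothetical body) */
    n++;
    goto test;               /* B test                        */
end:
    return n;
}
```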

Function Call

caller ( )
{ . . . /*call sequence*/ callee (...) /*post call sequence*/ . . . }

callee (...)
{/*entry sequence*/ . . . /*return sequence*/}

* call sequence
- save registers in use (STMDB sp!,{registers})
- evaluate and pass parameters (ARM uses registers for the first 4 parameters, and stack for any further parameters (in reverse order))

	MOV	a4,sp		;last param
	ADD	a3,sp,#4	
	STMDB	sp!,{a3,a4}
	ADD	a4,sp,#&10	;4th param
	ADD	a3,sp,#&14
	ADD	a2,sp,#&18
	ADD	a1,sp,#&1c	;1st param

- branch to function, setting return link (lr)
	BL func_address

* entry sequence
- stack lr (unless leaf function)
- stack registers before use. (STMDB sp!,{registers})
- shift sp to allocate space for local vars.

	MOV	ip,sp
	STMDB	sp!,{a1-a4}
	STMDB	sp!,{v1,fp,ip,lr,pc}
	SUB	fp,ip,#&14
	MOV	ip,a1
	MOV	v1,a4
	LDR	a4,[fp,#&18]
	LDR	lr,[fp,#&14]

* return sequence
- shift sp to deallocate space
- unstack registers (LDMIA sp!,{registers})
- pop lr into pc (MOV pc,lr if leaf function)

	MOV	a1,result
	LDMDB	fp,{v1,fp,sp,pc}

* post call sequence
- recover space for any params on stack
- unstack registers, if any (LDMIA sp!,{registers})

	ADD	sp,sp,#8	;2 words of params on stack
	. . .	a1 . . .

Since the caller and the function are compiled separately, we need a convention on how they cooperate in passing parameters and using registers.

e.g.

	                	_
	                	params
	                	regs
_	                	_
vars	- call seq ->      	vars
old pc	                	old pc
_	                	_

^	                	|
post call seq	              	entry seq
|	                	v

	                	_
	                	vars
	                	old pc
_	              	
params	                	params
regs	                	regs
_	                	_
vars	<- return seq -    	vars
old pc	                	old pc
_	                	_

Saving Registers

Registers must be saved across function calls. Who should do it?

* In call sequence - save all registers currently in use.
* In entry sequence - save only registers to be reused.

ARM APCS uses both strategies. Registers are divided into 3 groups:
* Callee saved - Must be preserved across calls. The called function must not disturb these. (v1-v7)
* Caller saved - Need not be preserved. The called function may disturb these. (a1-a4)
* Dedicated - Reserved for a fixed purpose. (fp, sp, lr, pc etc.)