General target architecture:
CPU = Small fast memory (registers) + ALU
+
Main Memory (RAM) used to hold Code, Data, Stack, Heap etc.
Compiled code depends on
- kind & size of fast memory
- number of main memory accesses per operation
e.g. A * B + C * D
dc      ARM
A       ADD   a2,sp,#8
B       LDMIA a2,{a1,a2}
*       MUL   a2,a1,a2      ; a*b
C       LDMIA sp,{a1,a3}
D
*       MLA   v1,a3,a1,a2   ; + c*d
+
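The dc column is postfix (reverse Polish) code for a stack machine: operands are pushed, and each operator pops two values and pushes the result. A minimal C sketch of how such a sequence is evaluated, assuming single-letter variables A to D supplied in an array; the function name eval_rpn and these conventions are hypothetical.

```c
#include <assert.h>

#define STACK_MAX 16

/* Evaluate postfix code such as "A B * C D * +".
   Hypothetical conventions: A..D are read from vars[0..3], only '*' and '+'
   are supported, and any other character (e.g. a space) is skipped. */
int eval_rpn(const char *code, const int vars[4])
{
    int stack[STACK_MAX];
    int sp = 0;                                 /* number of items on the stack */

    for (const char *p = code; *p != '\0'; p++) {
        if (*p >= 'A' && *p <= 'D') {
            assert(sp < STACK_MAX);
            stack[sp++] = vars[*p - 'A'];       /* push an operand */
        } else if (*p == '*' || *p == '+') {
            assert(sp >= 2);
            int rhs = stack[--sp];              /* pop the two operands */
            int lhs = stack[--sp];
            stack[sp++] = (*p == '*') ? lhs * rhs : lhs + rhs;
        }
    }
    assert(sp == 1);                            /* exactly one result remains */
    return stack[0];
}
```

Every step here reads or writes the stack, which is why a machine that keeps operands in registers can do the same work with far fewer memory accesses.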
A simple way of compiling code is to hold all variables in memory at run-time and only load them into the CPU when they are used in an expression, as in the examples above. However, modern high-performance computers and their compilers make extensive use of registers, as they are usually much faster than memory.
With A, B, C and D already held in registers a1-a4, the same expression needs no memory accesses at all:
MUL v2,a1,a2      ; a*b
MLA v1,a3,a4,v2   ; + c*d
Essentially, non-constant values are kept in registers until all the registers are in use; at that point the least-used values are discarded (and saved to memory if they have changed) to free up as many registers as the next few operations need. In particular, if a function is very simple and contains only a few variables, these are held in registers all the time and never need to be stored to memory. However, if the function calls another function, some register contents may have to be stacked by the called function so that it can use the registers it needs.
To make good use of the available registers on a particular computer, there is normally a compiler convention that sets out how the registers are to be used. This is particularly important if we want to call functions written in one language from another language. For example, the simplest version of the ARM Procedure Call Standard (APCS) is:
Register  APCS name  Use
R0        a1         parameter 1, result, or temporary
R1-R3     a2-a4      parameters 2-4, or temporaries
R4-R10    v1-v7      variables 1-7
R11       fp         frame pointer
R12       ip         temporary
R13       sp         stack pointer (lower end of current stack frame)
R14       lr         link register, or temporary
R15       pc         program counter
When we come to consider data structures, which require more space than the registers can provide, we will see compiled code that makes extensive use of memory.
General strategy: use registers to hold partial results. When they run out, use the stack.
e.g. i= x*y - j*k + a*b
ADD   a2,sp,#8         ; LDR a1,[sp,#&C]
LDMIA a2,{a1,a2}       ; LDR a2,[sp,#&8]
MUL   a2,a1,a2
LDMIA sp,{a1,a3}       ; LDR a3,[sp,#4]
                       ; LDR a1,[sp,#0]
MUL   a3,a1,a3
SUB   a2,a2,a3
ADD   a3,sp,#&10       ; LDR a1,[sp,#&14]
LDMIA a3,{a1,a3}       ; LDR a3,[sp,#&10]
MLA   v1,a3,a1,a2
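How many registers does an expression need before partial results must go to the stack? One standard way to count is Sethi-Ullman labelling, shown here as a swapped-in technique rather than the method these notes describe. A minimal C sketch, assuming every operand must first be loaded into a register (as on ARM); the struct node type and regs_needed are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Expression tree node; a leaf (a variable) has no children. */
struct node {
    char op;                          /* '*', '-', '+', or 0 for a leaf */
    const struct node *left, *right;
};

/* Sethi-Ullman label: registers needed to evaluate the subtree without
   spilling a partial result to the stack. */
int regs_needed(const struct node *n)
{
    assert(n != NULL);
    if (n->left == NULL && n->right == NULL)
        return 1;                     /* a leaf is loaded into one register */
    int l = regs_needed(n->left);
    int r = regs_needed(n->right);
    if (l == r)
        return l + 1;                 /* one side's result is held while the other is computed */
    return l > r ? l : r;             /* evaluate the more demanding side first */
}
```

For i= x*y - j*k + a*b each product needs 2 registers, the subtraction needs 3, and the whole expression needs 3, which is consistent with the three temporaries a1-a3 used in the sequence above. If the count exceeded the registers available, partial results would be spilled to the stack.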
int a, b, c;
c= a | b;        ORR Rc, Ra, Rb
if ((a<b)||c) statement1; else statement2;
        CMP  Ra, Rb
        BLT  then
        TEQ  Rc, #0
        BEQ  else
then    statement1
        B    end
else    statement2
end     . . .
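The branch structure above can be written out by hand in C with labels and gotos, which makes the short-circuit behaviour of || visible: c is only tested when a<b is false. A minimal sketch; the function name lowered_if and its return values (1 for statement1, 2 for statement2) are hypothetical.

```c
/* Hand lowering of: if ((a<b)||c) statement1; else statement2;
   Returns 1 when statement1 would run and 2 when statement2 would run. */
int lowered_if(int a, int b, int c)
{
    int ran;
    if (a < b) goto then;       /* CMP Ra, Rb ; BLT then */
    if (c == 0) goto else_;     /* TEQ Rc, #0 ; BEQ else  ("else" is a C keyword) */
then:
    ran = 1;                    /* statement1 */
    goto end;                   /* B end */
else_:
    ran = 2;                    /* statement2 */
end:
    return ran;
}
```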
There are 4 types of variables:
* Global
- declared outside functions.
- exist once in memory and are accessible from anywhere.
* Static
- declared with the keyword static
- just one copy exists through life of program
- but only accessible locally
* Local
- declared inside a function.
- allocated every time the function is entered and released when it exits.
- exists on the runtime stack.
* Dynamic
- allocated and released through the library calls malloc() and free().
- only accessed through pointers.
- the program is responsible for freeing the space.
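The four kinds of variable side by side in a minimal C sketch; global_count, count_calls and make_array are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

int global_count = 0;          /* global: one copy, accessible from anywhere
                                  (other files reach it via an extern declaration) */

/* Returns how many times it has been called. */
int count_calls(void)
{
    static int calls = 0;      /* static: one copy for the whole run, visible only here */
    int step = 1;              /* local: a fresh copy on the stack for every call */
    calls += step;
    global_count += step;
    return calls;
}

/* Dynamic: storage from malloc, reached only through the returned pointer;
   the caller must release it with free. Returns NULL if malloc fails. */
int *make_array(size_t n)
{
    int *p = malloc(n * sizeof *p);
    if (p != NULL)
        for (size_t i = 0; i < n; i++)
            p[i] = (int)i;
    return p;
}
```

The static counter keeps its value between calls, while step is recreated on each call.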
Local variables are allocated on the stack to allow for recursion.
void test(int a)
{
    static int i;
    int j, k;
    . . .
}

sp ->  k
       j          stack frame for test()
       old pc
       a
       . . .      previous stack frame

* Each time test is called, a fresh stack frame is created and pushed onto the runtime stack. The stack pointer sp points at the top of the current frame, and all the local variables are accessed by addressing relative to sp. The static variable i is not part of the frame: its single copy lives outside the stack.
main() { int i=10; f(); }
void f() { int a, b; g(); }
void g() { int x, y; f(); }
b        frame for f()
a
old pc
y        frame for g()
x
old pc
b        frame for f()
a
old pc
i        frame for main()
old pc
Stack after 3 calls (top of stack at the top)
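The f() and g() above call each other without end. A minimal C sketch of the same mutual recursion with a hypothetical depth parameter so that it terminates; the return values are also hypothetical and exist only to show that each activation keeps its own locals.

```c
int g(int depth);

int f(int depth)
{
    int a = depth, b = 2 * depth;               /* locals of this activation only */
    int inner = (depth > 0) ? g(depth - 1) : 0; /* nested calls get their own frames */
    return a + b + inner;                       /* a and b are unchanged by those calls */
}

int g(int depth)
{
    int x = depth, y = 3 * depth;
    int inner = (depth > 0) ? f(depth - 1) : 0;
    return x + y + inner;
}
```

f(3) returns 20. If a and b were instead static variables assigned on entry, the nested call f(1) would overwrite them and f(3) would return 14, which is why locals live in per-call stack frames.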
* accessed through the library functions malloc and free
ptr= malloc(size);   /* sets ptr to a chunk of memory of 'size' bytes */
free(ptr);           /* frees memory allocated by malloc */
* Allocated in an area called the heap.
* Repeated malloc/free cycles can cause fragmentation.
p1= malloc(50);
p2= malloc(100);
free(p1);
p3= malloc(60);      /* cannot use the freed 50-byte slot! */
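The fragmentation example can be replayed on a toy heap. This is a minimal sketch assuming first-fit allocation over a list of holes, with no block headers, no alignment and no merging of adjacent holes; all names are hypothetical, and real malloc implementations are far more elaborate.

```c
#include <assert.h>

#define HEAP_SIZE 256
#define MAX_HOLES 16

struct hole { int start, size; };

static struct hole holes[MAX_HOLES];
static int nholes;

void toy_init(void)
{
    holes[0].start = 0;
    holes[0].size = HEAP_SIZE;          /* one hole covering the whole heap */
    nholes = 1;
}

/* First fit: scan the hole list in order and carve the block from the first
   hole that is big enough. Returns the block's offset, or -1 if none fits. */
int toy_malloc(int size)
{
    for (int i = 0; i < nholes; i++) {
        if (holes[i].size >= size) {
            int start = holes[i].start;
            holes[i].start += size;
            holes[i].size -= size;
            return start;
        }
    }
    return -1;
}

/* With no block headers the caller passes the size back; the freed block
   becomes a new hole at the end of the list (no merging). */
void toy_free(int start, int size)
{
    assert(nholes < MAX_HOLES);
    holes[nholes].start = start;
    holes[nholes].size = size;
    nholes++;
}
```

Replaying the sequence above gives p1 = 0, p2 = 50 and p3 = 150: the 50-byte hole at offset 0 is too small for 60 bytes. Afterwards a 90-byte request fails even though 96 bytes are free in total, split between holes of 50 and 46 bytes.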
C arrays go from a[0] to a[length-1]
e.g. int a[10]; . . . a[i]= 3;
a[9]
. . .
a[1]
a[0]   <- sp + a_offset
Address of a[i] = sp + a_offset + i*4
size = 10 * sizeof(int) = 40 bytes
SUB sp,sp,#&28          ; reserve 40 extra bytes of memory
. . .
MOV a2,#3
ADD a3,sp,#a_offset     ; a3 = address of a[0]
STR a2,[a3,a1,LSL #2]   ; i in a1: store 3 at a3 + i*4
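The address arithmetic above, done by hand in C: a minimal sketch in which the hypothetical store_elem computes the byte address of base[i] exactly as the compiled code does.

```c
#include <stddef.h>

/* Store value into base[i] by computing the address explicitly:
   base + i * sizeof(int). On the ARM target sizeof(int) is 4, so the
   multiply becomes the shift LSL #2 in the STR instruction. */
void store_elem(int *base, size_t i, int value)
{
    char *addr = (char *)base + i * sizeof(int);   /* byte address of base[i] */
    *(int *)addr = value;
}
```

store_elem(a, i, 3) has the same effect as a[i]= 3.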
Organised as array of arrays. e.g. int a[10][4]:
a[0], a[1], . . ., a[9] are each int[4] arrays, laid out in memory as before:
a[0] a[1] . . . a[i] . . . a[9]
with each row a[i] holding
a[i][0] a[i][1] a[i][2] a[i][3]
row size = 4 * sizeof(int) = 16 bytes, total size = 10 * row size = 160 bytes
address of a[i][j] is sp + a_offset + (i*4 + j) * sizeof(int)
e.g. a[i][j]= 20;
SUB sp,sp,#&a0            ; reserve 160 bytes
. . .
MOV a2,#&14               ; 20; i in a3, j in a1
ADD a4,sp,#a_offset       ; address of a[0][0]
ADD a4,a4,a3,LSL #4       ; + i*16 (one row)
STR a2,[a4,a1,LSL #2]     ; store at + j*4
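The row-major formula can be checked against the compiler's own layout. A minimal C sketch; ROWS, COLS and elem2d_offset are hypothetical names for the int a[10][4] example.

```c
#include <stddef.h>

#define ROWS 10
#define COLS 4

/* Byte offset of a[i][j] in an int[ROWS][COLS] array (row-major order):
   (i*COLS + j) * sizeof(int). With COLS = 4 and 4-byte ints this is
   i*16 + j*4, matching the LSL #4 and LSL #2 in the code above. */
size_t elem2d_offset(size_t i, size_t j)
{
    return (i * COLS + j) * sizeof(int);
}
```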
* C does NOT check indices against limits - a common cause of errors.
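Since the generated code simply computes an address and stores to it, any check has to be written explicitly. A minimal sketch of a hypothetical checked accessor, get_checked, that refuses out-of-range indices instead of reading past the array.

```c
#include <stddef.h>

/* Returns 1 and stores a[i] in *out if i < len; otherwise returns 0 and
   touches no memory. Because size_t is unsigned, a "negative" index
   converted to size_t becomes huge and is rejected by the same test. */
int get_checked(const int *a, size_t len, size_t i, int *out)
{
    if (i >= len)
        return 0;
    *out = a[i];
    return 1;
}
```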
e.g.
int i; int *pi; int **ppi;
i= 3;          MOV a1,#3
               STR a1,[sp,#4]     ; i at sp+4
pi= &i;        ADD a1,sp,#4
               STR a1,[sp,#0]     ; pi at sp+0
ppi= &pi;      MOV a2,sp          ; ppi kept in a2
**ppi= 5;      MOV a1,#5
               LDR a3,[a2,#0]     ; a3 = *ppi = pi
               STR a1,[a3,#0]     ; *pi = 5, i.e. i = 5
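The same pointer example as a complete C function; the name double_indirection is hypothetical.

```c
/* ppi points at pi, which points at i, so **ppi = 5 writes into i. */
int double_indirection(void)
{
    int i;
    int *pi;
    int **ppi;

    i = 3;
    pi = &i;
    ppi = &pi;
    **ppi = 5;      /* load *ppi (which is pi), then store 5 through it */
    return i;
}
```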
switch (i) {
case 1: statement1; break;
. . .
case n: statementn; break;
default: statementd;
}
        CMP   v1,#n              ; i in v1
        ADDLS pc,pc,v1,LSL #2    ; i<=n (unsigned): jump into the table
        B     default            ; i>n (unsigned)
        B     default            ; i==0
        B     case1
        . . .
        B     casen
case1   statement1
        B     end
        . . .
casen   statementn
        B     end
default statementd
end
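A jump table can be modelled in C with an array of function pointers. A minimal sketch for cases 1 to 3; the case bodies, jump_table and dispatch are hypothetical. As in the ARM code, i is compared as unsigned, so a single test rejects both i > n and negative i, and slot 0 sends i == 0 to the default.

```c
static int case_default(void) { return 0; }
static int case1(void) { return 10; }
static int case2(void) { return 20; }
static int case3(void) { return 30; }

/* Slot 0 plays the role of the "B default ; i==0" entry. */
static int (*const jump_table[])(void) = { case_default, case1, case2, case3 };
#define N_CASES 3

int dispatch(int i)
{
    if ((unsigned)i > N_CASES)      /* CMP v1,#n ; not LS -> B default */
        return case_default();
    return jump_table[i]();         /* ADDLS pc,pc,v1,LSL #2 */
}
```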
while (a<b) {statement}
test  CMP a1,a2      ; a in a1, b in a2
      BGE end
      statement
      B   test
end
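The same loop shape written in C with explicit labels: test at the top, leave when a >= b, jump back after the body. A minimal sketch; the body (incrementing a and counting iterations) and the name lowered_while are hypothetical.

```c
/* Hand lowering of: while (a<b) {statement}
   Returns the number of times the body ran. */
int lowered_while(int a, int b)
{
    int iterations = 0;
test:
    if (a >= b) goto end;       /* CMP a1,a2 ; BGE end */
    a++;                        /* statement */
    iterations++;
    goto test;                  /* B test */
end:
    return iterations;
}
```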
caller( )
{
    . . .
    /* call sequence */
    callee(...)
    /* post call sequence */
    . . .
}
callee(...)
{
    /* entry sequence */
    . . .
    /* return sequence */
}
* call sequence
- save registers in use (STMDB sp!,{registers})
- evaluate and pass parameters (ARM uses registers for the first 4 parameters, and the stack for any further parameters, pushed in reverse order)
MOV   a4,sp          ; last param
ADD   a3,sp,#4
STMDB sp!,{a3,a4}
ADD   a4,sp,#&10     ; 4th param
ADD   a3,sp,#&14
ADD   a2,sp,#&18
ADD   a1,sp,#&1c     ; 1st param
BL func_address
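On the C side, a call like the one set up above is an ordinary six-argument call. A minimal sketch with a hypothetical callee, sum6; under the APCS convention described in these notes, p1-p4 would arrive in a1-a4 and p5, p6 on the stack, but the C source never mentions this because the compiler generates the call and entry sequences.

```c
/* Six pointer parameters: under APCS, p1-p4 in a1-a4, p5 and p6 on the stack. */
int sum6(const int *p1, const int *p2, const int *p3,
         const int *p4, const int *p5, const int *p6)
{
    return *p1 + *p2 + *p3 + *p4 + *p5 + *p6;
}
```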
* entry sequence
- stack lr (unless leaf function)
- stack registers before use. (STMDB sp!,{registers})
- shift sp to allocate space for local vars.
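MOV   ip,sp                 ; ip = sp on entry
STMDB sp!,{a1-a4}
STMDB sp!,{v1,fp,ip,lr,pc}
SUB   fp,ip,#&14            ; fp points at the saved pc
MOV   ip,a1
MOV   v1,a4
LDR   a4,[fp,#&18]          ; 6th param, from the caller's stack
LDR   lr,[fp,#&14]          ; 5th param, from the caller's stack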
* return sequence
- shift sp to deallocate space
- unstack registers (LDMIA sp!,{registers})
- pop lr into pc (MOV pc,lr if leaf function)
MOV   a1,result             ; return value in a1
LDMDB fp,{v1,fp,sp,pc}      ; restore v1, fp, sp and return (saved lr -> pc)
* post call sequence
- recover space for any params on stack
- unstack registers, if any (LDMIA sp!,{registers})
ADD sp,sp,#8    ; 2 words of params on stack
. . . a1 . . .  ; result is in a1
Since the caller and the callee are compiled separately, they need a convention on how to cooperate in passing parameters and using registers.
e.g. stack contents around a call (top of stack at the top):

vars     (caller)    --call seq-->     params
old pc                                 regs
                                       vars     (caller)
                                       old pc
    ^                                      |
    | post call seq                        | entry seq
    |                                      v
params               <--return seq--   vars     (callee)
regs                                   old pc
vars     (caller)                      params
old pc                                 regs
                                       vars     (caller)
                                       old pc
Registers in use must be preserved across function calls. Who should save them?
* In call sequence - save all registers currently in use.
* In entry sequence - save only registers to be reused.
ARM APCS uses both strategies. Registers are divided into 3 groups:
* Callee saved
- Must be preserved across calls: a called function that uses any of these must save them on entry and restore them before returning. (v1-v7)
* Caller saved
- Need not be preserved: the called function may overwrite these, so the caller must save any it still needs. (a1-a4)
* dedicated registers (fp, sp, lr, pc etc.)