Example C program showing nested block structure:
int main (. . .) { /* start of Block1 */ float A; int B, C[10]; . . A; . . B; . . C; { /* start of Block2 */ char P; int Q, R[11]; . . A; . . B; . . C; . . P; . . Q; . . R; } /* end of Block2 */ . . A; . . B; . . C; { /* start of Block3 */ float A[6], C; int D; . . B; . . A; . . C; . . D; } /* end of Block3 */ . . A; . . B; . . C; } /* end of Block1 */Scopes of identifiers:
B = Block1 P, Q, R = Block2 A, C = Block1-Block3 (the first A and C) A, C, D = Block3 (the second A and C)Extents of variables (as accessed by identifiers):
A, B, C = duration of Block1 (the first A and C) P, Q, R = duration of Block2 A, C, D = duration of Block3 (the second A and C)Both scope and extent are nested. If different identifiers had been used for the second A and C, the scope and extent would have seemed to be similar i.e. scope is in some ways a subset of extent. Hence the confusion that often arises between them.
Layout of memory at two times during the run: >
A | B | C | P | Q | R |
A | B | C | A | C | D |
Example C program containing functions:
float A; int B, C[10]; void P1 (int I, int J) {B= I + J;} void P2 (void) { int B, D; . . .; P1 (B, D); } int main (. . .) {. . . P1(C[5], C[6]); P2(); . . .}
Layout of memory at two times during the run: >
A | B | C | I | J | ||
A | B | C | B | D | I | J |
The scope of identifiers is still determined by their textual position within the program text. However, the dynamic nesting of extents no longer follows the static nesting of scopes (e.g. the extent of I and J is sometimes, but not always, dynamically nested inside the extent of B and D). Therefore, the B which is assigned to in P1 is always the B declared in the outermost block (following the static nesting of the scopes), even if the other B (declared in P2) is sometimes interposed in the dynamic nesting. i.e. scope is still in some sense a subset of extent.
Algol 60 is the language that first introduced this concept of block structure. Most modern languages allow both nested blocks and functions. C has nested blocks, but no nested functions. Pascal has nested functions (and procedures), each with its own declarations, but no nested blocks. We can use it to illustrate the use of nested functions, to show the problems they cause for implementors.
procedure MAZE_PROG; var BLOCKED: array[-100..+100, -100..+100] of BOOLEAN; function MAZE(X, Y, F: INTEGER): BOOLEAN; var FOUND: BOOLEAN; D: INTEGER; DELTX, DELTY: -1..+1; begin if BLOCKED[X, Y] then MAZE:= FALSE else if (X=0) and (Y=0) then MAZE:= TRUE else begin D:=F; FOUND:= FALSE; repeat D := (D + 1) mod 4; case D of 0: begin DELTX := 0; DELTY := -1; end; 1: begin DELTX := -1; DELTY := 0; end; 2: begin DELTX := 0; DELTY := +1; end; 3: begin DELTX := +1; DELTY := 0; end; end; FOUND := MAZE(X+DELTX, Y+DELTY, D); until (D=F) or FOUND; MAZE:= FOUND; end; end; begin if MAZE(0, 100, 0) then writeln ("success"); end;
Recursion is really a method of obtaining some extra, secret, memory; in this case for the formal parameters X, Y and F and the local variables FOUND, D, DELTX and DELTY at each level of recursion of MAZE.
Layout of memory at some time during the run:
>
BLOCKED | X,Y,F,Found,D,Deltx,Delty | X,Y,etc... | X,Y,etc... | ... |
Scope: The inside of MAZE is statically nested inside the main program block. Only one set of X, Y, F, FOUND, D, DELTX, and DELTY can be accessed.
Extent: Umpteen sets of variables accessed by X, Y, F, FOUND, D, DELTX and DELTY exist, and their extents are nested inside each other. However, only the newest of them can be accessed using those identifiers: the others are hidden.
For the dynamic environment, we will collect together the locations with a similar extent, again noting that they are nested, and treat them as a level on a stack. These levels are usually known as stack frames (c.f. CS204). In the implementation of these at run-time, each stack frame will include a dynamic link back to the previous stack frame, so we can perform a pop. These dynamic links, together, make up the dynamic chain. (We do not usually need an explicit link from a block back to its parent block or function.)
When we want to access a location of an identifier, we need to find the identifier in the static environment and decide which scope level it is in. If it is in the current level, we have a local access and we just need to find the corresponding location in the current dynamic stack frame.
Otherwise, we are making a non-local access, and we need to work back up the dynamic environment stack to find the correct stack frame. We might hope to be able to count the number of levels we have gone back in the static environment and go back the same number of levels in the dynamic environment. However, as we have seen above, when we have function calls, there can be more nested extents than nested scopes, so this will not work.
Instead, we need an extra static link in each stack frame, that points back to the most recent stack frame for the surrounding scope. These static links, together, make up the static chain.
To find a given identifier, see how many levels of the static stack must be followed in order to reach it and then, starting from the dynamic stack, follow the static chain for that number of levels. The dynamic object currently bound to that identifier will be found in the dynamic environment at the frame thus reached.
On function/block entry, we create a new stack frame corresponding to the declarations of the block being entered, with a dynamic link pointing to the previous stack frame. The static link must point back to the most recent stack frame for the surrounding scope i.e. we obtain it exactly as if making a variable access.
On function exit, replace the current stack frame, using its dynamic link.
Example (very silly) Fibonacci program in Pascal:
. . . procedure fib(function last1, last2: integer); var term: integer; function this: integer; begin this:= term end; begin term:= last1 + last2 {i.e. call the functions}; write(term); fib(this, last1); end; function one: integer; begin one:= 1 end; . . . fib(one, one); . . .Dynamically, umpteen declarations of
this
will exist, alongside umpteen
incarnations of term
. Each this
ought to access its own
term
. To implement this, each incarnation of this
must contain
(as well as a pointer to its code) a private static link to the frame in
which it was born.
Layout of memory at some time during the run:
>
... | one,one | term | this,last1 | term | this,last1 | term | ... |
Thus we must model procedures and functions as dynamic objects by a pair: (static link, code pointer)
Languages providing a heap can be categorised according to how extents are
ended. There may be either an explicit free
command
as in C:
int *pti; pti= malloc(sizeof(int)); /* obtains space for an anonymous int variable on the heap, whose extent starts here */ *pti= some_value; some_int_variable= *pti; free(pti); /* end of extent - do not try to use *pti here! */or a garbage collector as in Algol68:
loc ref int pti; pti := heap int := some value; # obtains space for an anonymous int variable on the heap, whose extent starts here # some int variable := pti; pti := #another# heap int; # end of extent of anonymous variable (because pti now refers to a new anonymous variable, so the old one cannot be accessed) #However, when a heap variable goes out of extent by becoming inaccessible, it does not immediately disappear. Eventually, after enough new heap variables have been generated, the stack and the heap will run into each other (on a virtual address machine the memory allocated to the program will become exhausted), whereupon a garbage collector is called to find all the inaccessible heap variables, which it does as follows: