Next: Types Up: CS2111: Design and Implementation Previous: Identifiers: Static and Dynamic Contents

Subsections

Scope and Extent

In early programming languages, identifiers and variables had essentially unlimited scope and extent. However, it soon became clear that it was important to find ways of restricting extent, to make the best use of limited computer memories, and scope, to make the best use of limited human memories.

Classical Block Structure

How can we safely reuse memory space for different purposes in different parts of a program?

Example C program showing nested block structure:


  int main (. . .)
  {                                            /* start of Block1 */
    float A; int B, C[10];
    . . A; . . B; . . C;
    {                                              /* start of Block2 */
      char P; int Q, R[11];
      . . A; . . B; . . C; . . P; . . Q; . . R;
    }                                              /* end of Block2 */
    . . A; . . B; . . C;
    {                                              /* start of Block3 */
      float A[6], C; int D;
      . . B; . . A; . . C; . . D;
    }                                              /* end of Block3 */
    . . A; . . B; . . C;
  }                                            /* end of Block1 */

Scopes of identifiers:


  B        = Block1
  P, Q, R  = Block2
  A, C     = Block1-Block3 (the first A and C)
  A, C, D  = Block3        (the second A and C)

Extents of variables (as accessed by identifiers):


  A, B, C  = duration of Block1 (the first A and C)
  P, Q, R  = duration of Block2
  A, C, D  = duration of Block3 (the second A and C)

Both scope and extent are nested. If different identifiers had been used for the second A and C, the scope and extent would have seemed to be similar i.e. scope is in some ways a subset of extent. Hence the confusion that often arises between them.

Layout of memory at two times during the run:

A B C P Q R

A B C A C D

<>

Example C program containing functions:


        float A; int B, C[10];
        void P1 (int I, int J)
                {B= I + J;}
        void P2 (void)
                { int B, D;
                  . . .; P1 (B, D);
                }
        int main (. . .)
                {. . . P1(C[5], C[6]); P2(); . . .}

Layout of memory at two times during the run:

A B C I J

A B C B D I J

<>

The scope of identifiers is still determined by their textual position within the program text. However, the dynamic nesting of extents no longer follows the static nesting of scopes (e.g. the extent of I and J is sometimes, but not always, dynamically nested inside the extent of B and D). Therefore, the B which is assigned to in P1 is always the B declared in the outermost block (following the static nesting of the scopes), even if the other B (declared in P2) is sometimes interposed in the dynamic nesting. i.e. scope is still in some sense a subset of extent.

Algol 60 is the language that first introduced this concept of block structure. Most modern languages allow both nested blocks and functions. C has nested blocks, but no nested functions. Pascal has nested functions (and procedures), each with its own declarations, but no nested blocks. We can use it to illustrate the use of nested functions, to show the problems they cause for implementors.

Recursion

Example of recursion and functions defined inside functions in Pascal:


procedure MAZE_PROG;
  var BLOCKED: array[-100..+100, -100..+100] of BOOLEAN;
  function MAZE(X, Y, F: INTEGER): BOOLEAN;
  var FOUND: BOOLEAN;
      D: INTEGER;
      DELTX, DELTY: -1..+1;
  begin
    if BLOCKED[X, Y] then MAZE:= FALSE
    else if (X=0) and (Y=0) then MAZE:= TRUE
    else begin
      D:=F; FOUND:= FALSE;
      repeat
        D := (D + 1) mod 4;
        case D of
           0: begin DELTX :=  0; DELTY := -1; end;
           1: begin DELTX := -1; DELTY :=  0; end;
           2: begin DELTX :=  0; DELTY := +1; end;
           3: begin DELTX := +1; DELTY :=  0; end;
        end;
        FOUND := MAZE(X+DELTX, Y+DELTY, D);
      until (D=F) or FOUND;
      MAZE:= FOUND;
    end;
  end;
begin
  if MAZE(0, 100, 0) then writeln ("success");
end;

Recursion is really a method of obtaining some extra, secret, memory; in this case for the formal parameters X, Y and F and the local variables FOUND, D, DELTX and DELTY at each level of recursion of MAZE.

Layout of memory at some time during the run:

BLOCKED X,Y,F,Found,D,Deltx,Delty X,Y,etc... X,Y,etc... ...

<>

Scope: The inside of MAZE is statically nested inside the main program block. Only one set of X, Y, F, FOUND, D, DELTX, and DELTY can be accessed.

Extent: Umpteen sets of variables accessed by X, Y, F, FOUND, D, DELTX and DELTY exist, and their extents are nested inside each other. However, only the newest of them can be accessed using those identifiers: the others are hidden.

The Stack Model

We need to update our semantic model ( $\S$ 13). The static environment changes as we enter and leave each scope. As these scopes are nested, we can treat the static environment as a stack, with a set of declarations being pushed at the start of scope and popped at the end. (We will assume that scopes start and end at a function or block. In most languages there is little difference between the start or end of a scope and of an extent: if a finer granularity is needed, we usually need it for both.)

For the dynamic environment, we will collect together the locations with a similar extent, again noting that they are nested, and treat them as a level on a stack. These levels are usually known as stack frames (c.f. CS204). In the implementation of these at run-time, each stack frame will include a dynamic link back to the previous stack frame, so we can perform a pop. These dynamic links, together, make up the dynamic chain. (We do not usually need an explicit link from a block back to its parent block or function.)

When we want to access a location of an identifier, we need to find the identifier in the static environment and decide which scope level it is in. If it is in the current level, we have a local access and we just need to find the corresponding location in the current dynamic stack frame.

Otherwise, we are making a non-local access, and we need to work back up the dynamic environment stack to find the correct stack frame. We might hope to be able to count the number of levels we have gone back in the static environment and go back the same number of levels in the dynamic environment. However, as we have seen above, when we have function calls, there can be more nested extents than nested scopes, so this will not work.

Instead, we need an extra static link in each stack frame, that points back to the most recent stack frame for the surrounding scope. These static links, together, make up the static chain.

To find a given identifier, see how many levels of the static stack must be followed in order to reach it and then, starting from the dynamic stack, follow the static chain for that number of levels. The dynamic object currently bound to that identifier will be found in the dynamic environment at the frame thus reached.

On function/block entry, we create a new stack frame corresponding to the declarations of the block being entered, with a dynamic link pointing to the previous stack frame. The static link must point back to the most recent stack frame for the surrounding scope i.e. we obtain it exactly as if making a variable access.

On function exit, replace the current stack frame, using its dynamic link.

Function Values

We said above that values not accessible through the static chain were not accessible at all. Actually, means of getting at them can be devised.

Example (very silly) Fibonacci program in Pascal:


  . . .
  procedure fib(function last1, last2: integer);
        var term: integer;
        function this: integer;
                begin this:= term end;
        begin
        term:= last1 + last2 {i.e. call the functions};
        write(term);
        fib(this, last1);
        end;
  function one: integer;
        begin one:= 1 end;
  . . .
  fib(one, one);
  . . .

Dynamically, umpteen declarations of this will exist, alongside umpteen incarnations of term. Each this ought to access its own term. To implement this, each incarnation of this must contain (as well as a pointer to its code) a private static link to the frame in which it was born.

Layout of memory at some time during the run:

... one,one term this,last1 term this,last1 term ...

<>

Thus we must model procedures and functions as dynamic objects by a pair: (static link, code pointer)

The Heap

A heap is a means of explicitly controlling extents.

Languages providing a heap can be categorised according to how extents are ended. There may be either an explicit free command as in C:


  int *pti;
  pti= malloc(sizeof(int));
        /* obtains space for an anonymous int variable
           on the heap, whose extent starts here */
  *pti= some_value;
  some_int_variable= *pti;
  free(pti);
        /* end of extent - do not try to use *pti here! */

or a garbage collector as in Algol68:


  loc ref int pti;
  pti := heap int := some value;
        # obtains space for an anonymous int variable
          on the heap, whose extent starts here #
  some int variable := pti;
  pti := #another# heap int;
        # end of extent of anonymous variable (because pti now refers to
          a new anonymous variable, so the old one cannot be accessed) #

However, when a heap variable goes out of extent by becoming inaccessible, it does not immediately disappear. Eventually, after enough new heap variables have been generated, the stack and the heap will run into each other (on a virtual address machine the memory allocated to the program will become exhausted), whereupon a garbage collector is called to find all the inaccessible heap variables, which it does as follows:

find all the pointers in live, named variables
find all the heap values pointed to thereby, and mark them
if these contain pointers to other heap values, mark them also,
and so on
all unmarked values on the heap are garbage
compact the heap to one end, adjusting pointers accordingly

Exercises

( $\dag$ indicates harder problems)

$\dag$ When using static and dynamic chains, what language features would force you to use a new link when entering a block, rather than only when entering a function?
$\dag$ Static chains connect the set of stack frames in each dynamic environment as a tree. Access to a particular stack frame in the static environment means we must follow the links in the current chain, so access is O(number of statically nested frames).
An alternative algorithm is to use a display, which is cheaper at each access, but which is harder to keep correct as the static and dynamic environments change. The idea is to keep an array (the display), whose elements are copies of the static links for the current static environment (i.e. the static chain that links together the currently accessible stack frames). As these links are in an array rather than a list, access is O(1).
Using the examples above, draw the display required for each stack frame, and so design an algorithm that will maintain the display correctly on function entry and exit. There is more than one possibility (e.g. does each stack frame have its own display, or is there one global display). List all those you can think of, and try to analyse their relative merits. How would you implement a function passed as a parameter?

Readings

Louden: chs. 5.6, 7.5
Aho, Sethi & Ullman: chs. 7.2-7.4, 7.6-7.9, 9.3

Next: Types Up: CS2111: Design and Implementation Previous: Identifiers: Static and Dynamic Contents

Pete Jinks
1999-09-30