Next: Scope and Extent Up: CS2111: Design and Implementation Previous: Procedures and functions Contents

Identifiers: Static and Dynamic Semantics

Semantic Model

Users and implementors of a language need to understand exactly what each construct in a given programming language means, i.e. how it interacts with the underlying hardware. However, different computers have different idiosyncrasies, so language designers often describe the meaning in terms of a simplified model.

Real computer memory is linear, consisting of consecutively numbered words, each of precisely the same length. For our purposes, however, we do not really care whether two variables are stored adjacently or not (except possibly for elements of arrays), nor how many bits they occupy.

In our model, there are things called locations into which values may be put, and which are big enough to hold whatever we want to put into them. The only important property about locations is that they are all disjoint.

The actual hardware stores our program, as machine code, in the same memory as our data. Our model does not need to know if or how the program is stored, unless we have a language in which programs can modify themselves. We will consider a program to be a text, containing useful things like identifiers, which does not need to be stored in the actual hardware (we will ignore debuggers, such as dbxtool, so identifiers are useless at run-time). Nevertheless, our model will talk about the program text, and in particular about identifiers, while we describe what is supposed to happen at run-time. Thus, we can talk about the location known to the program as fred, even though the identifier fred will have been completely eliminated by the time the program runs on the actual hardware.

Static and dynamic properties

We can distinguish between the static (external) world, representing the program text, which does not change, and the dynamic (internal) world, representing the hardware at run-time, which is where it all happens.
Static objects are constructs (identifiers, statements, expressions etc.) in the text of the program, and have no meaningful existence beyond compile-time.
Dynamic objects are (instances of) values, locations and the like, which live and move and have their being inside the computer at run-time.

A correspondence between some static objects and their dynamic counterparts may be established by binding, brought about as a consequence of a declaration.

A static object depends for its meaning upon its static environment, which knows all about the other relevant static objects which surround it.

In particular, the static environment must include what is known about each identifier from its declaration. In fact, the static environment maps each identifier to its type and the kind of declaration it came from.

The static environment is invariant over time, but varies according to position within the program text.

The dynamic environment relates identifiers to the dynamic objects that will be around at run-time, i.e. it maps each identifier to information about constants, variables, operations etc. It will vary over time, as the program runs.

(Static) Scope and (Dynamic) Extent

Scope is a property of the static world (i.e. of the written program, so that it all gets sorted out at compile time).
An identifier which is declared at some point in a program (the defining occurrence) may be used, with its declared meaning, at other points in the program (applied occurrences). The scope of the identifier is the region of the text of that program in which those applied occurrences may occur.

The static environment must change at the start of the scope of an identifier and again at the end of that scope. In some languages many identifiers start their scope at exactly the same place (e.g. the start of a function); in others each declaration starts a fresh scope. A scope may continue until the end of the subprogram or block containing the declaration, or to some other point fixed by the rules of the language. In most languages the same identifier can be used in several declarations, each with its own scope.
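As an illustration of several declarations of the same identifier, each with its own scope, here is a minimal sketch in C. The function name scope_demo and the identifier count are hypothetical, chosen only for this example; the scope rules shown are those of C and other languages may differ.

```c
int count = 1;                  /* file scope: from here to the end of the file */

/* Hypothetical helper: reports which declaration of `count` is visible at
   three different points in the text, packed into one decimal number. */
int scope_demo(void)
{
    int outer = count;          /* applied occurrence of the file-scope count: 1 */
    int count = 2;              /* new declaration: scope runs to the end of this function */
    int inner;
    {
        int count = 3;          /* nested block: a third declaration hides the second */
        inner = count;          /* 3 */
    }                           /* scope of the innermost count ends here */
    return outer * 100 + inner * 10 + count;   /* count is 2 again at this point */
}
```

All three uses are resolved by the compiler from the program text alone; nothing about which count is meant needs to be decided while the program runs.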

Extent is the corresponding property of the dynamic world (i.e. it gets sorted out at run time).
If location(s) for a variable are provided at some time, and if later on those location(s) are taken away again (perhaps for use by a different variable), then the interval of time over which the location(s) for that variable existed is the extent of that variable.
The dynamic environment must change when that variable appears and again when it vanishes. It may be that a variable, once allocated, remains around until the end of the program (its extent is the lifetime of the program), or it may be that the variable is only around during the running of some procedure (its extent is the lifetime of that procedure). If such a procedure manages to call itself recursively, then there may be several such variables created, each with its own extent (and these extents are nested inside each other).
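The two extents mentioned above can be sketched in C as follows. The function names calls_so_far and distinct_locals are hypothetical; the sketch assumes C's static and automatic storage as stand-ins for "lifetime of the program" and "lifetime of the procedure".

```c
#include <stddef.h>

/* Extent = lifetime of the program: the location for `calls` is provided
   once and keeps its value between calls, although its scope is only this
   function. */
int calls_so_far(void)
{
    static int calls = 0;
    calls = calls + 1;
    return calls;
}

/* Extent = one activation of the procedure: every recursive call obtains a
   fresh location for `local`. The function counts how many callers had a
   `local` at a different address from this activation's own. */
int distinct_locals(int depth, const int *callers_local)
{
    int local = depth;
    int different = (callers_local != NULL && callers_local != &local) ? 1 : 0;
    if (depth == 0)
        return different;
    /* the caller's `local` still exists while the callee runs: nested extents */
    return different + distinct_locals(depth - 1, &local);
}
```

Calling distinct_locals(3, NULL) creates four activations, so four variables named local exist at once at the deepest point, each with its own extent nested inside its caller's.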

In simpler languages (such as PASCAL), the extent of a variable corresponds so closely to the scope of the identifier which accesses it that language manuals sometimes do not distinguish between the two concepts.

Declarations

A declaration serves to:

associate a type and kind with an identifier, as then recorded in the static environment, and
bind the static identifier to a dynamic object, to be later recorded in the dynamic environment, so the identifier then accesses the value of that object (except for type declarations, as types are purely static objects).

Why bother with declarations:

to associate the correct type with the identifier (less important for untyped languages, or those with implicit types)
to spot spelling mistakes (absolutely crucial) (cf. $\S$ 9.1)

There are two different kinds of declaration:

Variables: which may be assigned to and re-assigned to. Often, the value to be (initially) associated with the variable is not provided at declaration time, but by a later assignment.
Constants: where the value accessed by the identifier may never change (except by declaring it again with a new value), so it may not be assigned to. If its value can be computed at compile-time, it is said to be a manifest constant.

A constant is a dynamic object: a value which has been bound to an identifier during some period of time.

Do not confuse the dynamic constant with its static counterpart, the literal: 0 is a literal, whereas zero is an identifier.

Most languages allow constant declarations. Of these, some allow only literal values, some allow previously declared constants, some allow expressions involving literals and constants, and some allow any expression, so that the values can only be calculated at run-time.
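A rough sketch of these distinctions in C follows. The names MAX_LEN, line_buffer and constant_demo are hypothetical. Note the assumption being made: C's const declares a read-only object rather than a true constant in every respect, so it is used here only as the nearest available approximation.

```c
#include <string.h>

enum { MAX_LEN = 80 };          /* manifest constant: value known at compile time */

char line_buffer[MAX_LEN];      /* so it may be used as an array bound */

/* Hypothetical helper combining the three kinds of value. */
int constant_demo(const char *text)
{
    const int zero = 0;                 /* zero is an identifier; 0 is a literal */
    const size_t len = strlen(text);    /* value can only be calculated at run-time */
    /* len = 0;   <- rejected by the compiler: a constant may not be assigned to */
    return (int)len + zero + MAX_LEN;
}
```

Here MAX_LEN is initialised from a literal, zero from a literal within a function, and len from an arbitrary expression whose value depends on the argument.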

Similarly, with variable declarations, some languages do not allow initialisation within the declaration, some allow literals, and so on (and also for array bounds).

For constant declarations, and initialisations in variable declarations, some languages restrict the types to predefined types, or to scalars, but some permit strings or arrays or any composite types.

Variables

A variable is a dynamic object with which a value may be associated, but so that the association may be changed as a result of an assignment. Our model will represent the association by considering a variable to be a location (or several locations for composite types), and defining the state of a computation to be a mapping from locations to values. The state keeps track of everything that goes on at run-time, encompassing all the dynamic objects that are allowed to change as the program runs.

Although some variables are bound to static objects (identifiers) according to the current dynamic environment, many variables are anonymous (i.e. not bound), being accessible only indirectly, as by pointers. A pointer is just a particular kind of value.

The dynamic (run-time) effect of an assignment is to evaluate the expression (yielding a value) and to copy it into the location(s) of the variable yielded by the variable access (via the dynamic environment).
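The assignment step just described can be modelled directly, as a toy sketch rather than a description of any real implementation. Everything here is an assumption made for illustration: the fixed table sizes, the identifier bert, and the helper names lookup, assign and fetch.

```c
#include <string.h>

#define N_LOCATIONS 4           /* hypothetical: a very small memory */
#define UNBOUND     (-1)

struct binding {
    const char *identifier;
    int location;               /* index into state[] */
};

/* Dynamic environment: identifier -> location ("bert" is hypothetical). */
static const struct binding dynamic_env[] = {
    { "fred", 0 },
    { "bert", 1 },
};

/* State: location -> value. */
static int state[N_LOCATIONS];

/* Variable access: find the location bound to an identifier. */
int lookup(const char *identifier)
{
    size_t n = sizeof dynamic_env / sizeof dynamic_env[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(dynamic_env[i].identifier, identifier) == 0)
            return dynamic_env[i].location;
    }
    return UNBOUND;
}

/* Assignment: copy an already evaluated value into the variable's location.
   Returns 0 on success, -1 if the identifier is not bound. */
int assign(const char *identifier, int value)
{
    int loc = lookup(identifier);
    if (loc == UNBOUND)
        return -1;
    state[loc] = value;         /* the previous value is overwritten */
    return 0;
}

/* Read back the value currently associated with an identifier (0 if unbound). */
int fetch(const char *identifier)
{
    int loc = lookup(identifier);
    return loc == UNBOUND ? 0 : state[loc];
}
```

After assign("fred", 99) followed by assign("fred", 7), fetch("fred") yields 7 and nothing in the state records that 99 was ever there.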

In our model, the state mapping is changed so that the location(s) now map onto the new value, and all trace of the previous mapping of that location(s) is lost. Thus the most characteristic feature of assignments is that they forget things.

Example code fragment illustrating variables and their static and dynamic semantics:

{int fred;
causes fred $\rightarrow$ (integer, var) to be added to the static environment.
char *ptr;
causes ptr $\rightarrow$ (char*, var) to be added.
at run-time, obtain two variables each of one location (e.g. locf, locp), and include fred $\rightarrow$ locf and ptr $\rightarrow$ locp in the dynamic environment.
fred= 99;
causes the state to include locf $\rightarrow$ 99 instead of whatever locf mapped to before.
ptr= malloc(1);
find an unused variable consisting of a single location (e.g. locanon) and change the state to include locp $\rightarrow$ locanon
*ptr= 'c';
work out that the dynamic environment currently maps ptr to locp, which the state currently maps to locanon, so modify the state so that locanon $\rightarrow$ 'c'.
}
the static and dynamic environments can forget about fred and ptr (and a clever implementor will permit locf and locp to be re-used for something else).
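For readers who want to compile the fragment, a self-contained version might look like the sketch below. The wrapper function fragment_demo, the result variable, the check on malloc and the call to free are additions that are not part of the fragment above; the comments repeat the model's bookkeeping at each step.

```c
#include <stdlib.h>

/* Hypothetical wrapper: returns the character stored in the anonymous
   variable, or '\0' if malloc could not provide a location. */
char fragment_demo(void)
{
    char result = '\0';
    {
        int fred;               /* static env: fred -> (integer, var); run-time: fred -> locf */
        char *ptr;              /* static env: ptr -> (char*, var);   run-time: ptr -> locp */

        fred = 99;              /* state: locf -> 99 */
        ptr = malloc(1);        /* state: locp -> locanon */
        if (ptr != NULL) {
            *ptr = 'c';         /* state: locanon -> 'c' */
            result = *ptr;
            free(ptr);          /* in C, locanon is only given back explicitly */
        }
        (void)fred;             /* fred is otherwise unused in this sketch */
    }                           /* fred and ptr leave both environments here */
    return result;
}
```

One point the sketch makes visible is that, in C, the extent of the anonymous variable is not tied to the block: locf and locp can be re-used after the closing brace, but locanon stays allocated until free is called.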

Readings

Louden: chs. 5.1, 5.2, 5.4, 5.5
Bal & Grune: chs. 2.2.1, 2.3, 2.3.1
Aho, Sethi & Ullman: ch. 7.1

Pete Jinks
1999-09-30