next up previous contents
Next: Abstract Data-Types, Modules, Classes Up: CS2111: Design and Implementation Previous: Composite Types   Contents

Subsections


Vectors and Arrays - Language design and implementation

Vector Declarations and element access

The possible values for each index of each array are defined by the upper and lower bounds, which in many languages can be any integer or even any scalar (other than reals).

The same information may be expressed by the number of elements if the language fixes the lower bound.

Implementation

A sequence of e.g. words or bytes, depending on the element type. May need extra information, depending on operations (see below).
We will model an array variable as a set of as many locations as the array has elements, each mapped to a value according to the state. Then the variable access array[index] just selects the correct location from the array variable and proceeds as for a simple variable.

To simplify finding the location(s) for a particular element of an array, we place the locations used for the array at a regular pattern of addresses, so array[index] is at address
origin + index * stride,
where the origin is the starting address for the array and the stride is the constant offset between the addresses of the locations of successive elements of the array. (The origin is the address of array[0]; if 0 is not a valid index then the origin is not an address of an element of the array.)

For simple languages, a compiler can recalculate the stride from the element type whenever necessary, so the only information needed to access an array is its origin.

Bound checking

In many languages we should check that each index falls in the permitted range i.e.
lower bound $\le$ index $\le$ upper bound

Implementation

The compiler needs to know what the bounds are e.g. from the declaration. If the lower bound is fixed, and especially if it is zero, this is even simpler. If the array is dynamic (see below) this is harder.

Multi-dimensional arrays

Most languages allow multi-dimensional arrays, either by having multiple indices or by permitting the element type to itself be an array, although some have a maximum allowed number of dimensions.
Many modern languages treat an array as a vector whose elements may be of (almost) any type, including arrays. Thus, a 2-dimensional array is just a vector of vectors. Some languages make this clear in declarations and/or accesses, e.g. using a[i][j], some languages hide this e.g. using a[i,j], and some allow both forms, although they may not treat them in exactly the same way.

Implementation

Element access is similar to that for vectors, except that after each index (except the last) is processed, the calculated address is just used as the origin for the next index calculation; if we have processed all the indices then this is the correct address.

The stride is calculated from the array declaration. Many languages leave this to the implementor; the simplest method is to use the size of a vector element, calculated from its type (which may itself be a vector). Some languages insist that the strides are calculated in a specific way (usually similar to this simple method) so that:
the user knows exactly how memory is used, or
the bounds of one dimension can be dynamically varied at run-time, so this dimension (either the first or the last for a multi-dimensional array) must have the largest stride.

Algorithms for array accesses:
  access bchk stride(s)
1-D O+I*S $L\le I\le U$ S=sizeof(element)
multi-D O+$I_1$*$S_1$ $L_1\le I_1\le U_1$ $S_1$ = "
  +$I_2$*$S_2$ $L_2\le I_2\le U_2$ $S_2$= " *($U_1-L_1$)
  . . . . . . . . .
  +$I_n$*$S_n$ $L_n\le I_n\le U_n$ $S_n$= " *($U_1-L_1$)* . . . *( $U_{n-1}-L_{n-1}$)
<>
(note: $L_n$ and $U_n$ are not required to calculate S)

Static and Dynamic arrays

Some languages require the size of arrays to be known at compile-time (static arrays), usually by restricting bounds to literals, or to constants, or to constant expressions.
Some languages allow the size of arrays to be computed at run-time (dynamic arrays), so bounds can be any expressions - this can cause difficulties for the implementor and so is controversial - some languages only permit this for one dimension (either the first or the last) of a multi-dimensional array, and/or only allow it for anonymous variables.

Some languages can query a vector (usually a parameter) to find its upper and/or lower bounds, or iterate over all elements of a vector.

Implementation

The less that the compiler knows at compile-time, the more it has to calculate and pass around at run-time. For example, if a language has both dynamic arrays and bound checking, then the bounds (and strides) of arrays have to be decided at run-time, when the array is created, and passed on if the array is used as a parameter.
So, in a simple language a vector parameter passed by reference is just the origin, but in a more complex language we need more information. This can be either auxiliary vectors of pointers (Iliffe vectors - see exercises) or of dimensional information (dope vectors). The latter is essentially a list of triples of (lower bound, upper bound, stride), with one triple for each dimension. (This is assuming a language where the lower bound is not fixed - if it was we would only need a pair for each dimension.)

Implementation of operations on arrays:
  fixed variable  
static 1-D S L&U O
  multi-D S L&U O
dynamic 1-D S O L&U
  multi-D - O S L&U
slice   - O S L&U
<>
(note: if no bound-checking is required, simply ignore L&U.)

Static arrays passed as parameters have to be treated like dynamic arrays unless the bounds are fixed at compile time, e.g. by type-checking.

Some languages require implementations that calculate and keep this information in a dope vector at run-time, as it allows the most generality for e.g. array slicing or dynamic arrays.

Other languages, that constrain array declarations or operations, can have implementations that fix some or all of these at compile time, and so allow faster code.

Operations on vectors, including slicing

In Pascal, arrays are defined to be built so that:
a: array [1..4, 1..7] of integer;
means exactly the same as:
a: array [1..4] of array [1..7] of integer;
and thus either a[i, j] or a[i][j] can be used interchangeably. Therefore only the first dimension can be sliced, and only to select one element (row or column): a[i].

Implementation

If the language allows slicing, then any array may be a slice from a higher-dimension array, so every array has to be treated as if it is dynamic - the origin, bounds and strides can only be calculated once the slice has happened, at run-time.

Exercises

Readings

Louden: chs. 5.1, 5.2, 5.4, 5.5
Bal & Grune: chs. 2.2.1, 2.3, 2.3.1
Aho, Sethi & Ullman: ch. 7.1
next up previous contents
Next: Abstract Data-Types, Modules, Classes Up: CS2111: Design and Implementation Previous: Composite Types   Contents
Pete Jinks
1999-09-30