

Types

Values and types

A value is an object chosen from a defined set, which is its type, e.g. -3, -2, -1, 0, 1, 2, 3 are all values of type integer.

A given value may have many instances (inside the computer, or on pieces of paper) e.g. 3, 3, 3 are different instances of the one value 3. Instance is often used slightly differently, when we say that each identifier of a type is a different instance of that type.

Every type must have values and operations which act on those values, and every value or operation must have a type.

(Although representations of values and operations can be overloaded, their meanings must be uniquely determinable.)

A type may be either primitive (simple, scalar) or composite (aggregate) i.e. made up from other types.

For each type, a programming language must provide operators, constructors for creating values, and (for composite types) selectors for breaking down values.


Strong typing

Strong typing is a mechanism that prevents square pegs from being put into round holes. In a strongly typed language, every value and every expression has a type that can be determined, and the type accepted in every context where an expression can occur in the language (e.g. in assignments, in parameters etc.) is determinable.

Most strongly typed languages are statically typed i.e. all the above is done at compile-time. (e.g. Pascal, Ada, Algol68, SML.)
Some languages (e.g. Awk) have dynamic (i.e. run-time) type checking.
Some languages have no type checking - every value is just a bit pattern that can be interpreted in different ways depending on which type it is assumed to be at any moment.
Many languages are weakly typed - i.e. anywhere in between - so some parts of the language are strongly typed but other parts are not (e.g. undiscriminated unions, or a bit-pattern type), or the language is strongly typed except where you explicitly ask for no checking (e.g. type casts in C).

Strong typing is a good thing because it improves security by preventing many forms of accidental misuse of data. Empirically, it has been observed that strong typing is a very effective aid to achieving program reliability.

The more distinct types there are in a program, the better the chance of detecting errors at compile time. Ideally, even different usages of numeric types should be distinguished e.g. Ada:


        type mass is new float;
        type length is new float;
then assigning a value of type mass to a variable of type length is illegal.

Type equivalence

When do two types match during type checking? There are two major kinds of matching: name equivalence (two values are of the same type only if they are explicitly declared to be of the same type) and structure equivalence (two values are of the same type only if their types have the same form). The former can be more demanding of the user than the latter, as every function parameter must be of a named type, but the latter is much harder to implement.

As the types mass and length in the example above are structurally equivalent, a language that distinguishes between them must use name equivalence. However, this is not enough: Pascal uses name equivalence but quite reasonably permits mixed arithmetic on integer subranges and full integers - extra mechanisms are necessary to allow this but rule out adding masses and lengths (e.g. the new keyword in the example).
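
For example, C uses name equivalence for structs but treats typedef names as mere aliases; a minimal sketch (the type names here are invented for illustration):

        /* Two structurally identical structs are still distinct types. */
        struct mass   { double value; };
        struct length { double value; };

        typedef double kg;       /* typedefs only rename an existing    */
        typedef double metres;   /* type: kg, metres, double all match  */

        int main(void)
        {
            struct mass   m = { 1.0 };
            struct length l = { 2.0 };
            /* l = m;    rejected: incompatible struct types            */

            kg     weight = 3.0;
            metres height = 4.0;
            weight = height;     /* accepted: same underlying type      */
            return (int)weight;
        }

So a C typedef gives none of the protection that the Ada new declaration gives in the mass/length example above.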

There is a related question to do with arrays: assuming we have the correct number of dimensions, if the length of an array is part of the type (which advocates of strong typing say should be the case), it is impossible to write a single function that can deal with arrays of any size. Therefore, general-purpose sorting or string-manipulation functions, or a library of matrix operations, are impracticable.

Pascal, which had this problem, partially overcame it by adding a variable-length array type which could only be used in parameter declarations.

C partially sidesteps this problem by not including the first array dimension as part of the type; however, to be able to guarantee reasonably efficient compiled code, all other dimensions of a multi-dimensional array must be fixed.
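
A minimal C sketch (the column width 3 is an arbitrary choice): the missing first dimension lets one function handle any number of rows, but every other dimension is a fixed part of the parameter's type.

        /* The row count is passed separately; only the column count (3)
           is part of the parameter's type.                              */
        double sum(int rows, double m[][3])
        {
            double total = 0.0;
            for (int i = 0; i < rows; i++)
                for (int j = 0; j < 3; j++)
                    total += m[i][j];
            return total;
        }

(C99 later added variable-length array parameters, which remove even this restriction.)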


Type Coercion

Coercion is the implicit, automatic conversion of the type of an expression to the type required by its context. Most programming languages that have type checking also have some coercions, such as integer to real in mixed expressions. (Dereferencing is a coercion associated with pointers, §16.) Coercions can be dangerous because they are not explicit, but depend on declarations that may be some way away in the program text, and it is easy for the user to make mistakes.

Widening

Widening is the least dangerous coercion, but still has some problems. It happens when a value is automatically converted from a subset type to a superset type.

A common example is mixed arithmetic using integers and reals, where the integers are automatically converted to reals. There are two points to note here; the first is the timing of the conversion. Most languages do this at the last moment, so e.g. integer + integer + real would mean an integer addition, followed by a conversion, followed by a real addition, but some languages might convert both the integers to reals before adding them.
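
C converts at the last moment, as in the first case; a small sketch (division is used because it shows the difference more visibly than addition):

        #include <stdio.h>

        int main(void)
        {
            int    i = 1, j = 2;
            double x = 0.5;

            /* i / j is done as integer arithmetic (giving 0); only the
               result is widened to double before adding x.             */
            double late  = i / j + x;          /* 0.5 */

            /* Converting an operand first makes both operations real.  */
            double early = (double)i / j + x;  /* 1.0 */

            printf("%g %g\n", late, early);
            return 0;
        }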

The second point is that there is no guarantee that real values are held at least as accurately as integer values, so we may well get different answers from those expected. This latter point is even more of a problem if we are not even sure when the conversions happen.

Another example is mixed arithmetic on integer subranges and full integers, or on low- and high-precision reals: this is normally useful, but can cause problems if we want to keep e.g. mass and length separate. Another problem is defining and remembering what happens when e.g. two byte-sized integers are added: are they added as words, as would be appropriate if we wanted to save the result back to a full integer, or as bytes, as might be appropriate if we wanted to save the result to a byte (though perhaps we would want to detect overflow, and so would prefer word addition followed by a runtime check)?
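
C's answer, for comparison (a minimal sketch): the "integer promotions" mean byte-sized operands are always widened and added at word (int) length, and any overflow is only lost if the result is narrowed again on assignment.

        #include <stdio.h>

        int main(void)
        {
            unsigned char a = 200, b = 100;    /* byte-sized integers    */

            int asword = a + b;                /* promoted: 300          */
            unsigned char asbyte = a + b;      /* narrowed: 300 mod 256
                                                  = 44, overflow lost    */

            printf("%d %d\n", asword, asbyte);
            return 0;
        }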

Narrowing

Narrowing is much more dangerous than widening.

It happens when a value is automatically converted from a superset type to a subset type. For example, should we narrow the real value 3.7 to the integer value 3 or 4? Whichever happens, what if you wanted the other, or what if it was implementation dependent?

Because it is so dangerous, most modern languages do not permit automatic narrowing, but instead contain various explicit conversions e.g. from real to integer using truncate or round.
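
In C the narrowing cast truncates towards zero, and rounding must be asked for explicitly; a minimal sketch:

        #include <stdio.h>
        #include <math.h>                   /* round() */

        int main(void)
        {
            double x = 3.7;

            int truncated = (int)x;         /* explicit cast: 3          */
            int rounded   = (int)round(x);  /* explicit rounding: 4      */

            /* int i = x;   C would still narrow silently (truncating);
               stricter languages insist on one of the forms above.      */

            printf("%d %d\n", truncated, rounded);
            return 0;
        }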

Primitive types

Various languages have different primitive types, or different views of similar primitive types. The commonest types are:

integer

a subset of the integer numbers e.g. -maxint to +maxint
Constructors: 1 2 123 etc.
Operations: + - * / < = > etc.
Implementation: machine word, so operations may overflow

real

a subset of the real numbers with a limited discrimination, i.e. a limit on how close two machine reals can be and yet still be different. Some languages have a predefined constant = smallest real such that (1.0 + constant) != 1.0 (a C sketch using this constant follows this list of types)
Constructors: -1.0 2.718281828 1.234e-12 etc.
Operations: + - * / < = > etc.
Implementation: machine word, so operations may overflow, underflow, and lose accuracy

character

the set A...Z a...z 0...9 !"$%^&*()-+=`[]{};':@,./<>?#~| etc.
Constructors: 'A' 'B' or "A" "B" etc.
Operations: < = > etc, conversions to and from integer, succ pred, etc.
Implementation: machine byte - but beware non-Roman character sets and e.g. Unicode, which may need more than one byte per character

boolean

{false, true}
Constructors: true false
Operations: and or not eor == != implies (<= => ?) etc.
Implementation: machine bit

enumerations

e.g. enum boolean { false, true }

enum escapes { bell = '\a', backspace = '\b', tab = '\t', newline = '\n',
               vtab = '\v', carriage_return = '\r' }
Constructors: false, bell etc.
Operations: if unordered only =, if ordered < = > succ pred etc.
Implementation: machine word

subtypes

Modifies some base type (e.g. integer, character, enumerations) specified by some constraint on range or size or accuracy:
e.g. type natural = 0 .. maxint e.g. type digit = '0'..'9' e.g. C: float is a subrange of double, short of long
(we can select or exclude or add values, or in other ways specify how the values are constructed from those of the base type)
used to save space, discourage programming errors, as array bounds, as discriminants of unions
Constructors: as for the base type
Operations: as for the base type
Implementation: as for the base type, but possibly a smaller word-length

new types

e.g. metre = new integer, inches = new integer cannot be mixed
e.g. year = new integer cannot sensibly + - * / etc. - e.g. adding two years does not give a year!
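
A C sketch of the limits mentioned under integer and real above (DBL_EPSILON, INT_MIN and INT_MAX are the standard names in <float.h> and <limits.h>):

        #include <stdio.h>
        #include <float.h>               /* limits of the real types    */
        #include <limits.h>              /* limits of the integer types */

        int main(void)
        {
            /* The "predefined constant" for reals: the gap between 1.0
               and the next representable double.                        */
            double one_plus      = 1.0 + DBL_EPSILON;
            double one_plus_half = 1.0 + DBL_EPSILON / 2;

            printf("epsilon = %g\n", DBL_EPSILON);
            printf("%d %d\n", one_plus != 1.0,
                              one_plus_half != 1.0);   /* typically 1 0 */

            /* The integer implementation is a machine word with fixed
               bounds, so operations may overflow.                       */
            printf("integers: %d .. %d\n", INT_MIN, INT_MAX);
            return 0;
        }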

Language Independent Datatypes

ISO/IEC Draft International Standard 11404. Primitive types are
as above: integer, real, character, boolean, enumerations, subtypes, new types
also: state, ordinal, date_and_time, scaled, rational, complex, void
derived types: bit, modulo, time_interval etc.
generated (from component/base) types: pointer, procedure, choice

C

C is full of implementation-dependent widening and narrowing coercions, and has dereferencing coercions for array and function pointers. All these coercions can be forced by explicit casts or operations.

C is strongly typed except that:
most scalar types are really integers
casts, particularly of pointers, defeat most type checking
e.g. arrays and functions can become pointers and vice versa
functions declared with empty parameters are not type checked
functions declared with varargs cannot be fully type checked
unions are not discriminated
the first dimension of an array is not part of the type but any other dimensions are (maybe)
i.e. it is not really strongly typed at all!
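
Two of these weaknesses in a minimal sketch (union bits and f are invented names for illustration):

        #include <stdio.h>

        /* An undiscriminated union: nothing records which member is
           current, so reading the "wrong" one just reinterprets bits. */
        union bits { int i; float f; };

        int f();     /* empty parentheses: parameters are not checked   */

        int main(void)
        {
            union bits b;
            b.f = 1.0f;
            printf("%d\n", b.i);   /* the bit pattern of 1.0f as an int */

            f(42);   /* happens to match the definition below, but
                        f("oops") or f(1,2,3) would compile as silently */
            return 0;
        }

        int f(int x) { return x; }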

SML

Genuinely strongly typed. I cannot think of any coercions.

Language design and implementation

As noted above, strong typing is a very effective aid to programmers, although macho programmers seem to find it demeaning. Compile-time checking is generally more useful than run-time checking: with the latter, any run may exercise a piece of code that has never been executed before and that still contains an undetected type-clash.

Readings

Louden: chs. 6.1, 6.2, 6.4-6.7
Bal & Grune: chs. 2.2.1-2.2.6
Pete Jinks
1999-09-30