A given value may have many instances (inside the computer, or on pieces of paper): e.g. 3, 3, 3 are different instances of the one value 3. The word "instance" is often used slightly differently, as when we say that each identifier of a type is a different instance of that type.
Every type must have values and operations which act on those values, and every value or operation must have a type.
(Although representations of values and operations can be overloaded, their meanings must be uniquely determinable.)
A type may be either primitive (simple, scalar) or composite (aggregate), i.e. made up from other types.
For each type, a programming language must provide operators, constructors for creating values, and (for composite types) selectors for breaking down values.
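For instance, a minimal C sketch (with invented names) of a composite type, a constructor for its values, and a selector for taking them apart:

    #include <stdio.h>

    struct point { int x; int y; };    /* composite type built from two ints */

    int main(void) {
        struct point p = { 3, 4 };     /* constructor: creates a composite value */
        int px = p.x;                  /* selector: breaks the value down again */
        printf("%d\n", px);
        return 0;
    }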
Most strongly typed languages are statically typed, i.e. all of the above is done at compile time (e.g. Pascal, Ada, Algol68, SML).
Some languages (e.g. Awk) have dynamic (i.e. run-time) type checking.
Some languages have no type checking at all: every value is just a bit pattern that can be interpreted in different ways, depending on which type it is assumed to be at any moment (e.g. BCPL, or most assembly languages).
Many languages are weakly typed, i.e. somewhere in between: some parts of the language are strongly typed but other parts are not (e.g. undiscriminated unions, or a bit-pattern type), or the language is strongly typed except where you explicitly ask for no checking (e.g. type casts in C).
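For instance, a cast in C simply turns the checking off. A minimal sketch (reading a float's bits through an int pointer is undefined behaviour in modern C, which only underlines how unchecked it is):

    #include <stdio.h>

    int main(void) {
        float f = 3.14f;
        int bits = *(int *)&f;   /* the cast defeats the type check: the
                                    float's bit pattern is reread as an int */
        printf("%d\n", bits);
        return 0;
    }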
Strong typing is a good thing because it improves security by preventing many forms of accidental misuse of data. Empirically, it has been observed that strong typing is a very effective aid to achieving program reliability.
The more distinct types there are in a program, the better the chance of detecting errors at compile time. Ideally, even different usages of numeric types should be distinguished, e.g. in Ada:

    type mass is new real;
    type length is new real;

Then assigning a value of type mass to a variable of type length is illegal.
As the types mass and length in the example above are structurally equivalent, a language that distinguishes between them must use name equivalence. However, this is not enough: Pascal uses name equivalence but quite reasonably permits mixed arithmetic on integer subranges and full integers. Extra mechanisms are necessary to allow this but rule out adding masses and lengths (e.g. the new keyword in the example).
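C's typedef gives only a synonym, but a rough sketch of the same effect is possible with single-member structs, since two struct types are never name-equivalent even when they are structurally identical:

    struct mass   { double value; };
    struct length { double value; };

    int main(void) {
        struct mass m = { 3.0 };
        struct length l = { 4.0 };
        /* l = m;   rejected at compile time: structurally identical,
                    but not the same named type */
        (void)m; (void)l;
        return 0;
    }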
There is a related question to do with arrays: assuming we have the correct number of dimensions, if the length of an array is part of its type (which advocates of strong typing say should be the case), it is impossible to write a single function that can deal with arrays of any size. General-purpose sorting or string-manipulation functions, or a library of matrix operations, are therefore impracticable.
Pascal, which had this problem, partially overcame it by adding conformant array parameters: a form of array type with open bounds that could only be used in parameter declarations.
C partially sidesteps this problem by not including the first array dimension as part of the type; however, to be able to guarantee reasonably efficient compiled code, all other dimensions of a multi-dimensional array must be fixed.
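A sketch of the C rule: the function below accepts any number of rows, but the number of columns is part of the parameter's type, which is what lets the compiler generate the indexing code:

    #include <stdio.h>

    double sum(int rows, double a[][10]) {   /* rows free, columns fixed at 10 */
        double total = 0.0;
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < 10; c++)
                total += a[r][c];
        return total;
    }

    int main(void) {
        double m2[2][10] = { { 1.0, 2.0 } };
        double m5[5][10] = { { 3.0 } };
        printf("%f %f\n", sum(2, m2), sum(5, m5));   /* both row counts accepted */
        return 0;
    }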
A common example is mixed arithmetic using integers and reals, where the integers are automatically converted to reals. There are two points to note here. The first is the timing of the conversion: most languages convert at the last moment, so e.g. integer + integer + real would mean an integer addition, followed by a conversion, followed by a real addition; but some languages might convert both integers to reals before adding them.
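A sketch of the "last moment" rule in C:

    #include <stdio.h>

    int main(void) {
        int i = 1, j = 2;
        double r = 0.5;
        double x = i + j + r;   /* (i + j) is an integer addition; only the
                                   sum is converted before the real addition */
        printf("%f\n", x);      /* 3.500000 */
        return 0;
    }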
The second point is that there is no guarantee that real values are held at least as accurately as integer values, so we may well get answers different from those expected. This is even more of a problem if we are not sure when the conversions happen.
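A C sketch of the accuracy problem, assuming IEEE-754 single precision, whose 24-bit mantissa cannot hold every 32-bit integer:

    #include <stdio.h>

    int main(void) {
        int n = 16777217;            /* 2^24 + 1 */
        float f = n;                 /* automatic widening loses the low bit */
        printf("%d %.1f\n", n, f);   /* 16777217 16777216.0 */
        return 0;
    }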
Another example is mixed arithmetic on integer subranges and full integers, or on low- and high-precision reals: this is normally useful, but can cause problems if we want to keep e.g. mass and length separate.
Another problem is defining and remembering what happens when e.g. two byte-sized integers are added. Are they added as words, as would be appropriate if we wanted to save the result back to a full integer, or are they added as bytes, as might be appropriate if we wanted to save the result to a byte (though perhaps we want to detect overflow, and would prefer word addition plus a runtime check)?
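C's answer is its "integer promotions": byte-sized operands are always widened and added as full integers, and the result is truncated only if it is stored back into a byte. A sketch:

    #include <stdio.h>

    int main(void) {
        unsigned char a = 200, b = 100;
        int as_word = a + b;             /* promoted to int: 300 */
        unsigned char as_byte = a + b;   /* same addition, but the store
                                            wraps 300 to 44 (mod 256) */
        printf("%d %d\n", as_word, as_byte);
        return 0;
    }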
Narrowing happens when a value is automatically converted from a superset type to a subset type. For example, should we narrow the real value 3.7 to the integer value 3 or to 4? Whichever happens, what if you wanted the other, or what if it was implementation-dependent?
Because it is so dangerous, most modern languages do not permit automatic narrowing, but instead contain various explicit conversions e.g. from real to integer using truncate or round.
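In C, for example, both forms must be written explicitly. A minimal sketch:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double d = 3.7;
        int t = (int)d;        /* explicit cast truncates: 3 */
        long r = lround(d);    /* explicit rounding: 4 */
        printf("%d %ld\n", t, r);
        return 0;
    }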
Some examples of common types, with their values, constructors, and operations:

Integers:
    Values: 1 2 123 etc.
    Operations: + - * / < = > etc.

Reals:
    Values: -1.0 2.718281828 1.234e-12 etc.
    Operations: + - * / < = > etc.

Characters:
    Values: A...Z a...z 0...9 !"$%^&*()-+=`[]{};':@,./<>?#~| etc.
    Constructors: 'A' 'B' or "A" "B" etc.
    Operations: < = > etc., conversions to and from integer, succ pred, etc.

Booleans:
    Values: true false
    Operations: and or not eor == != implies (<= or =>?) etc.

Enumerations, e.g. in C:
    enum boolean { false, true };
    enum escapes { bell = '\a', backspace = '\b', tab = '\t',
                   newline = '\n', vtab = '\v', creturn = '\r' };
    Constructors: false, bell etc.
    Operations: =, and if ordered: < = > succ pred etc.

Subranges:
    e.g. type natural = 0 .. maxint
    e.g. type digit = '0'..'9'
    e.g. C: float is a subrange of double, short of long

New types:
    e.g. metre = new integer, inches = new integer: the two cannot be mixed
    e.g. year = new integer: cannot + - * / etc. - adding two years does not give a year!
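C has nothing corresponding to new: typedef introduces only a synonym for an existing type, so nothing stops metres and inches being mixed. A sketch:

    int main(void) {
        typedef int metre;
        typedef int inches;
        metre m = 5;
        inches i = m + 3;   /* accepted without complaint: both are just int */
        (void)i;
        return 0;
    }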
C is strongly typed except that:
    - most scalar types are really integers
    - casts, particularly of pointers, defeat most type checking
        - e.g. arrays and functions can become pointers and vice versa
    - functions declared with empty parameter lists are not type checked
    - functions declared with varargs cannot be fully type checked
    - unions are not discriminated (see the sketch below)
    - the first dimension of an array is not part of the type,
      but any other dimensions are (maybe)
i.e. it is not really strongly typed at all!
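For instance, nothing records which member of a union was last assigned, so misreading it cannot be caught. A sketch:

    #include <stdio.h>

    int main(void) {
        union value { int i; float f; } v;
        v.i = 42;
        printf("%g\n", v.f);   /* legal, but just reinterprets the bit
                                  pattern of 42: no tag, no check */
        return 0;
    }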