Next: LEX Up: CS2111: Design and Implementation Previous: Introduction Contents

Subsections

Expressions: Operators

$\S$ 1.5.1 introduced fix (infix, prefix and postfix). In any one language, we may just find one fix in use (e.g. prefix or postfix), but normally we find a mixture of infix, prefix, and postfix. In this section, we will look at some other simple but important features of expressions.

Number of Operands (per operator)

Any particular operator will usually only take a fixed number of operands, typically in the range 1 to 3, referred to as:

	1 operand	2 operands	3 operands	. . .
adicity	monadic	diadic	triadic	. . .
arity	unary	binary	ternary	. . .

As functions and procedures are only named operators that take a list of operands (parameters), there obviously should be no limit to the (fixed) number of operands any one operator may take. However, to simplify building and manipulating the parse trees, we often treat the brackets and commas as the operators instead. Multi-dimensional array accesses are similar, although it is clearer that the brackets really are operators. Operators that take 3 or more operands are usually prefix or postfix but may not fit this simple classification. Some languages where all operations are bracketed prefix (or postfix) have variable number of operands.

Overloading

Languages often overload operators i.e. use the same symbol for different operations, both because of the limited number of symbols available on an ASCII keyboard, and because mathematical notation also often overloads operators.

For example, both mathematics and many languages use - both for subtraction and for negation. Some languages make an effort to distinguish these, e.g. using ~ for negation. Even so, almost all languages have some overloading; e.g. ( and ) are used both for bracketed expressions and for lists (e.g. parameters) or tuples (or even array accesses).

These different uses can normally only be distinguished by the context in the representation - e.g. whether the - is used as a monadic or diadic operator, or whether the ( is preceded by an identifier or by another operator. Other kinds of overloading can happen because, as in mathematics, the same symbol is used for e.g. addition of integers, reals and matrices - again, these different uses can only be distinguished by context, but this time in the meaning e.g. the types of identifiers.

Precedence

The commonest, but not ubiquitous, rule is that, for example, multiplication and division should be performed before addition and subtraction. Typically we collect together similar operators, so that e.g. multiplication, division and remaindering usually have the same precedence.

The greatest variation is found in the precedence of logical operations such as and and or. Some languages give them the same precedence, whereas most give and higher precedence than or. Even then, some languages give multiplication and and, and addition and or the same precedence, whereas most give arithmetic operations higher precedence than logical operations.

Overloading also can cause precedence problems. Some languages give monadic - (negation) higher precedence than diadic - (subtraction) whereas other languages give them the same precedence.

Associativity

Probably because Indo-European languages are read from left to right, most infix mathematical operators are left-associative; that is, a series of operators of the same precedence is evaluated from left to right.

An example of an exception to the general rule that infix operators are left-associative is exponentiation (raising to a power, not e), which is found in some languages.

Prefix operators are usually right-associative, and postfix operators left-associative.

Some languages also have non-associative operators i.e. a series of these operators is not permitted unless they are explicitly bracketed. The commonest example is with relational operators.

The concept of associativity is not relevant in languages where operators and their operands are already bracketed together, thus making the sequence explicit.

There are even languages where all operators are non-associative, so complex expressions must be explicitly bracketed - this simplifies translation, but is initially irritating to the user.

Precedence and associativity can actually be expressed by the rules describing representations, but this is not generally the case when we are defining meaning.

Subexpression evaluation order

Associativity describes the order of evaluation of a series of operators of the same precedence, but does not describe the order in which the two sub-expressions or operands of a particular operator should be evaluated.

There are only two circumstances where this matters:
the first occurs with side-effects - if an item in an expression can change the value of another item in the expression, either directly or via a function that can modify global variables or its parameters.
The second occurs in an expression of the form: if the access is illegal or the accessed value is the one wanted ...

Some languages specify the order in which sub-expressions are evaluated (usually left-to-right),
some minimise the problem by not allowing side-effects, and
some just expect the user to write code that will not be affected by the order of evaluation.
Languages that do not specify evaluation order are usually defined this way so that compilers can rewrite expressions to optimise the code they generate. Such languages often deal with any resulting problems by defining special operators which use lazy evaluation.

Lazy and eager evaluation

Lazy evaluation is not performing part of a computation unless or until we really need to. In this context, it means only evaluating that part of a (boolean) expression that is enough to decide the value of the whole.

To be useful, lazy operators must have a specific order of evaluation of their sub-expressions (usually left-to-right).

Not all languages have conditional expressions, but most of those that do use lazy evaluation: a ? b : c will first evaluate a, and then either evaluate b or c but not both. Very few languages use eager evaluation, in which both b and c are evaluated and then one value or the other discarded.

A language with lazy conditional expressions does not actually need lazy boolean (e.g. and or) operators as well, as their effect can be simulated, but it is convenient to have them also available.

A system based on lazy evaluation is MAKE, which, given one or more targets, evaluates only those intermediates that are necessary to produce the targets, and even then only those that depend on sources that have changed, regardless of how many different intermediates are described within its MAKEfile.

Something similar goes on inside spread-sheets; when the value of any cell is changed, a good spread-sheet will only recalculate the values of those cells that are directly or indirectly affected.

Assignment

Some languages do not allow assignment to a variable, but only binding of a value to an identifier. Some languages only allow assignment in assignment statements. Some languages treat assignment just as another operator, that can happen as many times as you like inside any expression, and such assignment is usually right associative. Some languages even allow multiple assignment.

C

C operators are mainly diadic, infix, left associative, with the order of evaluation of operands undefined. The exceptions are noted in the following lists of operators, ordered from highest (at top) to lowest precedence (note - the second and third lines have the same precedence):


  ( )                                             [( ) = bracketed expression]
{ ( ) [ ] -> .                                    [( ) = function call]
{ ++ - -                                          [monadic, postfix, L-assoc]
  ! ~ + - * & (type) sizeof ++ - -                [monadic, prefix, R-assoc]
  * / %
  + -
  << >>
  < <= > >=
  == !=
  &
  ^
  |
  &&                                              [lazy L->R]
  ||                                              [lazy L->R]
  ?:                                              [triadic, lazy L->R, R-assoc]
  = += -= *= /= %= &= ^= |= <<= >>=               [R-assoc]
  ,                                               [evaluate L->R]

SML

SML has facilities for converting any diadic function into a left-, right-, or non-associative operator with any precedence, and for converting any operator into a function. It also allows identifiers made up from:
! % & $ # + - / : < = > ? @ \ ~ ' ^ | *

Because of this, it is hard to be as specific about SML as about C, but here is my attempt. Again, operators seem to be mainly diadic, infix, left associative, with the order of evaluation of operands undefined - exceptions noted below:


  ( )
  / div mod *
  + - ^
  :: @                                             [:: is R-assoc]
  = <> < > <= >=
  := o                                             [:= is R-assoc]
  andalso                                          [lazy L->R]
  orelse                                           [lazy L->R]
  if-then-else                                     [triadic, lazy L->R, R-assoc]

Language design and implementation

We have looked at most of the possibilities for expressions and operators, although we will be coming back to look at operands and types in later sections. There are some points worth making about the interaction of design and usability:

most programmers behave as if expression design is a "religious" issue, rather than one amenable to rational debate: the best design is the one they were brought up with or are most familiar with, because it is easiest to read and therefore debug.
This implies that:
expression design should match naive users' expectations
good programmers use the subset of expression design common to most programming languages, rather than exploit the features of a particular design
languages that permit expression design to be changed are not successful (unless these features are rarely used!)
however, there are sometimes advantages to a non-intuitive expression design:
- a different programming paradigm may need a different design
- a simple/cheap implementation may force a particular design

To implement expressions, we must input them, check that they are sensible, and either save them in some internal form for later use (e.g. parse trees) or output a translation (e.g. assembly code). There are many different algorithms, but rather than examine them directly (c.f. CS304), we will look at some simple, ubiquitous tools: LEX and YACC.

In the next few sections you will learn how to use them via expression processing examples. We can then comment on the interaction of expression design and implementation, although there is one point worth making now:

operators like += and ++ are for succinctness and clarity, not implementation efficiency

Exercises

List the operators of any other programming language that you know and write down their number of operands, precedence, associativity etc. If you do not know another language well enough, go into the library and pick any book describing a language. How easy is it to find this information in the book?
Does it make sense to have two operators of the same precedence but different associativity? What happens when you use them in the same expression?

Readings

Louden: ch. 5.7
Bal & Grune: chs. 2.2.1, 2.3.1-2.3.3, 2.6.1, 2.6.2

Next: LEX Up: CS2111: Design and Implementation Previous: Introduction Contents

Pete Jinks
1999-09-30