next up previous contents
Next: Expressions: Code Generation Up: CS2111: Design and Implementation Previous: Parse Trees   Contents

Subsections


Language Design

Rather than discuss vague design principles, I want to look at concrete examples of good and bad design. We will mainly look at ANSI C, not because it is particularly good or bad, but because it should be most familiar to you. Also, we will mainly look at representation (syntax), rather than meaning (semantics).


Error detection and correction

Not all strings are valid representations. The distance between 2 strings is the number of places they are different.

If valid strings are never neighbours then a single error is always detected,
if they are always at least 2 differences apart then a single error can always be corrected, etc.
This can require extra effort, so some disagree.
Sometimes user can make choices that enhance error detection and correction.
Sometimes possible to language designer: change concrete grammar, keeping abstract grammar and meaning the same.

C is deformed because its parents were frightened by keywords

C has a consistent philosophy of minimising the use of keywords. This is probably because C was based on BCPL, a language designed when compilers had to be small to fit in memory - a BCPL compiler on a PDP-7 used less than 4K 18-bit words. It replaces keywords by overloading other symbols such as brackets ( ) and braces { }, or simply overloading the user and implementor! In several places the absence of keywords makes reading and/or writing C programs harder and more error prone.


Control structures

$\S $7.3: dangling else - the best solution is simply to insist on a terminating keyword (e.g. end_if or fi). As well as being a good solution to this problem (no special rule for users to learn and compiler writers to implement), there are further benefits when we generalise this to all control constructs (as we should, to avoid yet another special rule): However, some C control constructs have keywords that other languages avoid:
return
for function results, may be better than Pascal's name:=, although the difference is rarely important. However, some people disapprove of multiple returns, or a return from the middle of a function.
break
most modern languages that have a switch statement (often known as a case statement) insist that the cases are separate and automatically terminate. C, along with many other old languages, allows cases to fall into each other, but at least provides break to take control to the end of the switch. (We will discuss other uses of break and the related continue in a later section.)

Declarations

Pascal and C are very different when it comes to declaring variables with complicated types - in particular, Pascal declarations and accesses are easily derived from an English description of what is required. (Note: Pascal does not have some of the facilities of C with respect to functions, and C does not have some of Pascal's data types.) However, both languages make declarations match the subsequent uses of the variables in expressions.

C:
access: postfix, left-associative [ ] ( ) . -> of higher precedence than prefix, right-associative *
declarations: must use (* ). instead of -> and element typename comes first
(like a cast - same precedence/associativity/fix as dereference)

Pascal:
access: postfix, left-associative ^ . [ ]
declarations: [...] becomes array [...] of, element typename comes last

e.g. an array of pointers to characters
in Pascal:


        s: array [0..9] of ^char;
        . . .
        s[i]^ := 'a';
in C:

        char *s[10];
        . . .
        *s[i] = 'a';        /* i.e. *(s[i]) */

e.g. a pointer to an array of characters
in Pascal:


        s: ^array [0..9] of char;
        . . .
        s^[i] := 'a';
in C:

        char (*s)[10];
        . . .
        (*s)[i] = 'a';
Why do new users find declarations in C harder to understand than in Pascal? This leads to a much bigger problem with C: the grammar of declarations is fundamentally confusing, both for LR(1) parsers like YACC, and for human beings e.g.:

  {int bar;}
  {long int;}
  {long int bar;}

  typedef int foo;
  {foo bar;}
  {long foo;}
  {long foo bar;}

CPP


 #define SP3() if(b){int x; av=f(&x); bv+=x;}
  . . .
 if (x==3) SP3(); else bork();
If we omit the ; after the use of SP3, the else will silently become associated with the if in the macro, but if we include it the else does not belong to any if. In Pascal, a block end that replaces a statement must still be followed by the usual ;, but C treats { }; as two separate statements. We need something like:

 #define STMT(stuff) do{stuff}while(0)
 #define SP3() STMT(if(b){int x; av=f(&x); bv+=x;})
Related problems arise in:

 #ifdef CIRCUIT
 #   define CLOSE(circno)   {close(circno);}
 #else
 #   define CLOSE(circno)
 #endif
  . . .
 if (expr) statement; else CLOSE(x)
 ++i;

Puns: similar representations with different meanings

(int) *x= 0 assigns 0 to the object pointed to by x
int *x= 0 assigns the NULL pointer to x
   
(int) x= y= 0 makes x and y both equal to 0
int x= y= 0 makes x and y both equal to 0, but only declares x
   
(x= 1, y= x+1) the , indicates sequencing
f (x= 1, y= x+1) if f is a function the , does not indicate sequencing
  if f is a macro, any sequence depends on its definition
<>
Uses of , are macro definitions and calls, function definitions and calls, operation, separator in initialisations, separator in declarations.

Arrays and functions sometimes behave like pointers (and vice-versa).

0 is an integer, is false, is a pointer, and is the string terminator

void is the result type of a function that is really a procedure, the type of a pointer that is semi-valid (?), and the type of a function that takes no parameters

int f(void) can only be called by f()
int f() can be called with any parameters!
<>

12 == 014 but '\14' == '\014'

'A' getchar() are both of type int, not of type char

MIN <= x <= MAX is a valid expression, but unlikely to mean what the user intended.

typedef enum {red, blue, green=0, yellow} colour; gives the same value to red and to green, and to blue and to yellow, and they are all of type int, so yellow - green == blue

Overloading symbols

x = 1 x == 1 are both expressions, so both can appear inside the condition of an if statement or a loop, and both can appear on their own as a statement.
similarly a <= b a <<= b

static, extern, void, *, &, <, ( )

p= N * sizeof * q; r= malloc (p);

sizeof (int) * x; sizeof ((int *) x); sizeof ((int) * x);

(b) + (c); (b) - (c); (b) * (c); (b) & (c); (b) (c);

When is white space significant?

Multiple name spaces: same name, different meanings


typedef int foo; struct foo bar; struct foo {int f;};
void fred (void) {foo: ;}

typedef int foo; struct foo {foo foo;};
void fred (int foo) {struct foo foo;}

typedef int foo; struct foo {int foo;};
void fred (void) {foo foo;}

struct foo {int foo;};
void foo (int foo) {struct foo bar;}

typedef struct list *list;
struct list {int field; list list;};
int length (list l)
{
  int i= 0;
  list: if (l) {
        i++; l= l->list;
        goto list;
        }
        else return i;
}

Synonyms: different representations, similar meanings

Orthogonality

All language features can be combined in every way the user wants. e.g.

Exercises

($\dag $ indicates harder problems)

Readings

man cdecl
Louden: ch. 3
Bal & Grune: 2.2.3, 2.4.6
next up previous contents
Next: Expressions: Code Generation Up: CS2111: Design and Implementation Previous: Parse Trees   Contents
Pete Jinks
1999-09-30