next up previous contents
Next: Expressions: Operators Up: CS2111: Design and Implementation Previous: CS2111: Design and Implementation   Contents



Syllabus etc.

In many circumstances, an effective way of communicating with a program (e.g. to issue commands, to input data, etc.) is to use a textual interface. Implementing such an interface often means designing a language and software that accepts and obeys this language.

This course unit examines programming languages, and in particular imperative languages, looking at the fundamental concepts they have in common, and outlining how to implement them. We also look at the variations in these concepts between languages, and see how this affects usability and ease of implementation. We will investigate these concepts using examples from various languages, particularly C and SML, rather than examining complete languages in turn.

The aim is that by the end of the course unit, you

Examples of Language design - declaring and using variables and strings

char a = 'A';
char* abc = "A B C";
printf ("%c=a %s=abc\n", a, abc);
Example C fragment

set a = A
set abc = "A B C"
echo $a=a $abc=abc
Example CSH fragment

a = A
abc = A B C
echo $a=a $(abc)=abc
Example MAKE fragment

Interpreters and Translators

commands $\rightarrow$
execute/obey input
input $\rightarrow$
$\rightarrow$ output
transform input to the output
commands $\rightarrow$
input $\rightarrow$
$\rightarrow$ output
Many programs do both
Commands and input may be separate or mixed together.

Programs often consist of a series of filters, perhaps followed by a final translator or executor.

Some example programs and their languages


GREP echoes lines from the input(s) that match a pattern.
Each input line is processed in turn: characters on each line are scanned for patterns, and if a pattern is completely recognised, the current line is output.
Metacharacters (characters with a special meaning) \ ^ $ . [ ] * + ? | ( )
any non-metacharacter recognise that character
\any metacharacter recognise that metacharacter
\n or \t as for C - newline or tab
^ beginning of line
$ end of line
. any one character except newline
[ ] any one character from inside the brackets
  indicate ranges by - e.g. [a-zA-Z]
  an actual ] must be the first character
  an actual - must be the first or last character
[^ ] any one character except . . .
anything* anything repeated none or more times
extra facilities in EGREP:  
anything+ anything repeated one or more times
anything? anything repeated none or one times
one | another either one or another
( ) recognise the contents
precedence: [ ] ( ), * + ?, concatenation, |


We will examine these two programs in a lot more detail later, as they are very useful for implementing computer languages. The major part of their languages are patterns and corresponding pieces of C code (actions). The major difference between LEX and YACC is the way the patterns are described - LEX is similar to EGREP, but YACC is very different.

Strategies for Describing and Implementing Computer Languages

A translator, whether human or machine, has to understand what it is translating. Because we can normally rely on the common sense and shared experiences of other human beings, natural languages can be ambiguous, fuzzily structured and have large vocabularies.

In contrast, not only do computer languages have limited, mainly user-defined vocabularies, but also we have to exactly specify the legal forms of each construct of the language and what it means, both alone and in combination with every other construct.

Representation and Meaning

The meaning of a + b is usually along the lines of:
"take the values associated with the identifiers a and b, combine those two values into one by addition, and have the resulting value ready for further use"

This typical meaning may be modified in many ways e.g.:

This meaning can be represented in several ways e.g.:

  a + b          [infix]
  a b +          [postfix/Reverse Polish Notation]
  + a b          [prefix]
  add a b
  add (a, b)
  (add, a, b)
  add a and b
  a plus b

Users want every sensible meaning to have a straightforward representation.
Implementors want every possible representation to have a reasonable meaning.

We normally refer to the representation as the syntax, and the meaning as the semantics, and there are various informal and formal notations and concepts we can use to help us define them for a particular language. I will avoid using formal notations for semantics, but we will be using various software tools (LEX and YACC) to help with syntax, each of which has its own notation.

Semantics is itself usually split into two parts:

static semantics (or semantics or context)
the meaning of a program that is apparent when it is read as a piece of text by a human or a compiler
dynamic semantics (or run-time behaviour or semantics)
the meaning of the program that is apparent as it is running
Thus, important features common to most computer languages are:
(syntax) described by grammars; normally implemented using parser-generators
(static semantics) bind meanings to strings in a scope; implemented using a dictionary.

Parse trees

A 2-dimensional representation. They make the relationships between operands, operators etc. explicit, and so are clearer than the linear notations above. However, we still need extra information about the meaning, such as the order in which operands are accessed. e.g. parse tree for a + b

...){\line(1,1){1.5}} % + -> a
\put(5,1.4){\line(-1,1){1.5}} % + -> b



Bal & Grune: chs. 1, 7.2.2     Louden: chs. 1, 2

Capon & Jinks: ch. 1 Aho, Sethi & Ullman: ch. 1
Rationale for ANSI C FAQ for comp.lang.c

next up previous contents
Next: Expressions: Operators Up: CS2111: Design and Implementation Previous: CS2111: Design and Implementation   Contents
Pete Jinks