Next: Expressions: Operators Up: CS2111: Design and Implementation Previous: CS2111: Design and Implementation Contents

Subsections

Introduction

Syllabus etc.

In many circumstances, an effective way of communicating with a program (e.g. to issue commands, to input data, etc.) is to use a textual interface. Implementing such an interface often means designing a language and software that accepts and obeys this language.

This course unit examines programming languages, and in particular imperative languages, looking at the fundamental concepts they have in common, and outlining how to implement them. We also look at the variations in these concepts between languages, and see how this affects usability and ease of implementation. We will investigate these concepts using examples from various languages, particularly C and SML, rather than examining complete languages in turn.

The aim is that by the end of the course unit, you

can design and implement textual interfaces for your programs
have a head start on understanding new programming languages

Examples of Language design - declaring and using variables and strings


char a = 'A';
char* abc = "A B C";
printf ("%c=a %s=abc\n", a, abc);

Example C fragment


set a = A
set abc = "A B C"
echo $a=a $abc=abc

Example CSH fragment


a = A
abc = A B C
echo $a=a $(abc)=abc

Example MAKE fragment

Interpreters and Translators

commands $\rightarrow$

check(?)

obey

Interpreters

execute/obey input

input $\rightarrow$

check(?)

change

$\rightarrow$ output

Translators

transform input to the output

commands $\rightarrow$

input $\rightarrow$

Interpreter

Translator

$\rightarrow$ output

Many programs do both

Commands and input may be separate or mixed together.

Programs often consist of a series of filters, perhaps followed by a final translator or executor.

Some example programs and their languages

GREP and EGREP

GREP echoes lines from the input(s) that match a pattern.
Each input line is processed in turn: characters on each line are scanned for patterns, and if a pattern is completely recognised, the current line is output.
Metacharacters (characters with a special meaning) \ ^ $ . [ ] * + ? | ( )

any non-metacharacter	recognise that character
`\`any metacharacter	recognise that metacharacter
`\n` or `\t`	as for C - newline or tab
`^`	beginning of line
`$`	end of line
`.`	any one character except newline
`[ ]`	any one character from inside the brackets
	indicate ranges by `-` e.g. `[a-zA-Z]`
	an actual `]` must be the first character
	an actual `-` must be the first or last character
`[^ ]`	any one character except . . .
anything*	anything repeated none or more times
extra facilities in EGREP:
anything`+`	anything repeated one or more times
anything`?`	anything repeated none or one times
one `\|` another	either one or another
`( )`	recognise the contents

<>
precedence: [ ] ( ), * + ?, concatenation, |

LEX and YACC

We will examine these two programs in a lot more detail later, as they are very useful for implementing computer languages. The major part of their languages are patterns and corresponding pieces of C code (actions).

unlike GREP, they do not need to process the input line-by-line,
we normally expect their patterns to match the whole of the input,
as each pattern is completely recognised in the input, the corresponding action is obeyed.

The major difference between LEX and YACC is the way the patterns are described - LEX is similar to EGREP, but YACC is very different.

Strategies for Describing and Implementing Computer Languages

A translator, whether human or machine, has to understand what it is translating. Because we can normally rely on the common sense and shared experiences of other human beings, natural languages can be ambiguous, fuzzily structured and have large vocabularies.

In contrast, not only do computer languages have limited, mainly user-defined vocabularies, but also we have to exactly specify the legal forms of each construct of the language and what it means, both alone and in combination with every other construct.

Representation and Meaning

The meaning of a + b is usually along the lines of:
"take the values associated with the identifiers a and b, combine those two values into one by addition, and have the resulting value ready for further use"

This typical meaning may be modified in many ways e.g.:

"first take the value... a, then take the value... b, ..."
"...convert the values so they are compatible..."
"...combine the values by set-union/ string-concatenation/ matrix-addition..."

This meaning can be represented in several ways e.g.:


  a + b          [infix]
  a b +          [postfix/Reverse Polish Notation]
  + a b          [prefix]
  add a b
  add (a, b)
  (add, a, b)
  add a and b
  a plus b

Users want every sensible meaning to have a straightforward representation.
Implementors want every possible representation to have a reasonable meaning.

We normally refer to the representation as the syntax, and the meaning as the semantics, and there are various informal and formal notations and concepts we can use to help us define them for a particular language. I will avoid using formal notations for semantics, but we will be using various software tools (LEX and YACC) to help with syntax, each of which has its own notation.

Semantics is itself usually split into two parts:

static semantics (or semantics or context): the meaning of a program that is apparent when it is read as a piece of text by a human or a compiler
dynamic semantics (or run-time behaviour or semantics): the meaning of the program that is apparent as it is running

Thus, important features common to most computer languages are:

structure: (syntax) described by grammars; normally implemented using parser-generators
identifiers: (static semantics) bind meanings to strings in a scope; implemented using a dictionary.

Parse trees

A 2-dimensional representation. They make the relationships between operands, operators etc. explicit, and so are clearer than the linear notations above. However, we still need extra information about the meaning, such as the order in which operands are accessed. e.g. parse tree for a + b

$\begin{picture}(5,4) \put(3,3){\circle{1.0}}\put(2.8,2.8){+} \put(1,1){\circle{1... ...){\line(1,1){1.5}} % + -> a \put(5,1.4){\line(-1,1){1.5}} % + -> b \end{picture}$

Exercises

Any metacharacters of GREP or EGREP that are also metacharacters of the shell you are using have to be quoted. Which characters are these?
Choose some complicated expressions and for each
1. fully bracket it
2. write it using the other representations in $\S$ 1.5.1
3. draw the corresponding parse tree
Try expressions involving functions calls and data structure accesses - see if you can find different ways of drawing the parse trees for these by making different assumptions about which parts of the expression are the operators.
Use the MAN command to find out more details about MAKE, GREP, EGREP, and whichever shell you normally use (probably BASH) e.g. type man make etc.
Use GREP or EGREP on some text files e.g. to list all the Subjects of all your email.
Write a shell command file to compile and run any program you wrote e.g. for a first semester lab.
Convert your shell command file into a MAKEfile, that can compile and run your program as two separate steps. Add a third entry to the MAKEfile to keep the output from the run and DIFF it with a previous output, so you can immediately see the consequences of any change to your program.

Readings


Bal & Grune: chs. 1, 7.2.2     Louden: chs. 1, 2


Capon & Jinks: ch. 1 		Aho, Sethi & Ullman: ch. 1


Rationale for ANSI C 		FAQ for comp.lang.c

Next: Expressions: Operators Up: CS2111: Design and Implementation Previous: CS2111: Design and Implementation Contents

Pete Jinks
1999-09-30