CS1011 lecture(s) : Computer Languages

Reading: Brookshear ch. 5 or Greenfield ch. 6, 8.3-4

Why not use Natural Languages?

Computer languages must be unambiguous, so they have relatively few, precisely defined rules for composition of programs, and strictly controlled vocabularies in which unknown words must be defined before use.

Natural languages are ambiguous, fuzzily structured and have large (and changing) vocabularies.

Why is there more than one computer language?

1950s: first HLLs
1969: ~120 HLLs, about 15 in widespread use
1977: 81 HLLs in active (non-trivial) use
Today: 2 to 4 thousand HLLs etc.

Computer Science is constantly changing, so there is evolution of:
* concepts, &
* notations for concepts.

Sapir-Whorf Hypothesis

Weak form: patterns of thought affect language
Strong form: language affects patterns of thought

Hard to tell if evidence supports the strong or the weak (or either) form, but it "feels" plausible. It is difficult to think about, or discuss, or use, a concept that you can't pin down.

The Sapir-Whorf hypothesis indicates that different computer languages will enable us, or maybe even force us, to think about different aspects of the problem we are trying to solve.

Computer language generations

1st Generation Language (GL): binary machine code

2nd GL: assembler (Low Level Language LLL)
direct memory access & allocation;
simple machine-like instructions

3rd GL: High Level Language (HLL) (1952)
memory access & allocation via operators;
expressions & explicit flow of control

4th GL: application (data processing/database) languages
embedded in a development environment

5th GL: Declarative or Very High Level Language (VHLL)
fully hidden memory access, automatic allocation;
fully abstract machine
Declarative languages express what must be achieved, let the system work out how to achieve it
Operational languages express how something is achieved, let the reader work out what is being achieved

Why not just use 1GL/2GL?


* easy to write: useful concepts & facilities, relevant to application
* easy to read: computer, your future self, others
for reuse, maintenance, enhancement etc.
* portability: other compiler/toolset suppliers, users, computers
-> standards
* error detection & reporting:
during translation: best, but program verbose, restricted?
during execution: exceptions, debugger etc.

Run-time Efficiency:

depends on the implementation much more than the language

Power & Safety:

language evolution:
powerful & dangerous
restricted & safe
powerful & safe?

Case study of selecting HLLs: 1st year teaching

There is currently no one best language in all situations - we must pick the most suitable language(s) for each particular problem or situation. What factors might we need to take into account?

Workhorse HLL (for general practical programming)

Essential features:
* structured
* available
* approachable
* robust
* local knowledge
* environment
* bindings to other systems
* books

Desirable features:
* advanced features & concepts e.g. modules, abstract data types, object-oriented
* real world
* cleanliness
* support
* available widely & cheaply
* standardised

There is no one language that satisfies all these criteria, but our eventual choice, idC, satisfies most of the more important ones.

Truth & Beauty HLL

* illustrate very different way of thinking about programming
* local knowledge
* availability
* robustness
* cleanliness
* books

Again, no one language perfectly satisfied these criteria, but SML was thought to be the best choice.

Saying the same thing in different ways - Syntactic differences

e.g. expressions
* infix: SML, C: a = b + c
* prefix: LISP: (set a, (add b, c))
* postfix: FORTH: b c + a =
* "english": COBOL: ADD b TO c GIVING a
* vector/matrix etc.
* distributed: occam:
to_a ! b + c
to_a ? a

Saying different things - Semantic differences

Syntactic differences are important, but nowhere near as important as semantic differences.

Language paradigms (models)

Paradigm originally meant "sample" or "example", but now has several meanings. In Computer Science, it is taken to mean "a coherent set of methods that have been found to be (more or less) effective in handling a given type of problem".

Different kinds of languages emphasise different things about the problem, and so are better at describing different aspects of the solution, or even different kinds of problems and solutions.

What should be the main concept(s) embodied in a programming language?

State n. condition; circumstances at any time; a phase or stage; ...
= where we are + the (partial) result of getting there
In a computation, represented by e.g. values of registers and memory (variables etc.)

1GL, 2GL and the first 3GL, were all about representing state, the actions that modify the state and the sequence of the actions - the Imperative paradigm.

However, although this is intuitive for simple problems, it became clear that this did not scale well - i.e. it becomes disproportionately hard to use as problem size increases. To determine whether a program will work correctly, we must examine e.g. all possible combinations of actions on all of the state.

We need to control which actions can be performed on which parts of the state -> type checking.

We can only think about a limited number of different things at once, so limit what we need to consider - we want to control where parts of the state can be modified, and where actions can be used -> scope.

Scope and type checking became an integral part of all kinds of programming languages, but also gave rise to a new programming paradigm: Object Oriented

This is similar to Imperative but with maximum use of types & scopes - keep state in objects, each type of object having its own set of actions. Furthermore, the state in an object can only be accessed or modified via the associated actions.

As well as controlling state, other programming paradigms were invented that reduced the need to use state at all:

Functional paradigm - about transformations of values (notation usually makes it easy to describe & examine values)

Logic paradigm - about knowledge & understanding

Another difficulty with Imperative programming, associated with the concept of state, was the concept of sequence - there are many circumstances where the exact order of some actions does not matter, as long as they are all done before we progress to the next step.

Parallel paradigm - if actions don't interact, work on them in any order (non-determinism), or even simultaneously (multi-processing).

Also combinations and extensions of these paradigms, and other special purpose language paradigms.

The paradigms in more detail

(The program fragments are not intended to correspond precisely to any one language, although they are based on real languages.)

Imperative (Procedural) Paradigm

e.g. Assembly languages, Fortran, Cobol, Basic, C, Pascal (MU0, CS1031, CS1062, CS1511, CS1522)

e.g. making tea:
1) put water in kettle
2) boil kettle
3) put tea-leaves in teapot
4) pour boiling water from kettle into teapot

declare kettle, teapot
kettle= water;
boil (kettle);
teapot= tea_leaves;
teapot= teapot + kettle;
[water, tea_leaves etc. are fixed values] PRO:
* concept of a sequence of steps is easy to understand
* computational theory: Turing machines

* easily transformed into computer instructions

* too easy to bog down in detail
* must specify sequence even when we don't care

* different parts of the program interact via state, state modified by side effects (e.g. procedure calls), so hard to understand/ check/ modify what is happening

Functional (Applicative) Paradigm

Based on evaluation of expressions built out of function calls (applications). e.g. LISP, SML (CS1041, CS2511)

let boiling_water = boil (kettle, [water])
in (teapot, [tea_leaves, boiling_water])
[kettle, teapot, water, tea_leaves etc. are all fixed values]

* referential transparency - functions are like mathematical functions i.e. no side effects -> easy to understand, check, modify
e.g. (teapot, [tea_leaves, boil (kettle, [water])])
* prototyping
* functions are first class objects
i.e. we can create & modify them in a program
* some unimportant details disappear
* computational theory: Church's lambda calculus

* I/O hard
* some data structures hard to represent "naturally" particularly if they are to be modified
* naive implementations very slow, but good implementations nearly as fast as imperative

Logic Paradigm

Horn Logic: a certain condition (goal) is true if all the conditions in a list (subgoals) are true. If there are no subgoals, the goal is a fact. There can be alternative lists of subgoals for the same goal.

Specify a problem by describing its facts and properties. Solve a problem by giving the system a goal to prove.
e.g. Prolog (CS1412, CS2432)

* tea is made in a container holding tea leaves and boiling water.
* a container can be made to hold an item if the item is in a second container and we move it.
* a container holds boiling water if it holds water and we can heat the container to boiling point.

We can add some extra "common-sense" information to the system, by telling it that a container we make tea in should be water proof and heat resistant, and that tea-leaves come from a tea-caddy and water from a tap.

[names starting with uppercase letters are unknowns, those starting with lowercase letters are fixed.]

if we ask: make(teapot, tea) or: make (X, tea).
the system answers: yes or: X = teapot

We can even get the system to tell us how to make tea:
move(Item,Con,Con2):- write('move '), write(Item),
write(' from '), write(Con2), write(' to '), write(Con), nl.
canboil(kettle):- write('boil kettle'), nl.
and it will output:
move tea_leaves from tea_caddy to teapot
move water from tap to kettle
boil kettle
move boiling_water from kettle to teapot
[getting the system to answer correctly is generally more difficult than this.]

* prototyping
* AI
* Theory: predicate logic etc. CON:
* slow: it searches through all possible combinations to find solutions. naive implementations can be very slow.

Object-Oriented (OO) Paradigm

Encapsulate state in objects, which can only be accessed via a defined set of operations.
e.g. Simula, C++, Smalltalk-80, Eiffel, Object Pascal

class containers . . .
put_in = . . . take_out = . . . contents = . . .
class water_proof subclass of containers
pour_to = . . .
class boilable subclass of water_proof
boil = . . .
boilable kettle;
water_proof teapot;

kettle.put_in (water);
boil.kettle ( );
teapot.put_in (tea_leaves);
kettle.pour_to (teapot);
[water, tea_leaves etc. are fixed values]

* When we create a new class of object, we can reuse previous work by inheriting some or all of the definitions of similar objects, leading to hierarchies of objects. Some languages allow multiple inheritance, where a new object can reuse the definitions of several other objects.

* inheritance leads to polymorphism, which in turn simplifies maintenance & project control
* as user requirements change and system evolves, entities in the system less likely to change than the operations on them, so less likely to have to make fundamental changes to existing code
* easy to modify implementation of objects - prototyping
* a design methodology as well as a language model, so can capture more of the programming process (CS1052, CS1532, CS2032)
* classes -> typing system very powerful & discriminatory

* In most paradigms, we can resolve types at compile time. In OO languages, the flexibility gained from inheritance and polymorphism means that we can only completely determine the type of an object at run time (dynamic binding) -> slower.

Parallel & Distributed Paradigms

Contain explicit constructs for running different pieces of a program concurrently:
parallelism: interrupt, coroutine, fork & join, PAR [granularity]
communication: state, message (sync. or async.)
synchronisation: state (poll/busy wait, lock, semaphore, monitor), message (guard, rendezvous, RPC)

e.g. Simula, Algol 68, Concurrent Pascal, CSP, Ada, occam, Concurrent C, Concurrent Smalltalk, Multilisp, Concurrent Prolog

CHAN OF ANY to_pot, to_kettle, kettle_to_pot, to_cup:
declare water:
SEQ -- kettle
to_kettle ? water
boil ( )
kettle_to_pot ! boiling_water
declare tea_leaves, boiling_water:
SEQ -- teapot
to_pot ? tea_leaves
kettle_to_pot ? boiling_water
to_cup ! tea

[a complete program would include the other ends of to_pot, to_kettle &
to_cup; names not declared above represent fixed values e.g. tea]

* some problems are easier to implement using parallel constructs, because they exhibit inherent parallelism - the resulting programs may still be run on a single processor (pseudo-parallel)

* some problems can be solved faster by implementing them on a parallel machine and dividing the work over several processors

* some applications need to run on multiple processors or workstations connected by a network - distributed systems

* parallel & distributed programs are fundamentally more difficult to write & check e.g. deadlock, load-balancing
* communications & synchronisation overheads may negate much of the advantages of having multiple processors e.g. waiting

Most parallel languages have imperative features, but other paradigms can have parallel variants as well. In fact, the strict sequencing of imperative languages clashes badly with the relaxation of parallelism, and other paradigms can incorporate parallelism more simply. They may not even require many changes to the language, instead just altering the method of execution.

Other Paradigms

Special-purpose languages are intended for use in a single class of applications. Often as the interface to a particular piece of software - Command languages (HCI). Often imperative, although other paradigms exist e.g. WIMP excellent for simple actions. (CS1311)
operating system (CLI, shell) e.g. ksh
spreadsheet e.g. "A1 = sum(B1:E1)"
data base e.g. SQL
vector/matrix e.g. APL
string manipulation e.g. Snobol
text-processing e.g. roff
macro-processing e.g. cpp
grammar systems e.g. lex, yacc
formula/algebraic e.g. mathlab
graphical (CS2071)
CAD (CS1211)

Specification: describe the problem rather than the method of solution. e.g. VDM (CS1021, CS1112, CS1052, CS1532)
precondition = given:
a teapot that is heat resistant and water proof
& a kettle that can boil & tea leaves & water
postcondition = teapot contains tea leaves & boiling water

Constraint: related to Logic - describe a problem by the set of constraints any solution must satisfy, and ask the system to find a solution.
e.g. C * 9 = (F - 32) * 5; F = 50
-> C = 10

Data-flow brings together concepts from the functional, parallel/ distributed and constraint paradigms: imagine each operation in a program has its own associated processor, which executes the operation to produce a result whenever all its inputs (parameters) are available.