Reading: Brookshear ch. 5 or Greenfield ch. 6, 8.3-4
Computer languages must be unambiguous, so they have relatively few, precisely defined rules for composition of programs, and strictly controlled vocabularies in which unknown words must be defined before use.
Natural languages are ambiguous, fuzzily structured and have large (and changing) vocabularies.
1950s: first HLLs
1969: ~120 HLLs, about 15 in widespread use
1977: 81 HLLs in active (non-trivial) use
Today: somewhere between 2 and 4 thousand HLLs, depending on what is counted
Computer Science is constantly changing, so there is evolution of:
* concepts, &
* notations for concepts.
Weak form: patterns of thought affect language
Strong form: language affects patterns of thought
Hard to tell if evidence supports the strong or the weak (or either) form, but it "feels" plausible. It is difficult to think about, or discuss, or use, a concept that you can't pin down.
The Sapir-Whorf hypothesis suggests that different computer languages will enable us, or maybe even force us, to think about different aspects of the problem we are trying to solve.
1st Generation Language (GL): binary machine code
2nd GL: assembler (Low Level Language LLL)
direct memory access & allocation;
simple machine-like instructions
3rd GL: High Level Language (HLL) (1952)
memory access & allocation via operators;
expressions & explicit flow of control
4th GL: application (data processing/database) languages
embedded in a development environment
5th GL: Declarative or Very High Level Language (VHLL)
fully hidden memory access, automatic allocation;
fully abstract machine
Declarative languages express what must be achieved,
let the system work out how to achieve it
Operational languages express how something is achieved,
let the reader work out what is being achieved
* easy to write: useful concepts & facilities, relevant to application
* easy to read: computer, your future self, others
for reuse, maintenance, enhancement etc.
* portability: other compiler/toolset suppliers, users, computers
-> standards
* error detection & reporting:
during translation: best, but program verbose, restricted?
during execution: exceptions, debugger etc.
depends on the implementation much more than on the language
language evolution:
powerful & dangerous
restricted & safe
powerful & safe?
There is currently no one best language in all situations - we must pick the most suitable language(s) for each particular problem or situation. What factors might we need to take into account?
Essential features:
* structured
* available
* approachable
* robust
* local knowledge
* environment
* bindings to other systems
* books
Desirable features:
* advanced features & concepts e.g. modules, abstract data types,
object-oriented
* real world
* cleanliness
* support
* available widely & cheaply
* standardised
There is no one language that satisfies all these criteria, but our eventual
choice, idC, satisfies most of the more important ones.
* illustrate very different way of thinking about programming
* local knowledge
* availability
* robustness
* cleanliness
* books
Again, no one language perfectly satisfied these criteria, but SML was
thought to be the best choice.
e.g. expressions
* infix: SML, C: a = b + c
* prefix: LISP: (set a (add b c))
* postfix: FORTH: b c + a =
* "english": COBOL: ADD b TO c GIVING a
* vector/matrix etc.
* distributed: occam:
PAR
to_a ! b + c
to_a ? a
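The postfix style is perhaps the least familiar. A minimal sketch (in Python, which is not one of the languages named above, so purely illustrative) shows why FORTH-style postfix notation needs no parentheses or precedence rules: operands are pushed on a stack and each operator pops its arguments.

```python
# A minimal stack-based evaluator for postfix (reverse Polish) expressions.
# Names are looked up in a supplied environment; numbers stand for themselves.

def eval_postfix(tokens, env):
    stack = []
    for tok in tokens:
        if tok == "+":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        else:
            # a name is looked up in the environment; a number is pushed as-is
            stack.append(env.get(tok, tok) if isinstance(tok, str) else tok)
    return stack.pop()

# "b c +" with b = 2, c = 3 evaluates to 5
print(eval_postfix(["b", "c", "+"], {"b": 2, "c": 3}))
```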
Syntactic differences are important, but nowhere near as important as semantic differences.
Paradigm originally meant "sample" or "example", but now has several
meanings. In Computer Science, it is taken to mean "a coherent set of
methods that have been found to be (more or less) effective in handling a
given type of problem".
Different kinds of languages emphasise different things about the problem, and so are better at describing different aspects of the solution, or even different kinds of problems and solutions.
What should be the main concept(s) embodied in a programming language?
State n. condition; circumstances at any time; a phase or stage; ...
= where we are + the (partial) result of getting there
In a computation, represented by e.g. values of registers and memory
(variables etc.)
1GL, 2GL and the first 3GL, were all about representing state, the
actions that modify the state and the sequence of the actions -
the Imperative paradigm.
However, although this is intuitive for simple problems, it became clear that this did not scale well - i.e. it becomes disproportionately hard to use as problem size increases. To determine whether a program will work correctly, we must examine e.g. all possible combinations of actions on all of the state.
We need to control which actions can be performed on which parts of the state
-> type checking.
We can only think about a limited number of different things at once,
so limit what we need to consider - we want to control where parts of
the state can be modified, and where actions can be used -> scope.
Scope and type checking became an integral part of all kinds of programming languages, but also gave rise to a new programming paradigm: Object Oriented
This is similar to Imperative
but with maximum use of types & scopes - keep state in objects, each
type of object having its own set of actions. Furthermore, the state in an
object can only be accessed or modified via the associated actions.
As well as controlling state, other programming paradigms were invented that reduced the need to use state at all:
Functional paradigm - about transformations of values
(notation usually makes it easy to describe & examine values)
Logic paradigm - about knowledge & understanding
Another difficulty with Imperative programming, associated with the concept of state, was the concept of sequence - there are many circumstances where the exact order of some actions does not matter, as long as they are all done before we progress to the next step.
Parallel paradigm - if actions don't interact, work on them in any
order (non-determinism), or even simultaneously (multi-processing).
There are also combinations and extensions of these paradigms, and other
special purpose language paradigms.
(The program fragments are not intended to correspond precisely to any one language, although they are based on real languages.)
e.g. Assembly languages, Fortran, Cobol, Basic, C, Pascal (MU0, CS1031, CS1062, CS1511, CS1522)
e.g. making tea:
1) put water in kettle
2) boil kettle
3) put tea-leaves in teapot
4) pour boiling water from kettle into teapot
declare kettle, teapot
kettle = water;
boil (kettle);
teapot = tea_leaves;
teapot = teapot + kettle;
[water, tea_leaves etc. are fixed values]
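The same steps can be written in a real imperative language - here a Python sketch (all names are illustrative, not from any real API). Note how the program is explicit state (what each container holds) plus a fixed sequence of actions that modify it in place.

```python
# An imperative rendering of the tea-making steps: explicit state,
# modified by side effects, in a fixed sequence.

kettle = ["water"]                # state: what each container holds
teapot = []

def boil(container):
    # side effect: modifies the caller's container in place
    container[:] = ["boiling_" + item for item in container]

boil(kettle)                      # step 2
teapot.append("tea_leaves")       # step 3
teapot.extend(kettle)             # step 4: pour kettle into teapot
kettle.clear()

print(teapot)
```

Reordering the steps (e.g. pouring before boiling) silently produces the wrong state - the sequence itself carries meaning.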
PRO:
* concept of a sequence of steps is easy to understand
* computational theory: Turing machines
* easily transformed into computer instructions
CON:
* too easy to bog down in detail
* must specify sequence even when we don't care
* different parts of the program interact via state,
state modified by side effects (e.g. procedure calls),
so hard to understand/ check/ modify what is happening
Based on evaluation of expressions built out of function calls (applications). e.g. LISP, SML (CS1041, CS2511)
let boiling_water = boil (kettle, [water])
in (teapot, [tea_leaves, boiling_water])
end
[kettle, teapot, water, tea_leaves etc. are all fixed values]
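The same program in a functional style can be sketched in Python (illustrative names again): every container is an immutable value, and each step is a pure function from old values to new ones, so no state is ever modified.

```python
# A functional rendering: pure functions over immutable tuples.
# Because nothing is updated in place, each sub-expression can be
# understood in isolation (referential transparency).

def boil(contents):
    # returns a NEW tuple rather than modifying its argument
    return tuple("boiling_" + item for item in contents)

def fill(contents, *items):
    return contents + tuple(items)

empty = ()
tea = fill(empty, "tea_leaves", *boil(fill(empty, "water")))
print(tea)
```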
PRO:
* referential transparency - functions are like mathematical
functions i.e. no side effects -> easy to understand, check, modify
e.g. (teapot, [tea_leaves, boil (kettle, [water])])
* prototyping
* functions are first class objects
i.e. we can create & modify them in a program
* some unimportant details disappear
* computational theory: Church's lambda calculus
CON:
* I/O hard
* some data structures hard to represent "naturally"
particularly if they are to be modified
* naive implementations very slow, but good implementations nearly as
fast as imperative
Horn Logic: a certain condition (goal) is true if all the conditions
in a list (subgoals) are true. If there are no subgoals, the goal is a
fact. There can be alternative lists of subgoals for the same goal.
Specify a problem by describing its facts and properties.
Solve a problem by giving the system a goal to prove.
e.g. Prolog
(CS1412, CS2432)
e.g.
* tea is made in a container holding tea leaves and boiling water.
* a container can be made to hold an item if the item is in a second
container and we move it.
* a container holds boiling water if it holds water and we can heat the
container to boiling point.
We can add some extra "common-sense" information to the system, by telling it that a container we make tea in should be water proof and heat resistant, and that tea-leaves come from a tea-caddy and water from a tap.
make(Con,tea):-water_proof(Con),heat_resistant(Con),
contains(Con,tea_leaves),contains(Con,boiling_water).
source(tea_caddy,tea_leaves).
source(tap,water).
source(Con,boiling_water):-canboil(Con),contains(Con,water).
contains(Con,Item):-source(Con,Item).
contains(Con,Item):-source(Con2,Item),move(Item,Con,Con2).
canboil(kettle).
water_proof(teapot).
heat_resistant(teapot).
move(Item,Con,Con2).
[names starting with uppercase letters are unknowns, those starting with
lowercase letters are fixed.]
if we ask: make(teapot, tea) or: make(X, tea)
the system answers: yes or: X = teapot
We can even get the system to tell us how to make tea:
move(Item,Con,Con2):- write('move '), write(Item),
write(' from '), write(Con2), write(' to '), write(Con), nl.
canboil(kettle):- write('boil kettle'), nl.
and it
will output:
move tea_leaves from tea_caddy to teapot
move water from tap to kettle
boil kettle
move boiling_water from kettle to teapot
[getting the system to answer correctly is generally more difficult than
this.]
PRO:
* prototyping
* AI
* Theory: predicate logic etc.
CON:
* slow: naive implementations search through all possible combinations
to find solutions.
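The Horn-logic execution model can be sketched in a few lines of Python (a toy only - real Prolog adds variables and unification, which this propositional version omits; the rule names are simplified from the tea example above).

```python
# A minimal backward-chaining prover for propositional Horn clauses:
# a goal is provable if some rule for it has all subgoals provable.
# A fact is a rule with an empty subgoal list.

RULES = {
    "make_tea": [["water_proof", "heat_resistant",
                  "contains_tea_leaves", "contains_boiling_water"]],
    "water_proof": [[]],                  # facts: one empty subgoal list
    "heat_resistant": [[]],
    "contains_tea_leaves": [[]],
    "contains_water": [[]],
    "canboil": [[]],
    "contains_boiling_water": [["canboil", "contains_water"]],
}

def prove(goal):
    # try each alternative list of subgoals for this goal
    return any(all(prove(g) for g in subgoals)
               for subgoals in RULES.get(goal, []))

print(prove("make_tea"))   # True
```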
Encapsulate state in objects, which can only be accessed via a defined
set of operations.
e.g. Simula, C++, Smalltalk-80, Eiffel, Object Pascal
class containers . . .
put_in = . . . take_out = . . . contents = . . .
class water_proof subclass of containers
pour_to = . . .
class boilable subclass of water_proof
boil = . . .
boilable kettle;
water_proof teapot;
kettle.put_in (water);
kettle.boil ( );
teapot.put_in (tea_leaves);
kettle.pour_to (teapot);
[water, tea_leaves etc. are fixed values]
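The class hierarchy above can be sketched in Python (illustrative names, not the pseudocode's language): state lives inside objects and is only reached through their methods, and Boilable reuses WaterProof's behaviour through inheritance.

```python
# An OO rendering: encapsulated state, accessed only via methods,
# with behaviour reused through a small inheritance hierarchy.

class Container:
    def __init__(self):
        self._contents = []              # encapsulated state

    def put_in(self, item):
        self._contents.append(item)

    def contents(self):
        return list(self._contents)

class WaterProof(Container):
    def pour_to(self, other):
        for item in self._contents:
            other.put_in(item)
        self._contents = []

class Boilable(WaterProof):
    def boil(self):
        self._contents = ["boiling_" + i for i in self._contents]

kettle = Boilable()
teapot = WaterProof()
kettle.put_in("water")
kettle.boil()
teapot.put_in("tea_leaves")
kettle.pour_to(teapot)
print(teapot.contents())
```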
PRO:
* When we create a new class of object, we can reuse previous
work by inheriting some or all of the definitions of similar objects,
leading to hierarchies of objects. Some languages allow multiple
inheritance, where a new object can reuse the definitions of several other
objects.
* inheritance leads to polymorphism, which in turn simplifies
maintenance & project control
* as user requirements change and system evolves, entities in the system
less likely to change than the operations on them, so less likely to have to
make fundamental changes to existing code
* easy to modify implementation of objects - prototyping
* a design methodology as well as a language model, so can capture more
of the programming process (CS1052, CS1532, CS2032)
* classes -> typing system very powerful & discriminatory
CON:
* In most paradigms, we can resolve types at compile time. In OO languages,
the flexibility gained from inheritance and polymorphism means that we can
only completely determine the type of an object at run time (dynamic
binding) -> slower.
Contain explicit constructs for running different pieces of a program
concurrently:
parallelism: interrupt, coroutine, fork & join, PAR [granularity]
communication: state, message (sync. or async.)
synchronisation: state (poll/busy wait, lock, semaphore, monitor),
message (guard, rendezvous, RPC)
e.g. Simula, Algol 68, Concurrent Pascal, CSP, Ada, occam, Concurrent C,
Concurrent Smalltalk, Multilisp, Concurrent Prolog
CHAN OF ANY to_pot, to_kettle, kettle_to_pot, to_cup:
PAR
declare water:
SEQ -- kettle
to_kettle ? water
boil ( )
kettle_to_pot ! boiling_water
declare tea_leaves, boiling_water:
SEQ -- teapot
PAR
to_pot ? tea_leaves
kettle_to_pot ? boiling_water
to_cup ! tea
[a complete program would include the other ends of to_pot, to_kettle & to_cup; names not declared above represent fixed values e.g. tea]
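The occam fragment can be mimicked with Python threads and queues (an illustrative sketch, not occam semantics): each queue plays the role of a channel, with `put` for `!` and blocking `get` for `?`, and the two processes run concurrently, synchronising only at communications.

```python
import queue, threading

# one queue per occam channel
to_kettle, to_pot, kettle_to_pot, to_cup = (queue.Queue() for _ in range(4))

def kettle_process():
    water = to_kettle.get()                  # to_kettle ? water
    kettle_to_pot.put("boiling_" + water)    # boil; kettle_to_pot ! boiling_water

def teapot_process():
    tea_leaves = to_pot.get()                # in the occam PAR these two
    boiling_water = kettle_to_pot.get()      # inputs may complete in either order
    to_cup.put((tea_leaves, boiling_water))  # to_cup ! tea

threads = [threading.Thread(target=kettle_process),
           threading.Thread(target=teapot_process)]
for t in threads:
    t.start()
to_kettle.put("water")       # the "other ends" of the channels
to_pot.put("tea_leaves")
tea = to_cup.get()
for t in threads:
    t.join()
print(tea)
```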
PRO:
* some problems are easier to implement using parallel constructs, because
they exhibit inherent parallelism - the resulting programs may still be run
on a single processor (pseudo-parallel)
* some problems can be solved faster by implementing them on a parallel machine and dividing the work over several processors
* some applications need to run on multiple processors or workstations connected by a network - distributed systems
CON:
* parallel & distributed programs are fundamentally more difficult to
write & check e.g. deadlock, load-balancing
* communications & synchronisation overheads may negate much of the
advantages of having multiple processors e.g. waiting
Most parallel languages have imperative features, but other paradigms can have parallel variants as well. In fact, the strict sequencing of imperative languages clashes badly with the relaxation of parallelism, and other paradigms can incorporate parallelism more simply. They may not even require many changes to the language, instead just altering the method of execution.
Special-purpose languages are intended for use in a single class of
applications. Often as the interface to a particular piece of software -
Command languages (HCI). Often imperative, although other paradigms
exist e.g. WIMP excellent for simple actions. (CS1311)
operating system (CLI, shell) e.g. ksh
spreadsheet e.g. "A1 = sum(B1:E1)"
data base e.g. SQL
vector/matrix e.g. APL
string manipulation e.g. Snobol
text-processing e.g. roff
macro-processing e.g. cpp
grammar systems e.g. lex, yacc
formula/algebraic e.g. mathlab
graphical (CS2071)
CAD (CS1211)
etc.
Specification: describe the problem rather than the method of solution.
e.g. VDM
(CS1021, CS1112, CS1052, CS1532)
e.g.
precondition = given:
a teapot that is heat resistant and water proof
& a kettle that can boil & tea leaves & water
postcondition = teapot contains tea leaves & boiling water
Constraint: related to Logic - describe a problem by the set of
constraints any solution must satisfy, and ask the system to find a solution.
e.g. C * 9 = (F - 32) * 5; F = 50
-> C = 10
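A toy "generate and test" constraint solver (a Python sketch; real constraint systems propagate and rearrange constraints rather than enumerating candidates) finds the same answer by keeping every pair of values that satisfies all the constraints.

```python
# Generate-and-test over a small integer domain: keep the (C, F) pairs
# satisfying every constraint. Illustrative only - hopelessly naive
# compared to a real constraint solver.

constraints = [
    lambda c, f: c * 9 == (f - 32) * 5,
    lambda c, f: f == 50,
]

solutions = [(c, f)
             for c in range(-100, 101)
             for f in range(-100, 101)
             if all(con(c, f) for con in constraints)]
print(solutions)   # [(10, 50)]
```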
Data-flow brings together concepts from the functional, parallel/distributed and constraint paradigms: imagine each operation in a program has its own associated processor, which executes the operation to produce a result whenever all its inputs (parameters) are available.
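The firing rule can be sketched in Python (node names and functions are invented for illustration): each node fires as soon as all of its inputs are available, so execution order is driven by data availability rather than by a written-down sequence.

```python
# A toy data-flow evaluation: repeatedly fire any node whose inputs
# are all available, until every node has produced its result.

nodes = {
    # name: (function, input names)
    "sum":    (lambda b, c: b + c, ("b", "c")),
    "double": (lambda s: 2 * s,    ("sum",)),
}

values = {"b": 2, "c": 3}            # initially available inputs

fired = set()
while len(fired) < len(nodes):
    for name, (fn, inputs) in nodes.items():
        if name not in fired and all(i in values for i in inputs):
            values[name] = fn(*(values[i] for i in inputs))  # node fires
            fired.add(name)

print(values["double"])   # 10
```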