Teaching: COMP20121

COMP20121: The Implementation and Power of Computer Languages

Power Part

(Semester 1, 2008/2009)

After each of my lectures I will place the slides for the lecture here

Slides

Lecture 1,

Overview

I, Peter Aczel, am teaching this module jointly with Pete Jinks. While his part concentrates on practical aspects of parsing text (in particular programs), my part will be concerned with theoretical issues. There's a short page to help you decide whether you should choose this module.

Content (of the theoretical part)

Official Syllabus for COMP20121, The Implementation and Power of Computer Languages.

The theoretical part of this course explores the question of what a computation is. If we define a computation to be what happens when we run a computer program then we find it very hard to say anything useful about the concept. In Physics, the theory abstracts away from what actually happens to a theoretical model where, for example, notions such as friction are neglected when considering a pendulum. It is similar with computer science, where we abstract away from a concrete machine and instead consider abstract ones. We start with very simple (almost primitive) machines which are called finite state automata, and consider the computations made by those. These turn out to be closely linked to the language of regular expressions, which is all about describing words that fit a certain pattern. Such expressions appear in linux programs such as grep, or in the workings of search engines that are clever enough to come up with occurrences of slight variations of the word you were looking for (singular versus plural, for example) and they are also applied in every compiler.

When learning about regular expressions and other syntactical devices you might find these somewhat unwieldy to deal with, and writing a parser for these may have seemed a rather difficult task. In this part of the course-unit we will study theoretical constructs that help with such jobs, so called automata. These have the advantage that they can be presented pictorially. While machines typically deal very well with syntax there are many situations where humans do much better when the information is provided in form of a picture. We will learn which automata correspond to regular expressions, and what has to be changed if we want to deal with more complex languages, resulting in the Chomsky-hierarchy.

One particular kind of automaton are so-called Turing machines named after the British mathematician and computer scientist Alan Turing who did ground-breaking work in the theory of computability. Turing was an eminent British logician who eventually accepted a position here in Manchester in the maths department, and you will find his picture around this building. He started with a simple idea. Take a function f that takes a natural number as an input and gives a natural number as an output. When is this function something that can be computed? This work was started in the thirties when computers did not yet exist, so Turing used the idea of a human carrying out the computations---a human who would be mechanically following instructions, that is, carrying out an algorithm or program. It turned out that this notion of computability is much more general than originally perceived, and we can use extensions of the arguments used by Turing to argue about what can be computed in languages such as ML, C, Java, or various assembler languages. The surprising answer is that it does not matter which language is used. The question of what kind of problems are intrinsically non-computable will occupy us in the last part of this course.

Learning the material

There are four components that will help you to learn the material of my part of the course unit. There are the lectures, lecture notes, text books and examples classes. The notes aim to give you a summary of the examinable material. Any text book (see the references) will provide a much more detailed account of this area, often going beyond what is taught in the course, and providing many more examples. But a text book, by definition, is a static medium, and some things are much better explained dynamically. In the lectures I can point at things, emphasize important issues, react to questions, repeat explanations in a different way, simulate processes dynamically. You can get none of these things out of even the best text book. The practical component of this part of the course consists of a number of exercises strewn throughout the notes. Ideally after one lecture (and before the next) you would (re)read the notes that were covered, and tackle all the exercises in that section.

Assessment

The assessment of the course is split into three components.

Labs. These will count for 20% of your final mark for COMP20121. See Pete's page and the lab manual for details.

Exercises. These will count for 10% of your final mark for COMP20121. For each examples class you will be asked to prepare a number of exercises from the notes. In the examples class a TA or member of staff will ask you to explain your work for at least one of the questions you tackled. This is to ensure that your work is your own rather than plagiarized. If you can do so satisfactorily, you will get credits for the number of exercises you solved, or solved for the most part.

Exam. The exam will count for 70% of your final mark for COMP20121, and as such is by far the most important component of assessment. Given the theoretical nature of the course, your best exam preparation is to solve the exercises given in the course handouts during the term, so that these abstract notions and technique have a chance to sink in. There will be two questions from each part of the course, and you will be allowed to answer any three out of those four. Some past papers are available online here.

Certainly the exam questions for the theoretical part of the course will have parts which are very similar to the exercise questions, and other parts where you are asked to reproduce some of the taught material, or apply it to new situations.

Examples classes

In some weeks there will be labs for the implementation part of the module, in others there will be examples classes for the power part of the module.

There will be three examples classes. Another purpose of the examples classes is to allow you to ask questions you have about solving the exercises.

The examples classes will be in weeks 4, 9 and 12.

Course notes

These were written by Andrea Schalk for a previous year's course and are being used again this year. They will be handed out in the lectures, but they will also appear here. Please send any feedback on the notes to A.Schalk at cs.man.ac.uk.

The exercises are at the end of each section. The last page of Sections 1, 2 and 3 contain the assessed exercises. Assessment will occur in the examples classes, where it will also be possible to ask questions regarding the non-assessed exercises.

Section 1: Regular Languages and Finite Automata. gzipped postscript version, pdf version.

Section 2: Context-free Grammars and Pushdown Automata. gzipped postscript version, pdf version.

Section 3: Turing machines gzipped postscript version, pdf version.

Section 4: Computability gzipped postscript version, pdf version.

Recommended books for this part of the course

M. Sipser. Introduction to the Theory of Computation. PWS Publishing Company 1997, ISBN 0-534-94728-X. A fairly mathematical book that nonetheless tries to keep mathematical notation at a minimum. Contains many illustrated examples and aims to explain ideas rather than going through proofs mechanically. Just over �30.

J.E. Hopcroft, R. Motwani and J.D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley 2001, 2nd edition, ISBN 0-201-44124-1. Derived from the classical text book on the subject. More verbose than Sipser, it aims to include up-to-date examples and applications, and to develop the material in much detail. Also contains a number of illustrated examples. I have heard that this book is very popular among students. About �40.

D. Kozen. Automata and Computability. Springer 1997, ISBN 0-387-94907-0. Presents the material in lecture-sized bites, fairly concise. Tries to avoid too much formality. Available as paperback for about �25.