This information is no longer being maintained.
You can find more information about Computer Languages, Parsing, Grammars, and Compilers here.

CS2111: Design and Implementation of Programming Languages

(CS2111 was known as CS2112 when it ran in the 2nd semester.)

syllabus

The handouts are available via the table of contents of lecture handouts (which includes answers to the paper exercises).
booklist.

What did students think about this course?
"Why did you choose compiler development?"

Lab exercises

Lab exercises	including the following links to extra information & hints:
Introduction:	debugging flex, byacc and make
Exercise 1:	condensed man page for egrep and regular expressions used with flex
Exercise 2:	lex manual and strings and comments characters [local] and pre-processing [local] and syntax [local]
Exercise 3:	yacc manual and use of %left etc. and examples
Exercise 4:
Exercise 5:	grammar and strings and comments and extensions
Exercise 6:	spim and simulator

New information added since lab manual printed

exercise 2

In the example output, I have picked up adjacent white-space characters as a single lexeme using

	[ \t\n]+

but in the problem description and in the starting code I only pick up single white-space characters using

	[ \t\n]

I would prefer to do the former, but I have to do the latter to be able to recognise preprocessor commands properly. Can you see why? If not, try changing the rule for white space to see what happens: you should get lots of messages from the checker.
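
One possible way to see the effect without running flex is the small C sketch below. It is not the course's scanner or checker. It assumes a hypothetical preprocessor rule that matches optional blanks, then "#", then the rest of the line, and only at the start of a line. It then applies the usual longest-match choice between that rule and the white-space rule. The names ws_len, pp_len and count_directives are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the white-space lexeme at s.
   greedy == 0 models [ \t\n]; greedy != 0 models [ \t\n]+ */
static size_t ws_len(const char *s, int greedy)
{
    size_t n = 0;
    while (s[n] == ' ' || s[n] == '\t' || s[n] == '\n') {
        n++;
        if (!greedy)
            break;
    }
    return n;
}

/* ASSUMED preprocessor rule (hypothetical, not the course's own):
   only at the start of a line, optional blanks, then '#', then
   everything up to the newline. Returns the matched length, or 0. */
static size_t pp_len(const char *s, int at_line_start)
{
    size_t n = 0;
    if (!at_line_start)
        return 0;
    while (s[n] == ' ' || s[n] == '\t')
        n++;
    if (s[n] != '#')
        return 0;
    while (s[n] != '\0' && s[n] != '\n')
        n++;
    return n;
}

/* Scan text using longest-match and count the preprocessor lexemes. */
int count_directives(const char *text, int greedy)
{
    size_t i = 0;
    int at_line_start = 1;
    int found = 0;

    assert(text != NULL);
    while (text[i] != '\0') {
        size_t p = pp_len(text + i, at_line_start);
        size_t w = ws_len(text + i, greedy);
        size_t len;

        if (p > 0 && p >= w) {       /* preprocessor rule is the longest match */
            found++;
            len = p;
        } else if (w > 0) {          /* white-space rule wins */
            len = w;
        } else {                     /* anything else: treat as one character */
            len = 1;
        }
        /* the next lexeme starts a line only if this one ended in '\n' */
        at_line_start = (text[i + len - 1] == '\n');
        i += len;
    }
    return found;
}
```

With the sample text "int x;\n  #define A 1\n#define B 2\n", count_directives(sample, 0) returns 2 but count_directives(sample, 1) returns 1. Under the assumed rule, the white-space run swallows the newline and the blanks in front of the indented #define, so the scanner is no longer at the start of a line when it reaches the "#".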

When you "make test" for the first time, you should see something like this:

echo your output is being redirected to "out1"
your output is being redirected to out1
./c_lexemes <data.c >out1

echo here are any error messages - look at "out1" for more details
here are any error messages - look at out1 for more details
grep OOPS out1
4       OOPS -  #include <stdio.h> is a preprocessor_command (not white_space etc. - see following list)
4       OOPS -  #define a(nasty)\

echo here is the final report, counting characters and lexemes
here is the final report, counting characters and lexemes
wc -c <data.c ; tail -20 out1
   1197

numbers of lexemes & characters - 22/9/99 10:17 GMT
  380 ignore(s)                         in   960 characters
    0 float_number(s)                   in     0 characters
    0 octal_int_number(s)               in     0 characters
    0 decimal_int_number(s)             in     0 characters
    0 hex_int_number(s)                 in     0 characters
    0 preprocessor_command(s)           in     0 characters
    0 comment(s)                        in     0 characters
    0 character(s)                      in     0 characters
    0 keyword(s)                        in     0 characters
    0 built_in_type(s)                  in     0 characters
    0 identifier(s)                     in     0 characters
    0 punctuation_or_operator(s)        in     0 characters
    0 punctuation(s)                    in     0 characters
    0 operator(s)                       in     0 characters
    0 string(s)                         in     0 characters
    0 unknown(s)                        in     0 characters
  237 white_space(s)                    in   237 characters
total:   617 lexemes                    in  1197 characters

As you can see, the checker is over-enthusiastic about pre-processor commands.

There is a second, rather larger set of test-data: bigdata.c
You are welcome to try your program on this - remove the leading "#" characters from these 3 lines in the makefile to do so:

#       ./c_lexemes <bigdata.c >out2
#       -grep OOPS out2
#       wc -c <bigdata.c ; tail -20 out2

Don't get bogged down trying to deal with some of the weird things in this file before you have dealt with the simpler problems in the first data-file.

exercise 6

local copy of MIPS/SPIM documentation

In 1998/9, Chris Whitworth reported that spim and xspim are available in /opt/undergrad under Linux, but not yet under SunOS/Solaris. You need to copy /opt/undergrad/share/trap.handler to your working directory first.

If your program crashes at run-time and leaves a "core" file, you can find out which line it went wrong at. On Suns do:

	dbx program_name
	where
	quit

or on Linux do:

	gdb program_name core
	where
	quit

(This relies on compiling with gcc -g, which includes debugging information, as the makefiles I provide do.)

If you have any questions about or corrections to any of the lab exercises, please email me so I can correct or improve the descriptions. Many thanks to everyone who asked questions and made suggestions.

I supervise the lab, helped by the demonstrators.

Exam

January 1999 Exam
The lex+yacc question (question one) is compulsory. You have to answer 2 of the other 3 questions. For questions three and four you can answer any 2 of the 3 parts. (Questions one and two have no optional parts.)

You will get these (or something similar) in a handout at the end of the course:

Some notes about exam technique, and some hints about the exam
The June '96 exam questions and answers + marking schemes
The June '97 exam questions and answers + marking schemes
The January '98 exam questions and answers + marking schemes

You could also look at the June 1995 exam questions and answers + marking scheme (e.g. you may find the lex+yacc question (Q1) useful) but these questions are very different in style from those I use now, and the content of the lectures has changed somewhat, so don't read too much into them. In particular, the exam was biased towards ANSI C because I had to write it before I had given the lectures!

Paper + pencil exercises

There are no formal examples classes, as they have always seemed to me to be a waste of time for the majority of students and staff. Instead, the lab has been extended from 5 to 8 sessions, and the first four lab exercises are intended to allow you to gain confidence in using simple concepts directly from the lectures.

Also, many of the lecture handouts include exercises that you may find helpful. Some simply use concepts in the handout, others explore beyond these limits. You are not obliged to do them, but I hope you will find them useful for revision or to prepare for the lab. I put answers into the Undergraduate Resource Centre (and they can be found via the list of handouts).

There are also suitable exercises in many of the recommended books etc.

I am very happy to discuss any of these with you during the lab or after lectures, or by email.

Other resources

Computer Languages

The content of this page and those it links to is copyright © Pete Jinks, except, obviously, for links that go to other web-sites, and pages that I have myself copied from elsewhere. (I hope I have correctly acknowledged any such copying - please let me know if I have missed any out.)

You are welcome to make educational, not-for-profit use of my work, but please give me credit when you do so.