CS2112 June 1995 (Pete Jinks) -- Answers & Marking Scheme

Q1. This is similar to much of the practical work.

a. You can either write several grammar rules (1 per precedence level), or use %left etc. You do not need to do anything special about TRUE and FALSE - as we are ignoring semantics, these are identifiers like any others.

[2] I expect C-like precedence (etc.) rules, although I am not going to insist on you getting it right - I can't!

e.g.:		(all dyadic, infix, left-associative, except:)
  ()
  ! ~	monadic, prefix, right-associative
  == !=
  &
  ^
  |
  &&
  ||
  ?:	right-associative

[3]

exp1 = exp2 | exp2 '?' exp1 ':' exp1
exp2 = exp3 | exp2 '||' exp3
exp3 = exp4 | exp3 '&&' exp4
exp4 = exp5 | exp4 '|' exp5
exp5 = exp6 | exp5 '^' exp6
exp6 = exp7 | exp6 '&' exp7
exp7 = exp8 | exp7 '==' exp8 | exp7 '!=' exp8
exp8 = exp9 | '!' exp8 | '~' exp8
exp9 = identifier | '(' exp1 ')'
identifier = [A-Za-z_][A-Za-z0-9_]*

b.

[2] Once you have the grammar above, the trickiest bit is the communication between Lex & Yacc e.g. knowing what can be passed directly and what has to be combined into a single %token.

[1] Also, dealing with spaces etc., not mentioned above.

[1]Yacc:

  %token identifier or and ne eq
  %start list
  %%
  list : exp1 | list ';' exp1 ;
  /* exp1 to exp9 as above, except:
     ; at end of each rule, and replace = by :
     replace multi-character operators
  	'||','&&','==','!=' -> or,and,eq,ne
  */

[1]Lex:

  %%
  [A-Za-z_][A-Za-z0-9_]*	{return identifier;}
  "||"			{return or;}
  "&&"			{return and;}
  "!="			{return ne;}
  "=="			{return eq;}
  [;?:|^&!~()]		{return yytext[0];}
  [ \t\n]			;
  .			yyerror("unexpected");

c.

[1] i. Depends on the complexity of the translation. e.g. infix to postfix can ignore identifiers, but if we have to assign memory locations then we need a dictionary.

[1] ii. We will almost certainly need to remember the value of the operands (or anything similar), so we might as well put them in a dictionary.

[1] Provide a lookup function to search the dictionary, add the identifier if it does not already exist, and return a pointer or similar to the relevant entry in the dictionary. The namelist needs to be easy to search i.e. B-tree, hash-table etc.

[1] Probably get Lex code to do look-up, and return result to Yacc:
[A-Za-z_][A-Za-z0-9_]* {yylval.id=lookup(yytext); return identifier;}

[1] so Yacc code needs %union and %type or %token information, and:
exp9 : identifier {/* use $1 */} | '(' exp1 ')' ;

d.

[1] The list of operators is complete overkill if we really only have boolean values: & is &&, | is ||, ! is ~, ^ is !=

Identifier binding could be C-like, with declarations and assignment, or maybe SML-like, with let-statements.

[2] I expect something about the syntax changes, which are simple,

[2] but much more about semantic checks using the dictionary.

Q2. This is all bookwork, equivalent to about 3 lectures, although I don't expect you to regurgitate it all! A rough outline:

a.

[2] scope: static/compile-time life of a declaration (etc.)

[2] extent: dynamic/run-time life of a variable (etc.)

b. 2 of the following 3 parts:

[8]
COBOL
e.g. "programs", CODE & DATA areas, variables = records & files,
scope = "program" & LINKAGE, extent = run unless CANCELed

[8]
FORTRAN
CODE = main, subroutines, functions
DATA = as for CODE + blank & named COMMON
extent = call unless SAVEd, but see EQUIVALENCE
COMMON extent = all relevant calls, unless SAVEd

[8]
C
classical block structure, anonymous blocks
scope starts with block or with declaration
auto & static
recursion, stack frames
heap, malloc & free

Q3. This is partly from the lectures, and partly things you have to work out for yourselves.

a. I have talked about this sort of thing for a couple of lectures, and mentioned it in many of the others. e.g.
Use keywords in the control constructs, so: if (a) b= c; else d= e; becomes: if a then b= c; else d= e endif;
Allow ; after a { } block, so it becomes easier to write macros etc.
Tidy up declarations, so they are LR(1), or to get rid of multiple name-spaces, or to make them more readable etc.

[2*3]

b.

[2*1] Advantages of both ideas in a.

[1] Disadvantage: lose downward compatibility etc.

c.[3*2] There is at least one lab exercise about the different abilities of Lex and Yacc.

i. & ii. would not permit much in the way of checking, which is the point of many changes, but i. in particular might be useful for allowing compatability between old & new.

iii. would permit checking that "endif" matches "if" and "endfor" matches "for" etc. The disadvantages are that you would need a complete grammar for C, and must reassemble most but not all of the input in the output, which could be tricky for anything more complex than the keywords. Also, as ANSI C is not LR(1) without complex semantic processing, this could be very tedious to prototype.

d.[5] I expect a suitable choice from c., and a sensible discussion.

I was very impressed by the range of answers. Some were things I had mentioned (sometimes just in passing), some were taken from C++ or other languages, and some were trying to prevent everyday programming mistakes. Strangely enough, no one mentioned idC - perhaps it is too obvious, or too disliked!

types: LIST, SET, RDTs, modules or ADTs; string (resizable?); boolean, with restricted operations & strong type checking;

arrays: improved initialisers, bound checking, not reduced to pointers, non-zero lower bound, dynamically sized (multi-dimensional)

strong type checking, or disallowing dangerous coercions, or controlling pointer arithmetic

replace { . . . } by end_if or fi or begin . . . end etc.

call by reference

upper-case keywords

function currying or 1st-class items, or polymorphic

automatic (re)declaration of loop control variables

operators: reduce number of precedence levels, or reduce overloading (e.g. minus/subtract, or integer/float), or change to be more easily distinguished (e.g. ==,= -> EQUALS,BECOMES; ||,| &&,& -> ANDALSO ORELSE), or make monadic operators non-associative

switches: automatic breaks inserted at the ends of cases, or ranges of case values

Q4. Each part corresponds to roughly one lecture. Again, I don't expect you to regurgitate it all! A rough outline:

a.

[3] You have done this in C & SML in several courses, including this one. I expect a function for each, called a few times.

[2] Pointer operations: malloc, free, & and *, test, assign, arithmetic

[2] Garbage collection is hard if we don't know when something is a pointer (strong typing, undiscriminated unions etc.) or we don't know where it points (pointer arithmetic).

b.

[3] Concatenation, selection, repetition are ...

[1] Yes, may need extra flags and/or duplicate code.

[1] ;/,/{} if/switch/?: while/for/do(/recursion?)

[1] jump to start or end of block: break, continue, return etc.

[1] functions, goto, setjmp etc., computed goto, guards, exceptions etc.

I may give bonus marks for longer answers.

c. C: declarations [n], initialised [] and *, parameters

[4] element type = scalar or array, iliffe vectors access [i][j]

[3] Other languages: lower bound, index type not integer access [i,j], bound check slicing, dope vectors

d.

[5] 5 methods

[2] C = value & procedure; can similate reference & copy