next up previous contents
Next: Lexical analysis - Lex Up: ho Previous: Introduction   Contents

Subsections

Assemblers and Compilers

Assemblers and compilers translate for later execution by real hardware or by software interpreters. They are application-specific programs just like any other, best written in HLLs, especially those specific to the application area

Assembler

simplifies the task of writing machine code programs

LLL program $\rightarrow$
\fbox {assembler}
$\rightarrow$ error messages, listings etc.

$\rightarrow$ binary program

  1. build words from characters, discard unimportant spaces & comments

  2. check legal statement

  3. check user-defined names (e.g. labels)
    - need to keep list of names & addresses

  4. translate [one-to-one]

Formally:

  1. Lexical (word) analysis
  2. Syntactic (sentence structure) analysis
  3. Semantic (meaning) analysis
  4. Code generation

These steps (and some others mentioned below) are known as phases - logical subdivisions of the overall translation task.

Language definitions

Grammar (Syntax + Lexical): Representation

special notations e.g. Backus Normal Form (BNF)

Semantics: meaning

In natural languages, sentences/statements only have meaning in a context.
Computers have no common sense or understanding as to what is going on, so context has to be carefully defined:

Compiler

Makes it seem as if the high-level language is the machine language.

HLL program $\rightarrow$
\fbox {compiler}
$\rightarrow$ error messages, listings etc.

$\rightarrow$ LLL or binary program

e.g.

        /* division by repeated subtraction */
        int main (void)
        {
          int ans=-1, a=99, b=6;
          do
            {ans=ans+1; a=a-b;}
          while (a>=0);
          return ans;
        }

First 4 steps the same:

  1. identify words etc. e.g \fbox {int} \fbox {main} \fbox {(} \fbox {void} \fbox {)} \fbox {\{} \fbox {int} \fbox {ans} \fbox {=} \fbox {--} \fbox {1} \fbox {,} \fbox {a} \fbox {=} \fbox {99} \fbox {,} \fbox {b} \fbox {=} \fbox {6} \fbox {;}
    \fbox {do} \fbox {\{} \fbox {ans} \fbox {=} \fbox {ans} \fbox {+} \fbox {1} \fbox {;} \fbox {a} \fbox {=} \fbox {a} \fbox {--} \fbox {b} \fbox {;} \fbox {\}} \fbox {while} \fbox {(} \fbox {a} \fbox {$>=$} \fbox {0} \fbox {)} \fbox {;} \fbox {return} \fbox {ans} \fbox {;} \fbox {\}}

  2. check against grammar

  3. check user-defined names (declared, var/label, int/real etc.)
    - keep list of names, addresses, types etc.

  4. translate [many-to-many?] e.g.
                    main    MVN     a1, #0          ; a1 = ans
                            MOV     a2, #&63        ; a2 = a
                            MOV     a3, #6          ; a3 = b
                    L       ADD     a1, a1, #1
                            SUBS    a2, a2, a3
                            BPL     L
                            MOV     pc, lr
    

  5. Code optimisation - before (Global), during, & after (Peephole) step 4.

What happens next?

Library

Increases set of operations available to programmer, by giving expansions for the more complicated operations. Translate operations once, separately from user programs, to create library. Libraries include a list of operation names & addresses.

Linker

Searches libraries to locate required operations and combines with user programs. When creating library, don't know where it will end up, so linker also has to relocate the code by changing addresses in it.

Execution and Interpretation

Why interpret?

interpreters are slow, simple & small

Exercises: ECO

  1. The following is the syntax of a simple language, ECO:

    name                    identifies an element of the language,
    name : something ;      defines the left hand name as equal to the right hand something,
    |                       separates alternative definitions,
    { something }           means that the enclosed something occurs none or more times,
    "text"                  indicates characters present in an actual program in the language.
    
    program         : statement { statement } ;
    statement       : declarative ";" | executable ";" ;
    declarative     : type identifier { "," identifier } ;
    type            : "CARNIV" | "VEGET" | "MALE" | "FEMALE" | "VEGETABLE" ;
    executable      : "BEGIN" { statement } "END" | action { "AND" action } ;
    action          : identifier op rest ;
    rest            : identifier { "WITH" identifier } ;
    identifier      : letter { letter } ;
    op              : "EATS" | "IS EATEN BY" | "HATES" | "LOVES" ;
    letter          : "a" | "b" | . . . | "z" ;
                      (i.e. all lower case characters of the English alphabet)
    
    New lines and spaces are ignored everywhere (e.g. do not separate words).

    Find all the syntax errors in the following ECO programs:

    (a)  CARNIV lion, tiger, cat, man, dog;
         VEGET minnow, man, dog;
         lion EATS man WITH dog;
         lion WITH tiger EATS man;
         minnow IS EATEN BY cat
    
    (b)  VEGETABLE onions, leeks;
         VEGET fred, albert;
         fred HATES onions WITH leeks;
         BEGIN fred EATS onions; albert EATS leeks; END
         CARNIV lion; albert IS EATEN BY lion AND lion EATS onions; onions IS EATEN BY
         fred WITH leeks;
    
    (c)  MALE jack; FEMALE joy; CARNIV joy;
         jack LOVES joy; joy HATES jack;
         jack IS EATEN BY joy; lion EATS jack WITH joy;
         lion LOVES jack with joy;
         jack AND joy HATES lion;
    

  2. The following are the static semantic rules for ECO:
    1. Each identifier must appear in (one or more) declaratives before it appears in any executables, and it must not appear in declaratives after it has appeared in executables.

    2. An identifier may appear in several declaratives and then is of several types simultaneously.

    3. identifier EATS (or HATES or LOVES) rest means identifier EATS (or HATES or LOVES) each of the identifiers in rest
      identifier IS EATEN BY rest means each of the identifiers in rest EATS identifier

      1. An identifier declared to be only CARNIV may only EAT identifiers declared to be one or more of CARNIV or MALE or FEMALE, but not those also declared to be one or more of VEGET or VEGETABLE.

      2. An identifier declared to be VEGET may only EAT identifiers declared to be VEGETABLE.

      3. An identifier declared to be both CARNIV and VEGET may EAT any identifier.

      4. Any identifier can LOVE or HATE any identifier. The identifiers must have been declared.

      5. Anything not explicitly permitted above is illegal.

    Find all the static semantic errors in the ECO programs above.


next up previous contents
Next: Lexical analysis - Lex Up: ho Previous: Introduction   Contents
Pete Jinks 2001-02-21