© James Alan Farrell, August 1995
This paper may be copied and distributed for non-profit educational purposes only.
All other rights reserved.
Contact James Alan Farrell at j_alan@prodigy.net
for more information, or for questions, comments and corrections.
This paper is intended to give an overview of basic compiler theory. It is not intended to be complete by any means. It certainly will not replace a college level compiler course, however, it will provide a good solid foundation on which such a course can build.
I'm sure some Pascal purists, especially those who have worked on Pascal compilers, will find inaccuracies in this paper, especially in the section on parsing if statements, and generating stack frames for recursive block structured procedures. I must admit that I have spent more time parsing C and a non block structured but essentially Ada-like language (as part of the Beacon project - I will be posting a paper on BEACON sometime during Fall '95 - JAF), and things did not entirely make sense to me while I was writing those sections. I attempted to find a Pascal EBNF, both on the net and in my own library, but both attempts failed. So on this section I more or less had to guess my way through. Those familiar with parsing C will certainly see where my thought processes are coming from.
Finally, I am not up on my Pascal syntax for pointers. There are probably many errors where pointers are
concerned.
Knowledge needed to understand this paper
This paper is written on a level that can be understood by those who have an understanding of third semester computer programming (data structures). Modern parsers rely heavily on recursion and dynamic data structures. If you do not understand these programming constructs you will not be able to understand this paper.
This paper does not develop a full compiler, but small examples are given in many places. Where possible, examples are given in Pascal; that is the code being parsed is Pascal and the code performing the parsing is also Pascal. This language is simple and elegant. Where C is more powerful, Pascal is more readable. Any programmer should be able to look at Pascal examples and understand what they are trying to accomplish. Note to Pascal programmers - if the syntax of the examples looks strange to you, that is probably because they are ANSI standard Pascal, NOT Turbo Pascal. ANSI Standard Pascal does not support units, objects or strings.
Object Oriented Programming is not covered in detail, however there is a section given on parsing objects. It seems that C++ has become the standard object oriented language, but because of its heritage it is a convoluted language that is difficult to understand. So instead of using C++ for the examples the language Language for Object Oriented Programming (LOOP) will be used. This is a language I am working on, that is intended to be a beginners' object oriented language. This winter (probably around February) I intend to post a paper on the web on this language. At that time a link will be provided here to that paper.
The section on code generation assumes a working knowledge of the concepts of assembler programming. Assembler code is not given, but an in depth discussion of how a compiler produces assembler code is.