A major source of language complexity is identifiers; processing declarations, matching identifier usage to their declarations, and checking that the usage is legal. We will be discussing this at some length in the second half of the course unit, but for the time being note that once we have a set of routines to maintain and query a dictionary, we can simply embed calls to them in the parser actions.
A further source of complexity in some languages is that it is not possible to decide the type of an identifier before its first use, so we must scan the program more than once, initially to pick up type information, and finally to process the uses of identifiers. Some older languages, such as Algol 60, were designed this way by mistake, before the problem was properly understood. Some more recent languages, such as SML, were deliberately designed this way to simplify programming.
A source of processing complexity only loosely related to language complexity is found in interpreters (e.g. shells). If the language to be interpreted contains control structures (particularly loops), the interpreter has to repeatedly scan portions of input programs. (Compilers do not have this problem.)
A major source of processing complexity, not resulting from language complexity, is found in optimising compilers; good quality code results from the compiler repeatedly scanning and transforming the program.
One-pass language processors, where all further processing is performed via the parser actions, are usually simpler than multi-pass language processors, and so preferred by implementors. It is common for implementors to classify computer languages by the number of passes necessary to completely analyse them: e.g. Pascal is a 1-pass language, Algol-60 is 2-pass.
component_declarator : simple_component | bit_field ;
statement : while '(' exp ')' statement {$$= make_while ($3, $5);} | do statement while '(' exp ')' ';' {$$= make_do ($2, $5);} | '{' statements '}' {$$= make_block ($2);} | . . . ; statements : statement {$$= list ($1); /* [$1] */} | statements statement {$$= append ($1, $2); /* $1 @ [$2] */} ;
As long as we can ensure the correct order for the bulk of the reductions, so we do not overflow the stack, and for any actions other than tree-building, we can change part of the grammar to use right recursion so that the lists are built in the correct order. This will usually be safe, simply because the input is always going to be read from left to right and we will naturally split the detailed grammar of the list items (statement) from the list itself (statements). e.g.
statements : statement {$$= list ($1); /* [$1] */} | statement statements {$$= cons ($1, $2); /* $1 :: $2 */} ;If this is not possible, or we are worried that even with these precautions the parser stack might still overflow, we just have to use left recursion and
append
.
%{#include "y.tab.h" /* very silly implementation, just sufficient for this example */ static char findname (char *yytext) {return yytext [0];} %} %% [0-9]+ {yylval.a_number = atoi(yytext); return number;} [A-Za-z][A-Za-z0-9]* {yylval.a_variable = findname (yytext); return variable;} [ \t\n] ; [-+*/( );] return yytext[0]; . {ECHO; yyerror ("unexpected character");} %% int yywrap (void) {return 1;}
%{ #include <stdio.h> enum treetype {operator_node, number_node, variable_node}; typedef struct tree { enum treetype nodetype; union { struct {struct tree *left, *right; char operator;} an_operator; int a_number; char a_variable; } body; } tree; static tree *make_operator (tree *l, char o, tree *r) { tree *result= (tree*) malloc (sizeof(tree)); result->nodetype= operator_node; result->body.an_operator.left= l; result->body.an_operator.operator= o; result->body.an_operator.right= r; return result; } static tree *make_number (int n) { tree *result= (tree*) malloc (sizeof(tree)); result->nodetype= number_node; result->body.a_number= n; return result; } static tree *make_variable (char v) { tree *result= (tree*) malloc (sizeof(tree)); result->nodetype= variable_node; result->body.a_variable= v; return result; } static void printtree (tree *t, int level) { #define step 4 if (t) switch (t->nodetype) { case operator_node: printtree (t->body.an_operator.right, level+step); printf ("%*c%c\n", level, ' ', t->body.an_operator.operator); printtree (t->body.an_operator.left, level+step); break; case number_node: printf ("%*c%d\n", level, ' ', t->body.a_number); break; case variable_node: printf ("%*c%c\n", level, ' ', t->body.a_variable); } } %}
%union { int a_number; char a_variable; tree* a_tree; } %start input %token <a_number> number %token <a_variable> variable %type <a_tree> exp term factor %% input : exp ';' {printtree ($1, 1);} ; exp : '+' term {$$ = $2;} | '-' term {$$ = make_operator (NULL, '~', $2);} | term {$$ = $1;} | exp '+' term {$$ = make_operator ($1, '+', $3);} | exp '-' term {$$ = make_operator ($1, '-', $3);} ; term : factor {$$ = $1;} | term '*' factor {$$ = make_operator ($1, '*', $3);} | term '/' factor {$$ = make_operator ($1, '/', $3);} ; factor : number {$$ = make_number ($1);} | variable {$$ = make_variable ($1);} | '(' exp ')' {$$ = $2;} ; %% void yyerror (char *s) {fprintf (stderr, "%s\n", s);} int main (void) {return yyparse();}Example input expression, and output equivalent fully-bracketed expression and parse tree:
|