From: greg@cee.hw.ac.uk
Newsgroups: comp.compilers
Subject: Re: Concrete to Abstract syntax
Date: 24 Jun 1996 11:00:19 0400
Organization: Dept of Computing and Electrical Engineering, HeriotWatt University
The following is from I course I used to give on language definition.
Greg Michaelson

Abstract syntax
When we come to define semantics we want to associate meanings
with syntactic constructs. However, concrete syntax contains
information which is not directly relevant to semantics.
Consider the grammar for expressions once more:
::=  +   ::=  *  / ::=   ::=  ( )
This grammar is chosen to reflect the operator precedence:
() before
 before
* / before
+ 
Thus, the structure of the rules imposes a structure on symbol
sequences: "+" and "" in s qualify the preceding
s and following s; "*" and "/" in s
qualify the preceding s and following s; "" in
s qualify the following s; "(" and ")" in s
qualify the enclosed s.
Consider parsing:
(3+4)*56
using the above grammar for s:

+++
  

 
+++
   
*
  
  
+++
    
( ) 6
 
+++ 5
  
+
 
 
 
 
3

4
This structure reflects the order in which processing is to
proceed: "3+4" before "(3+4)*5" before "(3+4)*56". However, from
the point of view of that processing, much of the structure is
irrelevant. Thus, to calculate "3+4" we don't need to know that
"3" is a is a is a : knowing that "3" is
a rather than a subexpression is sufficient. Similarly,
to calculate "(3+4)" the brackets are irrelevant. Similarly, to
calculate "(3+4)*5" we don't need to know that "(3+4)" is a
is a is a : knowing that it is a subexpression
rather than a is sufficient.
In general, provided that we know the order in which processing is
to take place, all we need to distinguish are s, sub
expressions and operators. Thus, we can simplify the above
structure to:

+++
  

 
+++
   
* 6
 
+++
   
+ 5
 
 
3 4
This corresponds to the abstract syntax:
::=    ::= +    *  /
The most significant change is that we have dropped chains of
single rules. Thus:
> > > > > 6
has become:
> > 6
This abstract syntax has lost precedence structure and brackets.
However, its purpose is not to recognise symbol sequences but to
identify the structure of symbol sequences relevant to their
meaning.
From a compiling point of view, concrete syntax tells us how to
recognise the overall structure of symbol sequences; abstract
syntax tells us what aspects of that structure need to be
retained, say in a parse tree, for semantic processing.
We can find concrete syntax from abstract syntax by folding rules
together. Consider expressions yet again:
::=  +   ::=  *  / ::=   ::=  ( )
First of all, replace the single in with the
options from and then replace the qualified with
:
::=  ( )  
Next, replace the single in with the options from
and replace all qualified s with :
::=  ( )    * 
/
Next, replace the single in with the options
from and replace all qualified s with :
::=  ( )   
*  / 
+  
Next, generalise the options with an operator between
s:
::=  ( )   
::= *  /  +  
Next, drop the brackets from the bracketed as to
process it we just process the :
::=    
::= *  /  +  
Finally, drop the redundant single :
::=    ::= *  /  +  
In general, for each rule Ri:
i) for an option which is a single rule Rj, replace it with the
options for that rule
ii) replace any other references to Rj with Ri
These two rules remove the chains of single rules from parse
trees.
iii) drop any symbols which are not significant for distinguishing
meaning
This rule is partly a matter of style. In principle we can drop
all symbols except one to distinguish the construct. In practise,
we may leave some "redundant" symbols in place to ease
understanding of the final abstract syntax.
iv) drop any options which are just Ri
This rule removes redundant circularities like:
::=
For example, for statements:
::=  ; ::=  