History: the very first version was created by John Backus, and shortly
after improved by Peter Naur, and it was this improved version that was
publicly used for the first time, to define Algol 60.
[Naur P (ed.), 1963, Revised report on the algorithmic
language Algol 60, Comm. ACM 6:1 pp1-17]
Compilers: Principles, Techniques, and Tools, by Aho, Sethi and Ullman says: "The proposal that BNF, which began as an abbreviation of Backus Normal Form, be read as Backus-Naur Form, to recognise Naur's contributions ... is contained in a letter by Knuth [Knuth D.E., 1964, Backus Normal Form vs. Backus Naur Form, Comm. ACM 7:12 pp735-736]" Knuth's letter (which you may be able to access via ACM) is interesting to read, as it indicates exactly what he thought the importance concepts in BNF were. He also points out that BNF is not a "Normal Form", which would imply there were some restrictions on how a grammar could be written down, as in e.g. Chomsky Normal Form or Greibach Normal Form. There is a message archived from the comp.compilers newsgroup that gives a different view of Naur's contribution. |
This first published version looked like:
<number> ::= <digit> | <number> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
which can be read as something like:
"a number is a digit, or any number followed by an extra digit"
(which is a contorted but precise way of saying that a number
consists of one or more digits)
"a digit is any one of the characters 0, 1, 2, ... 9"
There are many variants, all equivalent to the original definition, such as:
<number> = <digit> | <number> <digit> . <digit> = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 .
number = digit . number = number digit . digit = '0' . digit = '1' . (etc.) digit = '9' .
number = digit | number digit . digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' .
number : digit | number digit ; digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
number = digit | number digit digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
<number> = <digit> | <number> <digit> . <digit> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
Every variant has to distinguish between the names of grammar rules, such as
number
,
and actual characters that can appear in the text of a program, such as
0123456789
.
Usually, one or the other is quoted, e.g.
<number>
or
'0123456789'
(or "0123456789"
)
so the other can be left unquoted.
Sometimes both are quoted, as in the last example above.
However, if we decide to quote the names of rules and leave the actual
characters unquoted (as in the first example above) a
problem can arise if the metacharacters (i.e. the characters such as
' < > . |
used to punctuate the rules) can appear as
actual characters in the programming language. Therefore, the
most widely used variants of BNF usually quote the actual characters e.g.:
logical_expression = expression '|' expression | expression '&' expression . real_number = number '.' number .
e.g. ANSI C syntax from K&R using a BNF (the "%token" line lists things that are assumed to be simple enough to leave out, such as names and numbers. The form of BNF used is essentially that accepted by yacc/bison.)
History:
I first came across this style in the definition of Pascal
(e.g. identifiers from Amazon, or if-statements from CS370 at Augustana, Canada). Something similar was used for FORTRAN 77 (e.g. assignment from HP FORTRAN 77/iX Reference) and COBOL (e.g. How to Read the Syntax Diagrams from IBM VS COBOL II Reference Summary - you may need to fiddle with your font sizes and/or scaling to be able to read this properly!). Something more like EBNF (see below), but nearly as space-filling as syntax diagrams, was also used for COBOL. In this notation, alternatives were stacked vertically between brackets. (e.g. on slides 4 and 8 of (pdf) from Programming Language Concepts at Iowa U or e.g.
|
e.g. (this is a crude attempt to give you an idea of what the Pascal version looks like - the real thing looks a lot better!)
IDENTIFIER: +------+ ---|LETTER|--- / +------+ \ | | +------+ v / ----->|LETTER|----------------------> +------+ ^ \ | | \ +-----+ / ---|DIGIT|---- +-----+which can be read as something like:
The names of diagrams/rules, such as letter and digit, appear in rectangles, and actual characters appear in circles or in boxes with rounded ends.
You can see many (much better-drawn) examples of this particular style at The BNF Web Club e.g. (gif) and in the documentation for Ebnf2ps: Peter's Syntax Diagram Drawing Tool.
A LaTeX package to help produce syntax diagrams is described here.
You may also come across a slightly different, more angular version of this style e.g. in Computer Science: Abstraction to Implementation by Robert M. Keller
Like syntax diagrams, EBNF does not allow us to write anything that can't be written in BNF, it just makes the grammar easier to understand.
History: many extensions to BNF were used to define computer languages after Algol 60, all slightly different. Niklaus Wirth proposed a single formulation in: "What Can We Do About the Unnecessary Diversity of Notation for Syntactic Definitions, November 1977, Comm. ACM 20:11 pp. 822-823" However, this did not actually mention "EBNF". You may be able to access Wirth's article via ACM. Alternatively, you can see most of the details of his proposal here - look for section 5.9.1 |
I don't know of a universally accepted set of extensions to BNF, but here are some attempts:
Almost all variants of EBNF use brackets ( )
for the usual mathematical meaning of grouping items together.
(See
a detailed comparison of several different EBNFs.)
There are (at least) three main styles of EBNF:
number = digit+ . identifier = letter (letter | digit)* . functioncall = functionname "(" parameterlist? ")" .(I find this style the most natural, as grammar users will normally have to also use regular expressions to describe the smallest components of the grammar.)
number = digit {digit} . identifier = letter {letter | digit} . functioncall = functionname "(" [parameterlist] ")" .(It is unfortunate that the limited number of different kinds of brackets seems to limit the number of concepts that can be added to this style - e.g. we can't easily express "1 or more times". The whole point of EBNF is to make writing grammars simpler!)
number = 1*digit . identifier = letter *(letter | digit) . functioncall = functionname "(" [parameterlist] ")" .(This seems to me to be, technically, the most complete and therefore useful style, although perhaps the most ugly!)
x*
means the same as (x+)?
,
and x+
the same as xx*
,
and x?
the same as (x|)
,
so it is possible, but not as simple, to use a subset of these facilities.
ANSI C syntax from K&R using an EBNF based on regular-expressions
One further point - if we really want to minimise the amount of work
we do and make the grammar as clear as possible, then we need to deal
with the problem of list separators:
We can recognise lists where every item is identical,
like " We really need a notation where we can say:
a list of one or more names separated by ','
|
This page is maintained by and copyright © Pete Jinks [last modified 12/Mar/2004] suggestions, corrections etc. welcome You are welcome to make educational, not-for-profit use (else what would be the point!) but please give due credit.