Indentation to show block structure?

Charles Fiterman (cef@geodesic.com) wrote about using indentation to show block structure, and this started another chain of discussion:

Semicolons are not just a habit; they are a very bad habit.

In one study a large amount of C code was scanned for lines without semicolons. It excluded things like

	fun_call(a,
		 b);

to look only for expressions on more than one line. It seems that about a fourth of the lines found were errors: the missing semicolon was simply wrong about one fourth of the time. Clearly C would be better off with no semicolon and with a rule allowing continued lines. Perhaps a newline should end a statement except within an open ( ) or [ ].
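The proposed rule can be stated mechanically. The following C sketch is illustrative only, and the function name `count_statements` is hypothetical (it is not from the study): it counts statements in a source string, treating a newline as a terminator unless a ( or [ is still open. For brevity it ignores string literals and comments, which a real lexer would have to skip.

```c
/* Count statements under the proposed rule: a newline ends a statement
 * unless a '(' or '[' is still open. Simplifications of this sketch:
 * string/character literals and comments are not recognised, and blank
 * lines do not count as statements. */
int count_statements(const char *src)
{
    int depth = 0;     /* number of ( and [ still open */
    int pending = 0;   /* non-blank text seen since the last statement ended */
    int count = 0;

    for (const char *p = src; *p != '\0'; p++) {
        switch (*p) {
        case '(': case '[':
            depth++;
            pending = 1;
            break;
        case ')': case ']':
            if (depth > 0)
                depth--;
            pending = 1;
            break;
        case '\n':
            if (depth == 0 && pending) {   /* newline at top level ends it */
                count++;
                pending = 0;
            }
            break;
        case ' ': case '\t':
            break;
        default:
            pending = 1;
            break;
        }
    }
    return count + pending;   /* the final statement may lack a newline */
}
```

Under this rule the `fun_call(a,` / `b);` pair above is one statement, because the newline after `a,` falls inside an open parenthesis.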

Further, I find { } silly. Everyone in their right mind indents. Count indentation from the leftmost character of the previous line. Either eliminate tabs as valid within programs or give them some standard meaning, such as a tab taking you to the next 4-column boundary. I would prefer eliminating them from the language; this doesn't mean editors couldn't use tab as a command to put in spaces.

This means we could write

if x < y
	cout << "hello world\n"
	cin >> i

Instead of

if (x < y) {
	cout << "hello world\n";
	cin >> i;
}

The former is obviously a lot cleaner and eliminates a lot of style disputes. Further, it eliminates the common error where {} are forgotten because the user forgets that indentation is not syntax. Lines and indentation are the most obvious parts of code; the compiler should see them too.
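One way a compiler could "see" indentation is to keep a stack of open indentation levels, as Python's tokenizer does, turning each deeper level into an opening brace and each return to an outer level into closing braces. The C sketch below is a hypothetical illustration (`layout_to_braces` and its helpers are invented names): it rewrites text in the proposed style into explicit braces, one item per output line, and assumes "nearest 4's boundary" means a tab advances to the next multiple of 4. Rather than comparing only with the previous line, it compares each line with the stack of open levels, so a single dedent can close several blocks and a dedent to a column that was never opened is rejected.

```c
#include <string.h>

#define TAB_STOP 4     /* assumed meaning of a tab: advance to next multiple of 4 */
#define MAX_DEPTH 64   /* arbitrary nesting limit for this sketch */

/* Column reached by the leading blanks of s[0..n); *skip receives how
 * many blank characters there were. */
static int indent_width(const char *s, size_t n, size_t *skip)
{
    int col = 0;
    size_t i = 0;
    for (; i < n && (s[i] == ' ' || s[i] == '\t'); i++)
        col += (s[i] == '\t') ? TAB_STOP - col % TAB_STOP : 1;
    *skip = i;
    return col;
}

/* Append n bytes of s to out; -1 if out (capacity cap) would overflow. */
static int append(char *out, size_t cap, size_t *len, const char *s, size_t n)
{
    if (*len + n + 1 > cap)
        return -1;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

/* Rewrite indentation-structured text with explicit braces, one item per
 * output line. Returns 0 on success, or -1 on a dedent to a column that
 * was never opened, nesting deeper than MAX_DEPTH, or output overflow. */
int layout_to_braces(const char *src, char *out, size_t cap)
{
    int levels[MAX_DEPTH] = {0};   /* open indentation columns; levels[0] == 0 */
    int top = 0;
    size_t len = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    while (*src != '\0') {
        const char *eol = strchr(src, '\n');
        size_t n = eol ? (size_t)(eol - src) : strlen(src);
        size_t skip;
        int col = indent_width(src, n, &skip);

        if (skip < n) {                        /* blank lines are ignored */
            if (col > levels[top]) {           /* deeper: open a block */
                if (top + 1 == MAX_DEPTH || append(out, cap, &len, "{\n", 2))
                    return -1;
                levels[++top] = col;
            }
            while (col < levels[top]) {        /* shallower: close blocks */
                if (append(out, cap, &len, "}\n", 2))
                    return -1;
                top--;
            }
            if (col != levels[top])            /* dedent between two levels */
                return -1;
            if (append(out, cap, &len, src + skip, n - skip) ||
                append(out, cap, &len, "\n", 1))
                return -1;
        }
        src += eol ? n + 1 : n;
    }
    for (; top > 0; top--)                     /* close blocks open at EOF */
        if (append(out, cap, &len, "}\n", 2))
            return -1;
    return 0;
}
```

With a tab counted as 4 columns, a line indented by one tab and a line indented by four spaces land at the same level, which is exactly the ambiguity the tab rule is meant to settle.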

As well as starting more discussion about language design, this provoked a discussion about programming environments which looked at editors, Occam (a programming language that does indeed use indentation to show block structure), and the meaning of tabs, and another discussion about workplace practice which looked at automatically generated code and teamwork.

Greg Morrisett (jgmorris@cs.cmu.edu) responded to Charles Fiterman

The missing semicolon was simply wrong about one fourth of the time. Clearly C would be better off with no semicolon and with a rule allowing continued lines. Perhaps a newline should end a statement except within an open () or [].

I think the real problem is that we're always trying to boil things down to a single mechanism for determining delimiters (e.g., newline vs. semicolon). The right answer, in my mind, is to require both. Your study indicates that there really is no overhead for this (assuming people are already putting semicolons at the end of statements) and that more errors would be caught. At the very least, the compiler or lint should issue warnings for such lines. A separate mechanism should be used to indicate that a statement spans multiple lines.

to which Tim Channon (tchannon@black.demon.co.uk) responded

The right answer, in my mind, is to require both.

Is it? How about:

C-like syntax is ambiguous, and the presence or absence of a semicolon can change the meaning of a program. Humans are error-prone, so not surprisingly a lot of mistakes occur.

One alternative is Wirth-like syntax, where the ambiguity is gone: a missing semicolon where one is needed trips an error, while extra semicolons produce no error and no change in meaning. In that case the semicolons are there to help the human read the source; the redundancy is for our benefit.

So I suggest the subject is danglies: if you accept them, you also accept the penalty.
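The claim that a single semicolon can change a C program's meaning is easy to demonstrate. The two functions below are hypothetical illustrations that differ only by the semicolon after the `if` condition. The compiler accepts both, and the indentation of the second suggests a meaning the compiler does not see.

```c
/* Increments n only when x is positive. */
int bump_if_positive(int x)
{
    int n = 0;
    if (x > 0)
        n++;
    return n;
}

/* Same text plus one semicolon: the ';' after the condition is an empty
 * statement that forms the whole body of the if. */
int bump_always(int x)
{
    int n = 0;
    if (x > 0);     /* stray semicolon: the if controls nothing */
        n++;        /* runs unconditionally despite the indentation */
    return n;
}
```

In a Wirth-style language the same slip would at worst add an empty statement between two statements; here it silently detaches the body from its condition.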

Using different indentation styles

mark@omnifest.uwm.edu (Mark Hopkins) took a different tack

I simply don't agree with any of the original poster's assertions.
I think this means Charles Fiterman, who proposed dropping {...} and using indentation instead.

First of all, syntax errors are far too trivial to be of any relevance in any discussion. It might have been a concern back in the old days when a programmer had to (re)submit a program to a central site, but nowadays it's an entirely trivial matter to pipe an entire error listing back into [an editor to locate] and eliminate all the errors in one fell swoop, and it usually doesn't take much more than a minute.
I have often been reduced to correcting errors one-by-one, because the compiler got so lost after an error that I could not trust it. Also, the nastiest syntax errors are those that give rise to a legal but wrong program, and designing the language to make that less likely is well worthwhile.

Second, experienced programmers simply don't make that many syntax errors.
Everyone has to learn each new language sometime, so why not make the tedious part of that (learning the syntax) as simple as possible, so we can concentrate on the important part (learning the semantics).

Third, I personally don't have any problems with the semicolon syntax in C and in fact find it accords quite well with natural usage style in human languages. Arguing against punctuation for delimiters in a programming language (especially when the alternative is nested bracketing of some kind which is inimical to natural language and natural style (apparently except German (which probably explains why the Pascaloids and Moduloids seem to have Teutonic heritage and affinity))) is a lot like arguing against the use of periods at the end of sentences.

Fourth, as per the comment that nobody in their right mind uses {...} without indentation: I'm in my right mind, but I don't always indent. I use a VERY great multiplicity of stylistic forms, often based on semantic, layout and context considerations, including the following:

if (A) S

if (A) S, S, ..., S

if (A) { S; S; S; S; ...; S; }

if (A)
  S, S, ..., S,
  S, S, ..., S,
  ...
  S, S, ..., S

if (A) {
  S; S; ...; S;
  S; S; ...; S;
  ...
  S; S; ...; S;
}

if (A) while (B) ...

if (A)
  while (B) ...

if (A)
while (B) ...

for (...)
for (...)

for (...)
  for (...)

and forcing layout for the compiler's sake is going to destroy all of this.

You may argue that this obfuscates code. No, quite the opposite. Using expressivity to reflect intention clarifies matters, and, to prove the point, I can literally go back into code I haven't touched in years (and have done so) and immediately see what the intended meaning was *even without comments*. Of course, part of that also has to do with the naming conventions I use. This ability is the one determinant and main function of proper style.

This is an excellent point. I use alternative layouts all over the place to distinguish between important and trivial constructs, and combinations like:

	for/while (...) if (...) {

and
	... {induction variables=...

Of course, Mark does say that HE can go back into his code; it would be interesting to see how this sort of thing could be systematised for use by teams, or where maintenance may be done by a different programmer.

salomon@silver.cs.umanitoba.ca (Daniel J. Salomon) responded to Mark Hopkins

First of all, syntax errors are far too trivial to be of any relevance[...]

The discussion was more about syntax than about syntax errors. Poor syntax can lead to errors that persist until run time. For instance, in C a programmer who has become used to ending every line with a semicolon may absentmindedly write:

while(x < 10.0);

and not notice the error. There is no syntax error here, but this statement is almost certainly a run-time error.

Other classic examples of error prone syntax in C are:

accidentally using = instead of ==
forgetting a "break" statement at the end of a "case" arm
leaving off the parentheses in a call to a parameterless function returning void
forgetting a closing comment delimiter
...
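Each of these pitfalls compiles, which is what makes them dangerous. The hypothetical functions below (names invented for this sketch) show what each mistake actually does; modern compilers can warn about all four given suitable warning flags, but none of them is a syntax error.

```c
/* 1. '=' instead of '=='. The condition assigns 0 to x and tests the
 *    assigned value, so the branch is never taken. */
int assign_in_condition(int x)
{
    if (x = 0)          /* intended: x == 0 */
        return 1;
    return 0;
}

/* 2. A missing "break" lets control fall into the next case arm. */
int missing_break(int n)
{
    int score = 0;
    switch (n) {
    case 1:
        score += 1;     /* no break: falls through into case 2 */
    case 2:
        score += 10;
        break;
    default:
        score = -1;
        break;
    }
    return score;
}

/* 3. Naming a void function without () evaluates its address and
 *    discards it; nothing is called. */
static int calls = 0;
static void tick(void) { calls++; }

int call_without_parens(void)
{
    calls = 0;
    tick;               /* intended: tick(); */
    tick();
    return calls;       /* 1, not 2 */
}

/* 4. A comment whose closing delimiter is forgotten runs on until the
 *    next closing delimiter, swallowing the code in between. */
int unterminated_comment(void)
{
    int y = 1;
    /* closing delimiter forgotten on this line
    y = 2;              /* ...so this assignment is commented out */
    return y;           /* 1, not 2 */
}
```

For example, `missing_break(1)` returns 11 rather than 1, because the first arm falls through into the second.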

A more relevant question is, do you prefer the semicolonless syntax of the C preprocessor, or the semicolons of the C language? Which one causes more run time errors, and which is easier to read?
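The two syntaxes meet inside every multi-statement macro: a preprocessor directive ends at the newline (a trailing backslash continues it), while the code the macro expands to lives in the semicolon world. The common `do { ... } while (0)` idiom reconciles them. `SWAP_INT` and `sort_pair` are illustrative names for this sketch.

```c
/* A multi-statement macro wrapped in do { ... } while (0): it expands to a
 * single statement that still needs the caller's semicolon, so it nests
 * safely under if/else. Each trailing backslash continues the directive,
 * because a preprocessor line otherwise ends at the newline. */
#define SWAP_INT(a, b)        \
    do {                      \
        int swap_tmp_ = (a);  \
        (a) = (b);            \
        (b) = swap_tmp_;      \
    } while (0)

/* Put *p and *q in ascending order; return 1 if they were swapped. */
int sort_pair(int *p, int *q)
{
    if (*p > *q)
        SWAP_INT(*p, *q);   /* with a bare { ... } macro, this ';' would end
                               the if statement and the else would not parse */
    else
        return 0;
    return 1;
}
```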

It is true that experienced programmers make fewer errors than novices, but usually that is because they have made all the common errors at least once already.

graham.matthews@maths.anu.edu.au responded to Daniel Salomon

while(x < 10.0);

To my mind this is a *semantic* rather than syntactic flaw in C. Indefinite iteration in which you do nothing is semantically meaningless, and the compiler should flag it as an error. People will say "what about coding an infinite loop that does ...". Answer: use a different semantic construct (e.g., "forever do"). Indeed, almost all the quirky bugs you can get in C through "syntax mistakes" occur because C's syntax does not adequately reflect the differences in the underlying semantics of various constructions, choosing instead to use the same syntax for different semantic ideas. And so you get into trouble.

the moderator added

Keep in mind that if x is volatile, that's a reasonable thing to write. It's a subtle issue.
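Both replies can be illustrated in C. There is no "forever" construct, so the sketch below defines a hypothetical `FOREVER` macro over the conventional `for (;;)`, and it also shows the volatile case, where an empty loop body is a deliberate wait. Note that `volatile` only forces each read to happen; it is not a substitute for proper synchronization between threads.

```c
/* A deliberate infinite loop with its own spelling. FOREVER is a
 * hypothetical macro, not standard C; for (;;) is the usual idiom. */
#define FOREVER for (;;)

/* Smallest power of two >= n (assumes n <= 2^30 so p cannot overflow). */
int first_power_of_two_at_least(int n)
{
    int p = 1;
    FOREVER {
        if (p >= n)
            return p;
        p *= 2;
    }
}

/* Because x is volatile, every test re-reads it, so the empty body is a
 * legitimate wait for something outside the program's normal flow of
 * control (for example, hardware behind a memory-mapped address) to
 * change x. Here x is an ordinary volatile global standing in for that. */
volatile double x = 0.0;

void wait_until_ten(void)
{
    while (x < 10.0)
        ;   /* empty body on its own line, so it reads as deliberate */
}
```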

Syntax Trees

Stefan Monnier (monnier@di.epfl.ch) asked

Wouldn't it be so much easier to store your source as a syntax tree?

1. The compiler doesn't have to care about parsing
   (but somebody else, the editor for instance, does, so 1 is moot).
2. You can edit your file with whatever indentation style you want: just ask the editor (the editor would do the unparsing).
3. You can even choose between different syntaxes. If you prefer indentation over block delimiters ... go ahead. The mapping from text to syntax tree would not be part of the language itself, but a small editor customization!

I'm not saying writing an editor this way is the easiest thing in the world, but it is possible (there are examples close enough) and it looks like *the right way* to do things to me.
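The idea can be shown in miniature: one tree, two concrete syntaxes. The C sketch below uses a hypothetical two-kind node type (simple statements and `if`) and two unparsers, one printing C-like braces and semicolons, the other the indentation style proposed at the start of the thread. It deliberately ignores comments, which is exactly the hard part raised in the reply below.

```c
#include <stdio.h>
#include <string.h>

/* A tiny, hypothetical syntax tree: a node is either a simple statement
 * or an 'if' with a condition and a list of child statements. */
enum kind { STMT, IF };

struct node {
    enum kind kind;
    const char *text;          /* statement text, or the if-condition */
    const struct node *body;   /* children of an IF (NULL for STMT) */
    int nbody;
};

/* Append one indented line (pre + text + post) to buf, truncating
 * safely if cap is too small. */
static void emit(char *buf, size_t cap, int indent,
                 const char *pre, const char *text, const char *post)
{
    size_t len = strlen(buf);
    if (len < cap)
        snprintf(buf + len, cap - len, "%*s%s%s%s\n",
                 indent, "", pre, text, post);
}

/* C-like concrete syntax: parenthesised condition, braces, semicolons. */
static void unparse_braces(const struct node *n, int depth,
                           char *buf, size_t cap)
{
    if (n->kind == STMT) {
        emit(buf, cap, 4 * depth, "", n->text, ";");
        return;
    }
    emit(buf, cap, 4 * depth, "if (", n->text, ") {");
    for (int i = 0; i < n->nbody; i++)
        unparse_braces(&n->body[i], depth + 1, buf, cap);
    emit(buf, cap, 4 * depth, "", "}", "");
}

/* Indentation-based concrete syntax: no parentheses, braces, or
 * semicolons; nesting is shown only by indentation. */
static void unparse_layout(const struct node *n, int depth,
                           char *buf, size_t cap)
{
    if (n->kind == STMT) {
        emit(buf, cap, 4 * depth, "", n->text, "");
        return;
    }
    emit(buf, cap, 4 * depth, "if ", n->text, "");
    for (int i = 0; i < n->nbody; i++)
        unparse_layout(&n->body[i], depth + 1, buf, cap);
}

/* Render the same tree in either style (layout_style != 0 for indentation). */
void unparse(const struct node *n, int layout_style, char *buf, size_t cap)
{
    if (cap == 0)
        return;
    buf[0] = '\0';
    if (layout_style)
        unparse_layout(n, 0, buf, cap);
    else
        unparse_braces(n, 0, buf, cap);
}
```

The style choice is a parameter of the printer, not of the tree, which is the point being made: the concrete syntax becomes an editor preference.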

rabin@CS.YALE.EDU (Dan Rabin) replied

I am 100% in agreement with Stefan Monnier, and I would like to see this point of view on the proper relation of concrete-syntax design to programming-language design gain greater acceptance. Concrete syntax is essentially part of a programming language's user interface, not part of its functionality. Like all user-interface issues, this one deserves careful design and testing, but this is largely an independent issue from designing the semantics of the language and the capabilities of the compiler.

The usual gripes along the lines of `Lisp sucks--it has too many parentheses' seem to me to be roughly analogous to saying `Internal combustion engines suck--my Buick's steering wheel gets too hot in the summer'.

and bevan@cs.man.ac.uk (Stephen J Bevan) also responded to Stefan Monnier

Wouldn't it be so much easier to store your source as a syntax tree?

In theory I agree, but practice seems to be lagging a long way behind (examples to the contrary are welcome). One of the thorniest problems is what to do with comments, i.e., which syntactic construct do you attach a comment to? If you have the luxury of working with a new language, these issues can be taken into account in the language design, but with legacy code and languages things are tougher without resorting to style guides and their enforcement. Once you do that, you'll find you've solved most of the problem without the need for syntax-tree storage.

and the moderator added

Editing a syntax tree is a pain, too, since editing operations you do on source text rarely map cleanly to syntax changes.

Programming languages that use indentation to show block structure

Occam

see another part of the discussion

Icon

see another part of the discussion

lutz@KaPRE.COM (Mark Lutz) mentioned Python

If you're interested in seeing an example of a language where blocks are based on indentation, and lines don't need explicit terminators, have a look at Python, a newer object-oriented scripting/extension language. For more information, see any of:
comp.lang.python
ftp
http

sys3bga@doc.ntu.ac.uk (Gaster) mentioned Haskell

The idea of using indentation to describe the layout of programs has been used well in functional languages for a number of years. Furthermore, the modern functional language Haskell allows the programmer to describe layout with explicit { } and semicolons, or to use what is known as the offside rule (originally proposed by Landin), and to mix the two.

The offside rule allows the braces and semicolons to be implicit instead of explicit, through the use of indentation. For more details (including a full specification of the offside rule used in Haskell), see the Haskell 1.2 report:


@ARTICLE{hudak,
AUTHOR	= "P. Hudak and S. Peyton Jones and P. Wadler (editors)",
TITLE	= "Report on the Programming Language Haskell. A Non-strict Purely Functional Language (Version 1.2)",
JOURNAL	= "ACM SIGPLAN Notices",
VOLUME	= 27,
NUMBER	= 5,
MONTH	= may,
YEAR	= 1992}
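A simplified version of the offside rule can be sketched in C. This is not the report's full algorithm, and `offside_to_braces` is a hypothetical name. Assumptions of the sketch: a layout keyword (`where`, `let`, `do`, `of`) only triggers layout when it ends a line; the top level is treated as a block at column 0; indentation is in spaces; and the report's rule that closes an implicit block on a parse error is omitted. The column of the first line after a keyword opens an implicit `{`; a later line at that column gets an implicit `;`; a line to its left closes the block with `}`; a line to its right continues the current item.

```c
#include <string.h>

#define MAX_CTX 32   /* arbitrary nesting limit for this sketch */

/* Append n bytes of s to out; -1 if out (capacity cap) would overflow. */
static int put(char *out, size_t cap, size_t *len, const char *s, size_t n)
{
    if (*len + n + 1 > cap)
        return -1;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

/* Does s[0..n) (no trailing spaces) end with a whole layout keyword? */
static int ends_with_layout_keyword(const char *s, size_t n)
{
    static const char *const kw[] = { "where", "let", "do", "of" };
    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++) {
        size_t k = strlen(kw[i]);
        if (n >= k && memcmp(s + n - k, kw[i], k) == 0 &&
            (n == k || s[n - k - 1] == ' '))
            return 1;
    }
    return 0;
}

/* Join the lines of src into one line with explicit { ; } inserted.
 * Returns 0 on success, -1 on overflow or an unsupported layout. */
int offside_to_braces(const char *src, char *out, size_t cap)
{
    int ctx[MAX_CTX] = {0};   /* open block columns; ctx[0] == 0 is the top level */
    int top = 0;
    int pending = 0;          /* previous line ended with a layout keyword */
    int first = 1;
    size_t len = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    while (*src != '\0') {
        const char *eol = strchr(src, '\n');
        size_t n = eol ? (size_t)(eol - src) : strlen(src);
        size_t c = 0, end = n;
        while (c < n && src[c] == ' ')
            c++;                                  /* c = column of first token */
        while (end > c && src[end - 1] == ' ')
            end--;                                /* drop trailing spaces */

        if (c < end) {                            /* skip blank lines */
            int col = (int)c;
            if (pending) {                        /* open an implicit block */
                if (top + 1 == MAX_CTX || col <= ctx[top])
                    return -1;                    /* the report inserts {} here */
                ctx[++top] = col;
                pending = 0;
                if (put(out, cap, &len, " {", 2))
                    return -1;
            } else {
                while (col < ctx[top]) {          /* offside: close blocks */
                    if (put(out, cap, &len, " }", 2))
                        return -1;
                    top--;
                }
                if (col == ctx[top] && !first && put(out, cap, &len, " ;", 2))
                    return -1;                    /* new item in the same block */
                /* col > ctx[top]: continuation of the current item */
            }
            if ((!first && put(out, cap, &len, " ", 1)) ||
                put(out, cap, &len, src + c, end - c))
                return -1;
            pending = ends_with_layout_keyword(src + c, end - c);
            first = 0;
        }
        src += eol ? n + 1 : n;
    }
    if (pending && put(out, cap, &len, " {}", 3))
        return -1;                                /* keyword at end of input */
    for (; top > 0; top--)                        /* close blocks open at EOF */
        if (put(out, cap, &len, " }", 2))
            return -1;
    return 0;
}
```

For example, a definition `f x = y + z` followed by an indented `where` with the bindings `y = 1` and `z = x` (the latter continued by a further-indented `+ 2`), then `g = 3` at the left margin, comes out as `f x = y + z where { y = 1 ; z = x + 2 } ; g = 3`.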