In one study a large amount of C code was scanned for lines without semicolons. It excluded things like
fun_call(a, b);
to look only for expressions on more than one line. It seems that about a fourth of the lines found were errors. The missing semicolon was simply wrong about one fourth of the time. Clearly C would be better off with no semicolon and with a rule allowing continued lines. Perhaps a new line ends a statement except within an open ( ) or [ ].
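The proposed rule (a newline ends a statement unless a parenthesis or bracket is still open) can be sketched in a few lines of C. This is only an illustration under stated assumptions: `count_statements` is a hypothetical helper name, and the sketch does not recognise string literals or comments, so brackets inside them would be miscounted.

```c
#include <stddef.h>

/* Hypothetical helper: count statements in `src` under the proposed rule
 * that a newline ends a statement unless a ( or [ is still open.
 * Simplification: brackets inside string literals or comments are not
 * recognised, so this is a sketch rather than a real lexer. */
size_t count_statements(const char *src)
{
    size_t count = 0;
    int depth = 0;      /* number of currently open ( or [ */
    int has_text = 0;   /* non-blank characters seen in the current statement */

    for (const char *p = src; *p != '\0'; ++p) {
        char c = *p;
        if (c == '(' || c == '[') {
            ++depth;
            has_text = 1;
        } else if (c == ')' || c == ']') {
            if (depth > 0)
                --depth;
            has_text = 1;
        } else if (c == '\n') {
            if (depth == 0 && has_text) {   /* the statement ends here */
                ++count;
                has_text = 0;
            }
            /* otherwise: a blank line, or a continuation inside brackets */
        } else if (c != ' ' && c != '\t' && c != '\r') {
            has_text = 1;
        }
    }
    if (has_text)       /* final statement without a trailing newline */
        ++count;
    return count;
}
```

Under this sketch a call split across two lines inside its parentheses counts as a single statement, while two assignments on separate lines count as two.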
Further I find { } silly. Everyone in their right mind indents. Count indentation from the leftmost character of the previous line. Either eliminate tabs as valid within programs or give them some standard meaning, such as a tab taking you to the next multiple-of-4 column. I would prefer eliminating them from the language; this doesn't mean editors couldn't use tab as a command to put in spaces.
This means we could write
if x < y
    cout << "hello world\n"
    cin >> i
Instead of
if (x < y) {
    cout << "hello world\n";
    cin >> i;
}
The former is obviously a lot cleaner and eliminates a lot of style disputes. Further it eliminates the common error where {} are forgotten and the user forgets that indentation is not syntax. Lines and indentation are the most obvious parts of code; the compiler should see them too.
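To make the idea of the compiler "seeing" indentation concrete, here is a minimal sketch of a translator from the indented form to explicit braces and semicolons. It is a deliberately simplified rule, not a full offside-rule implementation: deeper indentation opens a block, shallower indentation closes one, and every other line break becomes a semicolon. The names `indent_to_braces`, `append` and `MAX_DEPTH` are hypothetical, only spaces are counted as indentation (in line with the preference for dropping tabs), and inconsistent dedents are not diagnosed.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 32    /* assumed nesting limit for this sketch */

/* Append n bytes of s to out, keeping room for the terminating NUL.
 * Returns 0 on success, -1 if the buffer is too small. */
static int append(char *out, size_t cap, size_t *len, const char *s, size_t n)
{
    if (*len + n + 1 > cap)
        return -1;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

/* Rewrite indentation-structured text in src into out, using explicit
 * braces and semicolons. Returns 0 on success, -1 on overflow or when
 * nesting exceeds MAX_DEPTH. */
int indent_to_braces(const char *src, char *out, size_t cap)
{
    size_t stack[MAX_DEPTH];    /* indentation widths of the open blocks */
    size_t depth = 0;
    size_t len = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    while (*src != '\0') {
        size_t indent = 0;
        while (src[indent] == ' ')          /* spaces only, no tabs */
            ++indent;
        const char *text = src + indent;
        size_t n = strcspn(text, "\n");     /* length of the line's content */
        src = text + n;
        if (*src == '\n')
            ++src;
        if (n == 0)
            continue;                       /* skip blank lines */

        if (depth == 0) {
            stack[depth++] = indent;        /* first line sets the base level */
        } else if (indent > stack[depth - 1]) {
            /* deeper than the previous line: open a block */
            if (depth == MAX_DEPTH || append(out, cap, &len, " {\n", 3) != 0)
                return -1;
            stack[depth++] = indent;
        } else {
            /* same level or shallower: end the previous statement ... */
            if (append(out, cap, &len, ";\n", 2) != 0)
                return -1;
            /* ... and close every block that is more deeply indented */
            while (depth > 1 && indent < stack[depth - 1]) {
                --depth;
                if (append(out, cap, &len, "}\n", 2) != 0)
                    return -1;
            }
        }
        if (append(out, cap, &len, text, n) != 0)
            return -1;
    }

    if (depth > 0) {                        /* finish the last statement */
        if (append(out, cap, &len, ";\n", 2) != 0)
            return -1;
        while (depth > 1) {                 /* close blocks still open */
            --depth;
            if (append(out, cap, &len, "}\n", 2) != 0)
                return -1;
        }
    }
    return 0;
}
```

The output drops the original indentation and keeps only the structure, which is enough to show that the braces and semicolons carry no information that the layout did not already contain.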
As well as starting more discussion about language design, this provoked a discussion about programming environments, which looked at editors, Occam (a programming language that does indeed use indentation to show block structure), and the meaning of tabs, and another discussion about workplace practice, which looked at automatically generated code and teamwork.
The missing semicolon was simply wrong about one fourth of the time. Clearly C would be better off with no semicolon and with a rule allowing continued lines. Perhaps new line ends a statement except within an open (), or [].
I think the real problem is that we're always trying to boil things down to a single mechanism for determining delimiters (e.g. newline vs. semicolon). The right answer, in my mind, is to require both. Your study indicates that there really is no overhead for this (assuming people are already putting semicolons at the end of statements) and more errors would be caught. At the very least, the compiler/lint should issue warnings for such lines. A separate mechanism should be used to indicate that a statement spans multiple lines.
The right answer, in my mind, is to require both.
Is it? How about:
C-like is ambiguous and the presence or absence of a semicolon can change the meaning of a program. Humans are error prone so not surprisingly a lot of mistakes occur.
One alternative is Wirth-like, where the ambiguity has gone: a missing semicolon where one is needed trips an error, while too many semicolons produce no error and no change in meaning. In that case the semicolons are there to help the human read the source; the redundancy is there to help us.
So I suggest the subject is danglies. You accept them, you also accept the penalty.
First of all, syntax errors are far too trivial to be of any relevance in
any discussion. It might have been a concern back in the old days when a
programmer had to (re)submit a program to a central site, but nowadays it's
an entirely trivial matter to pipe an entire error listing back into [an
editor to locate] and eliminate all the errors in one fell swoop and usually
doesn't take much more than a minute.
I have often been reduced to correcting errors one-by-one, because
the compiler got so lost after an error that I could not trust it. Also, the
nastiest syntax errors are those that give rise to a legal but wrong
program, and designing the language to make that less likely is well worth
while.
Second, experienced programmers simply don't make that many syntax errors.
Everyone has to learn each new language sometime, so why not make the
tedious part of that (learning the syntax) as simple as possible, so we can
concentrate on the important part (learning the semantics).
Third, I personally don't have any problems with the semicolon syntax in C and in fact find it accords quite well with natural usage style in human languages. Arguing against punctuation for delimiters in a programming language (especially when the alternative is nested bracketing of some kind which is inimical to natural language and natural style (apparently except German (which probably explains why the Pascaloids and Moduloids seem to have Teutonic heritage and affinity))) is a lot like arguing against the use of periods at the end of sentences.
Fourth, as per the comment that nobody in their right mind uses {...} without indentation: I'm in my right mind, but I don't always indent. I use a VERY great multiplicity of stylistic forms, often based on semantic, layout and context considerations, including the following:
if (A) S if (A) S, S, ..., S if (A) { S; S; S; S; ...; S; } if (A) S, S, ..., S, S, S, ..., S, ... S, S, ..., S if (A) { S; S; ..., S; S; S; ..., S; ... S; S; ..., S; } if (A) while (B) ... if (A) while (B) ... if (A) while (B) ... for (...) for (...) for (...) for (...)
and forcing layout for the compiler's sake is going to destroy all of this.
You may argue that this obfuscates code. No, quite the opposite. The use of expressivity to reflect intention not only clarifies matters but, to prove the point, I can literally go back into code I haven't touched in years (and have) and immediately see what the intended meaning was *even without comments*. Of course, part of that also has to do with the naming conventions I use. This ability is the one determinant and main function of proper style.
This is an excellent point. I use alternative layouts all over the
place to distinguish between important and trivial constructs, and
combinations like:
for/while (...) if (...) {
and
... {induction variables=...
Of course, Mark does say that HE can go back into his code; it would be
interesting to see how this sort of thing could be systematised for use by
teams, or where maintenance may be done by a different programmer.
First of all, syntax errors are far too trivial to be of any relevance[...]
The discussion was more about syntax than about syntax errors. Poor syntax can lead to errors that persist until run time. For instance, in the C language, a programmer who has become used to ending every line with a semicolon may absentmindedly write:
while(x < 10.0);
and not notice the error. There is no syntax error here, but this statement is almost certainly a run-time error.
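A small illustration of how a stray semicolon yields a legal but wrong program. To keep the example terminating, the sketch uses a `for` loop as a stand-in for the `while` above, since the `while` version would simply spin forever when the condition holds. The function names are hypothetical.

```c
/* Intended behaviour: add 1 to total once per iteration, n times.
 * The stray semicolon after the loop header gives the loop an empty
 * body, so the increment below runs exactly once, whatever the
 * indentation suggests. */
int count_with_stray_semicolon(int n)
{
    int total = 0;
    int i;
    for (i = 0; i < n; i++);    /* <- stray semicolon: empty loop body */
        total++;                /* not part of the loop */
    return total;
}

/* The same function without the stray semicolon. */
int count_as_intended(int n)
{
    int total = 0;
    int i;
    for (i = 0; i < n; i++)
        total++;
    return total;
}
```

Both functions are valid C, and the two differ by a single character. Many current compilers can warn about the first form, but the language itself accepts it.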
Other classic examples of error prone syntax in C are:
It is true that experienced programmers make fewer errors than novices, but usually that is because they have made all the common errors at least once already.
while(x < 10.0);
To my mind this is a *semantic* rather than syntactic flaw in C. Indefinite iteration in which you do nothing is semantically meaningless and the compiler should flag it as an error. People will say "what about coding an infinite loop that does ...". Answer: use a different semantic construct (e.g. "forever do"). Indeed almost all the quirky bugs you can get in C through "syntax mistakes" occur because C's syntax does not adequately reflect the differences in the underlying semantics of various constructions, choosing instead to use the same syntax for different semantic ideas. And so you get into trouble.
Wouldn't it be so much easier to store your source as a syntax-tree?
The usual gripes along the lines of `Lisp sucks--it has too many parentheses' seem to me to be roughly analogous to saying `Internal combustion engines suck--my Buick's steering wheel gets too hot in the summer'.
Wouldn't it be so much easier to store your source as a syntax-tree?
In theory I agree, but practice seems to be lagging a long way behind (examples to the contrary are welcome). One of the thorniest problems is what to do with comments, i.e. what syntactical construct do you attach a comment to? If you have the luxury of working with a new language these issues can be taken into account in the language design, but with legacy code/languages things are tougher without resorting to style guides and their enforcement. Once you do that you'll find you've solved most of the problem without the need for syntax tree storage.
The offside rule allows the braces and semi-colons to be implicit instead of explicit, through the use of indentation. For more details (including a full specification of the offside rule used in Haskell) see the Haskell 1.2 report:
@ARTICLE{hudak,
  AUTHOR  = "P. Hudak and S. Peyton Jones and P. Wadler (editors)",
  TITLE   = "Report on the Programming Language Haskell. A Non-strict Purely Functional Language (Version 1.2)",
  JOURNAL = "ACM SIGPLAN Notices",
  VOLUME  = 27,
  NUMBER  = 5,
  YEAR    = "May 1992"
}