Definition of a scripting language

Here is an interesting and relevant discussion about programming language design, copied from the newsgroup "comp.compilers", starting in March 1995 and still going in July!

The discussion is interesting for various reasons:

It looks at aspects of language design that you should recognise from having used C and a shell.
It shows how there can be more than one right answer to language design, depending on what you are trying to achieve. The quest for a general-purpose super-language that can be used in all circumstances has not succeeded yet.
Although change for its own sake is usually a bad idea, sticking to the familiar without at least understanding and considering the alternatives is not the mark of a good Computer Scientist.
It indicates how some "obvious" and "important" differences between languages seem insignificant compared to the whole range of language paradigms and the different ways in which they have been made concrete. The difference between a programming language (C) and a scripting language (SH) is important and interesting, but is tiny compared to the difference between imperative (C) and functional (SML) programming.
Note how many of the responses need clarification on closer examination, as other participants bring different experiences to the discussion.

My comments and explanations are in italics; I have edited the messages to remove some simple spelling and grammar mistakes; [...] indicates text removed or replaced (by me or others) to focus on language design. Indented text is a quote from a previous message.

Ariel Lieberman (ariel@ccs.neu.edu) asked

I'm looking for a definition of a scripting language (Perl, REXX, Tcl, etc.). What are the differences between a scripting language and a "regular" programming language (C, C++, Ada, Smalltalk, etc.)?

And the moderator replied

A scripting language is one in which people think they're writing scripts, and a programming language is one in which they think they're writing programs. You can come up with rules of thumb, e.g. scripting languages tend to be interpreted, but it's easy to find exceptions to any of them.

This is essentially true, and I expected the discussion to die out. However, as the next post showed, it is interesting and useful to try to come up with some rules, even if they can only be guidelines rather than a strict classification, and even though there are still gaping holes in this "definition". In particular, it bases its concept of a scripting language on the concept of a command language, although command languages are themselves essentially programming languages; when does a CLI become a shell, and when does a shell become a scripting language?

Stavros Macrakis (macrakis@osf.org) replied

Scripting languages support doing programmatically what otherwise is done directly by the user through direct commands (whether textual or graphical).

Several characteristics follow from this definition.

Scripting languages often follow the syntax and semantics of command languages. For instance, many scripting languages do not require quoting of string literals, but rather require explicit evaluation of variables (x denotes the string "x", but $x denotes the value of the variable x).
Scripting languages make it easy to call system commands, prepare their arguments, and manipulate their results. They generally have some built-in primitives for manipulating file and directory names, argument lists, environment variables, etc.
Scripting languages generally are good at handling strings, and don't emphasize numerical manipulation.
Since calling system commands is generally much more expensive than script execution itself, there is little emphasis on run-time efficiency; they are therefore often implemented using interpreters, bytecode interpreters, or macro processors.
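e.g. the first three characteristics can be seen together in a few lines. This is only a rough sketch in Python (chosen for illustration, not a language Stavros discusses): `string.Template` stands in for command-language `$x` substitution, and a child Python interpreter stands in for an arbitrary system command so that the example does not depend on any particular operating system. The variable name `GREETING` and the message text are invented for the example.

```python
import os
import subprocess
import sys
from string import Template

# Command-language convention: bare words are literal text, and only
# $name is replaced by the value of a variable.
line = Template("x is literal, but $x is a variable")
expanded = line.substitute(x="42")

# Prepare an environment variable and an argument list, then drive
# another program. The child interpreter is a stand-in (an assumption)
# for whatever system command a real script would call.
child = [sys.executable, "-c",
         "import os; print(os.environ['GREETING'].upper())"]
env = dict(os.environ, GREETING="hello from the script")
result = subprocess.run(child, env=env, capture_output=True,
                        text=True, check=True)

# Manipulate the command's result, which arrives as a string.
words = result.stdout.split()

print(expanded)
print(words)
```

Starting the child process is likely to cost far more than the handful of string operations around it, which is the point made in the last item of the list.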

Some command languages (e.g. Unix sh or csh command languages) are also useful as scripting languages. Some scripting languages can be used instead of a command shell. Some programming languages can easily be used as scripting languages (with appropriate libraries).

In my opinion, it is a bad idea to distinguish sharply between the design of scripting languages and the design of other programming languages. What starts out as a small script may develop over time into a very sophisticated program. Users may wish to share independently-maintained modules between scripts -- hence a need for namespace control and clear interface definition. Error handling may become necessary -- hence a need for exception handling. Cooperation among asynchronous processes may be necessary -- hence a need for threads or continuations and their related synchronization mechanisms. In short, many of the same problems that face programmers in general face script programmers in particular.

and the moderator added

I suppose one might draw the line by saying that a scripting language is one where the main effect of a program is to drive another system, while in a programming language the program itself is the main action.

to which eifrig@beanworld.cs.jhu.edu (Jonathan Eifrig) responded

pointing out that this line is just as illusory

Of course, "traditional" programs are nothing more than scripts for a microprocessor. :-)
Ultimately, these sorts of distinctions aren't very helpful; at best they can be descriptive, not prescriptive.

Personally, I view a language as a "scripting" language if

whitespace is significant, and
there is an anti-quotation facility to turn strings into programs.

Both are horrid features, and both are amply found in sh, perl, and Tk. :-)

That drew a response from the designer of Perl, a complex (in the best sense of the word) scripting language, who pointed out ambiguities in Jonathan's definition

lwall@netlabs.com (Larry Wall) responded

1. whitespace is significant

According to this criterion, Fortran was the last great scripting language. :-)

e.g. (based on punched cards) columns 1 to 6 and 73 to 80 have special meanings and newlines usually separate statements in (pre 1990) Fortran. However, spaces in the middle of Fortran identifiers are ignored.

Whitespace is significant in nearly all modern languages. You can't put whitespace inside a token in C, Ada, Modula, Lisp, Perl, etc. And you typically *must* put whitespace between two adjacent alphanumeric tokens in any of these languages.

Now, if what you meant is that newlines have syntactic meaning in addition to their ordinary whitespace meaning, I'll grant you that many languages do this, including sh, awk, Tcl and Python. But not Perl. Perl doesn't even care if you put a newline inside a literal string. Please tar Perl with a different brush.
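e.g. both of Larry's points can be checked against Python, one of the languages he lists as giving newlines a syntactic meaning. The sketch below simply asks the built-in `compile` whether some text is an acceptable program; the helper name `parses` is made up for the example.

```python
def parses(source):
    """Return True if Python accepts the text as a program."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# A newline ends the first statement; a space cannot do the same job,
# but an explicit separator can.
newline_separated = parses("a = 1\nb = 2")
space_separated = parses("a = 1 b = 2")
semicolon_separated = parses("a = 1; b = 2")

# Whitespace inside a token is rejected, as in nearly all languages.
split_token = parses("to tal = 1")

print(newline_separated, space_separated, semicolon_separated, split_token)
```

Only the first and third forms are accepted, so here the newline is doing the work that a semicolon does in C or Perl.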

2. there is an anti-quotation facility to turn strings into programs.

By this definition all computer languages are scripts. Of course, some "anti-quotation facilities" are clunkier than others...

e.g. if we are desperate enough, we can output some text, compile and run it.
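At the less clunky end of the range, an interpreted language can usually do this in-process. A minimal sketch in Python, used here purely as an illustration (the function name `double` and the variable names are invented for the example):

```python
# Text that happens to be a program; nothing has run yet.
expression_text = "6 * 7"
program_text = "def double(n):\n    return 2 * n\n"

# eval turns a string into the value of the expression it spells out.
value = eval(expression_text)

# compile + exec is the "output some text, compile and run it" route,
# done without leaving the running program; the namespace dictionary
# receives whatever the generated program defines.
namespace = {}
exec(compile(program_text, "<generated>", "exec"), namespace)
doubled = namespace["double"](value)

print(value, doubled)
```

In a compiled language the same effect needs a file, a compiler and a new process, which is why the facility feels like a defining feature of scripting languages even though, as Larry says, it is available almost everywhere.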

[...] The question of "scripts" vs "programs" comes up frequently in the comp.lang.perl newsgroup.
Speaking as a computer scientist, my answer is: These are not (yet) technical terms.
Speaking as a linguist, my answer is: These words (like most words) are defined by prototype, not by boundary. A script is what you give the actors, and a program is what you give the audience.

eifrig@beanworld.cs.jhu.edu (Jonathan Eifrig) then clarified the first point from his previous post

Perhaps I should have been more clear: what I meant was that in most scripting languages, sequences of "whitespace" characters (which normally include SPACE, TAB, and NEWLINE) are not equivalent. Which means, of course, that such "whitespace" isn't really "white" at all, since there is semantic meaning attached to the various forms of "whitespace" in the program, other than the usual use of tokenization.

C doesn't escape such cruftiness either, unfortunately: "\<NEWLINE>" is not the same as "\<SPACE><NEWLINE>", and the fact that preprocessor directives must start in column 1 means that the newline character isn't really "white" at all. An unfortunate legacy of the preprocessor.

and ludemann@netcom.com (Peter Ludemann) responded

I think that ANSI-C allows white space before "#" for preprocessor directives. Anyway, as far as I know, there's no reason to have "\<SPACE><NEWLINE>" as different from "\<NEWLINE>" ... in fact, the C compilers for the IBM mainframe and AS/400 treat these as the same (they have to: many source files have fixed-length records).

Another example of "white space" not being white is in "make": the commands have to be preceded by a tab, which is not equivalent to spaces. Again, there's no good reason for this, except perhaps maintaining some strange kind of compatibility with the early implementations.

The discussion now rapidly increased in volume and changed emphasis, focussing mainly on the two main uses of white space other than as a word separator:

newline as a statement terminator/separator, instead of e.g. semicolon
indentation (spaces and/or tabs) to show block structure, instead of e.g. {...} or BEGIN...END
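
e.g. Python uses indentation in exactly this way. In the sketch below (an illustration only; the helper `run` and the two program texts are made up for the example) the two programs contain the same tokens in the same order and differ only in how far the last line is indented, yet they compute different totals.

```python
def run(source):
    """Execute the program text and return its final value of total."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]

# Last line indented: it belongs to the loop body and runs three times.
inside_loop = (
    "total = 0\n"
    "for i in range(3):\n"
    "    total += i\n"
    "    total += 10\n"
)

# Last line not indented: it follows the loop and runs once.
after_loop = (
    "total = 0\n"
    "for i in range(3):\n"
    "    total += i\n"
    "total += 10\n"
)

print(run(inside_loop), run(after_loop))
```

In C the same distinction would be made by where the closing brace is placed, and the indentation would be free to disagree with it.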