Definition of a scripting language
Here is an interesting and relevant discussion about programming language
design, copied from the newsgroup "comp.compilers", starting in March 1995
and still going in July!
The discussion is interesting for various reasons:
- It looks at aspects of language design that you should recognise from
having used C and a shell.
- It shows how there can be more than one right answer to language design,
depending on what you are trying to achieve. The quest for a general purpose
super-language, that can be used in all circumstances, has not succeeded yet.
Although change for its own sake is usually a bad idea, sticking to the
familiar without at least understanding and considering the alternatives is
not the mark of a good Computer Scientist.
- It indicates how some "obvious" and "important" differences between
languages seem insignificant compared to the whole range of language
paradigms and the different ways in which they have been made concrete. The
difference between a programming language (C) and a scripting language (SH)
is important and interesting, but is tiny compared to the difference between
imperative (C) and functional (SML) programming.
Note how many of the responses need clarification on closer examination,
as other participants bring different experiences to the discussion.
My comments and explanations are in italics; I have edited the
messages to remove some simple spelling and grammar mistakes; [...]
indicates text removed or replaced (by me or others) to focus on language
design. Indented text is a quote from a previous message.
Ariel Lieberman (ariel@ccs.neu.edu) asked
I'm looking for a definition of a scripting language (Perl, REXX, Tcl, etc.).
What are the differences between a scripting language and a "regular"
programming language (C, C++, Ada, Smalltalk, etc.)?
And the moderator replied
A scripting language is one in which people think they're writing scripts,
and a programming language is one in which they think they're writing
programs. You can come up with rules of thumb, e.g. scripting languages tend
to be interpreted, but it's easy to find exceptions to any of them.
This is essentially true, and I expected the discussion to die out.
However, as the next post showed, it is interesting and useful to try to
come up with some rules, even if they can only be guidelines rather than a
strict classification, and even though there are still gaping holes in the
resulting "definition". In particular, that post bases its concept of a
scripting language on the concept of a command language, even though command
languages are themselves essentially programming languages: when does a CLI
become a shell, and when does a shell become a scripting language?
Stavros Macrakis (macrakis@osf.org) replied
Scripting languages support doing programmatically what otherwise is done
directly by the user through direct commands (whether textual or graphical).
Several characteristics follow from this definition.
- Scripting languages often follow the syntax and semantics of command
languages. For instance, many scripting languages do not require quoting of
string literals, but rather require explicit evaluation of variables (x
denotes the string "x", but $x denotes the value of the variable x).
- Scripting languages make it easy to call system commands, prepare their
arguments, and manipulate their results. They generally have some built-in
primitives for manipulating file and directory names, argument lists,
environment variables, etc.
- Scripting languages generally are good at handling strings, and don't
emphasize numerical manipulation.
- Since calling system commands is generally much more expensive than script
execution itself, there is little emphasis on run-time efficiency; therefore
they are often implemented using interpreters, bytecode interpreters, or
macro processors.
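
To make the first point above concrete, here is a minimal Python sketch of
command-language word expansion, where a bare word stands for itself and
$name stands for the value of a variable. The function names expand_word and
expand_command are invented for this illustration, and treating an unset
variable as the empty string is an assumption modelled on sh, not something
stated in the post.

```python
def expand_word(word, variables):
    """Expand one whitespace-separated word of a command line."""
    if word.startswith("$"):
        # "$x" asks for the value of variable x; unset variables
        # expand to "" here (an assumption modelled on sh).
        return variables.get(word[1:], "")
    # A bare word such as "x" is just the string "x"; no quotes needed.
    return word


def expand_command(line, variables):
    """Split a command line on whitespace and expand each word."""
    return [expand_word(word, variables) for word in line.split()]


variables = {"x": "hello"}
print(expand_command("echo x $x", variables))  # ['echo', 'x', 'hello']
```

A language such as C makes the opposite choice: x names the variable and
"x" has to be quoted to be a string.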
Some command languages (e.g. Unix sh or csh command languages) are also
useful as scripting languages. Some scripting languages can be used instead
of a command shell. Some programming languages can easily be used as
scripting languages (with appropriate libraries).
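
As a hedged example of a general-purpose language doing a script's job with
the help of libraries, the Python sketch below prepares an argument list,
calls a command, and then manipulates the result as a string and as a file
name. The child command is simply another Python interpreter printing a
made-up file name (report.txt), chosen only so that the example runs on any
system; a real script would call whatever system command it needed.

```python
import os
import subprocess
import sys

# Prepare the argument list for a system command. The child here is just
# another Python interpreter, chosen so the example runs anywhere.
command = [sys.executable, "-c", "print('report.txt')"]

# Call the command and capture what it writes to standard output.
completed = subprocess.run(command, capture_output=True, text=True, check=True)

# Manipulate the result, first as a string and then as a file name.
file_name = completed.stdout.strip()
stem, extension = os.path.splitext(file_name)
backup_name = stem + ".bak"

print(file_name, extension, backup_name)  # report.txt .txt report.bak
```

In sh the same job would be a line or two of command substitution, which is
the trade-off between notation and generality that the post describes.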
In my opinion, it is a bad idea to distinguish sharply between the design
of scripting languages and the design of other programming languages. What
starts out as a small script may develop over time into a very sophisticated
program. Users may wish to share independently-maintained modules between
scripts -- hence a need for namespace control and clear interface
definition. Error handling may become necessary -- hence a need for
exception handling. Cooperation among asynchronous processes may be
necessary -- hence a need for threads or continuations and their related
synchronization mechanisms. In short, many of the same problems that face
programmers in general face script programmers in particular.
and the moderator added
I suppose one might draw the line by saying that a scripting language is one
where the main effect of a program is to drive another system, while in a
programming language the program itself is the main action.
to which eifrig@beanworld.cs.jhu.edu (Jonathan Eifrig) responded
pointing out that this line is just as illusory
Of course, "traditional" programs are nothing more than scripts for a
microprocessor. :-)
Ultimately, these sorts of distinctions aren't very helpful; at best they
can be descriptive, not prescriptive.
Personally, I view a language as a "scripting" language if
- whitespace is significant, and
- there is an anti-quotation facility to turn strings into programs.
Both are horrid features, and both are amply found in sh, perl, and
Tk. :-)
That drew a response from the designer of Perl, a complex (in the best
sense of the word) scripting language, who pointed out ambiguities in
Jonathan's definition
lwall@netlabs.com (Larry Wall) responded
1. whitespace is significant
According to this criterion, Fortran was the last great scripting language. :-)
e.g. in pre-1990 Fortran, whose layout was based on punched cards, columns
1 to 6 and 73 to 80 have special meanings and newlines usually separate
statements. However, spaces in the middle of Fortran identifiers are ignored.
Whitespace is significant in nearly all modern languages. You can't put
whitespace inside a token in C, Ada, Modula, Lisp, Perl, etc. And you
typically *must* put whitespace between two adjacent alphanumeric tokens in
any of these languages.
Now, if what you meant is that newlines have syntactic meaning in addition
to their ordinary whitespace meaning, I'll grant you that many languages do
this, including sh, awk, Tcl and Python. But not Perl. Perl doesn't even
care if you put a newline inside a literal string. Please tar Perl with a
different brush.
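
Both senses of "significant" can be seen in Python, one of the languages
named above. The sketch below is only an illustration: the helper names
name_tokens and parses are invented here, and it uses Python's standard
tokenize module and built-in compile() rather than anything from the
original post.

```python
import io
import tokenize


def name_tokens(source):
    """Return the identifier tokens the Python tokenizer finds in source."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.NAME]


def parses(source):
    """Return True if source compiles as a Python program."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Whitespace separates tokens: one identifier versus two.
print(name_tokens("ab"))       # ['ab']
print(name_tokens("a b"))      # ['a', 'b']

# A newline also ends a statement; a space cannot do that job.
print(parses("x = 1\ny = 2"))  # True
print(parses("x = 1 y = 2"))   # False
```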
2. there is an anti-quotation facility to turn strings into programs.
By this definition all computer languages are scripts. Of course, some
"anti-quotation facilities" are clunkier than others...
e.g. if we are desperate enough, we can output some text, compile and
run it.
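
At the less clunky end of the range, some languages can turn a string into
a running program without leaving the process. The following is a minimal
Python sketch of that idea using the built-in compile() and exec(); the
program text and the variable names are made up for the illustration.

```python
# A program held as an ordinary string.
program_text = "result = 6 * 7"

# Turn the string into executable code, then run it in its own namespace.
code = compile(program_text, "<generated>", "exec")
namespace = {}
exec(code, namespace)

print(namespace["result"])  # 42
```

In a compiled language the equivalent would be the desperate route: write
the text to a file, invoke the compiler, and run the resulting executable.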
[...] The question of "scripts" vs "programs" comes up frequently in the
comp.lang.perl newsgroup.
Speaking as a computer scientist, my answer is: These are not (yet)
technical terms.
Speaking as a linguist, my answer is: These words (like most words) are
defined by prototype, not by boundary. A script is what you give the actors,
and a program is what you give the audience.
eifrig@beanworld.cs.jhu.edu (Jonathan Eifrig) then clarified the first point
from his previous post
Perhaps I should have been more clear: what I meant was that in most
scripting languages, sequences of "whitespace" characters (which normally
include SPACE, TAB, and NEWLINE) are not equivalent. Which means, of course,
that such "whitespace" isn't really "white" at all, since there is semantic
meaning attached to the various forms of "whitespace" in the program, other
than the usual use of tokenization.
C doesn't escape such cruftiness either, unfortunately: "\<NEWLINE>" is
not the same as "\<SPACE><NEWLINE>", and the fact that preprocessor
directives must start in column 1 means that the newline character isn't
really "white" at all. An unfortunate legacy of the preprocessor.
and ludemann@netcom.com (Peter Ludemann) responded
I think that ANSI-C allows white space before "#" for preprocessor
directives. Anyway, as far as I know, there's no reason to have
"\<SPACE><NEWLINE>" as different from "\<NEWLINE>" ... in fact, the
C compilers for the IBM mainframe and AS/400 treat these as the same (they
have to: many source files have fixed-length records).
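
The same trap is easy to demonstrate in Python, which also uses a
backslash immediately before a newline to join lines. This is an analogy
only, not a statement about any C compiler: the sketch checks how Python
treats the two spellings, and the helper name parses is invented here.

```python
def parses(source):
    """Return True if source compiles as a Python program."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Backslash directly followed by a newline joins the two lines.
joined = "total = 1 + \\\n2"

# Backslash, then a space, then the newline: the space is invisible in
# most editors, but Python rejects it.
broken = "total = 1 + \\ \n2"

print(parses(joined))  # True
print(parses(broken))  # False
```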
Another example of "white space" not being white is in "make": the commands
have to be preceded by a tab, which is not equivalent to spaces. Again,
there's no good reason for this, except perhaps maintaining some strange
kind of compatibility with the early implementations.
The discussion now rapidly increased in volume and changed emphasis,
focussing mainly on the two main uses of white space other than as a word
separator: