05.07.06

Noodling about a MetaLanguage

Posted in Computer Languages at 12:09 pm by Brooks

I’ve been reading a few articles recently about language syntax and semantics, and what makes for a good computer language these days. (For instance, this article by Larry O’Brien.) One theme that struck me was the idea that the power of a language is in its extensibility — that is, in the rules of the language that allow a program to add to the language.

This has interesting ramifications for language design. A language is not defined by all of the syntax and semantics that are in its programs. It is, instead, defined by its primitives and by the structure upon which the programmer can add to the language.

For example, here’s a short Fortran program:

program sample
  use vectorModule
  real :: b
  type(vector) :: V
  call loadFromFile(V, 'myfile.dat')
  b = abs(V)
  write(*,*) b
end program

Fortran defines the keywords (program, use, real, type, call, write, and end) and the punctuation; everything else is something that I’ve defined either in this program or in the vectorModule module that gets included by the second line. Some of these definitions are so common that we don’t even think of them as “language extensions” any more: for instance, the function and subroutine definitions. Beyond those, there’s also the user-defined “vector” type, and the extension of the absolute-value intrinsic to apply to it.
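To make those extensions concrete, here is a rough sketch of what the vectorModule used by the sample program might contain. The module itself isn't shown in this post, so everything in it is an assumption for illustration: the components field, the file layout read by loadFromFile, the helper name vectorAbs, and the choice of Euclidean length as the meaning of abs for a vector. It also assumes a compiler with Fortran 2003 allocatable components and the Fortran 2008 newunit specifier.

```fortran
! Hypothetical sketch of vectorModule; the real module is not shown in the post.
module vectorModule
  implicit none
  private
  public :: vector, loadFromFile, abs

  ! User-defined type: a vector of real components (assumed layout).
  type :: vector
    real, allocatable :: components(:)
  end type vector

  ! Extend the generic name abs with a version for type(vector).
  ! References with numeric arguments still resolve to the intrinsic.
  interface abs
    module procedure vectorAbs
  end interface abs

contains

  ! Assumed file layout: the component count, then the components.
  subroutine loadFromFile(v, filename)
    type(vector), intent(out) :: v
    character(len=*), intent(in) :: filename
    integer :: fileUnit, n

    open(newunit=fileUnit, file=filename, status='old', action='read')
    read(fileUnit, *) n
    allocate(v%components(n))
    read(fileUnit, *) v%components
    close(fileUnit)
  end subroutine loadFromFile

  ! abs(V) is taken here to mean the Euclidean length of V.
  pure function vectorAbs(v) result(length)
    type(vector), intent(in) :: v
    real :: length

    length = sqrt(sum(v%components**2))
  end function vectorAbs

end module vectorModule
```

With a module along these lines, the sample program needs no further changes: the type(vector) declaration, the call to loadFromFile, and abs(V) all look like ordinary Fortran, even though the language itself knows nothing about vectors.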

The invariants of the language are more interesting, though. Variables are declared with a “::” syntax, and must be declared before any executable statement. Variables of a user-defined type must be declared as “type(vector) ::” rather than just “vector ::”. Subroutines cannot be used as statements alone; they are prefixed with a “call” keyword. Expressions, likewise, cannot be used as statements, and an assignment is a statement, not an expression. These requirements are at the core of what can and what cannot be a Fortran program.

One interesting question is: What happens as we reduce the number of syntax invariants of the language?

Consider the TeX typesetting language. What if we had a programming language that worked like TeX?

TeX is, to a rough approximation, a two-level language. The first layer is a macro expansion language; the user’s code (along with some system libraries) is processed through a macro system that converts it into a stream of TeX primitive commands. These primitive commands are then processed through a second layer, which converts them into the typesetting equivalent of machine code. It would be easy enough to imagine a language like this where the second layer produces compiled programs rather than typeset pages.

This has some interesting results in how TeX is used. Knuth expected that what he had written was largely a proof-of-concept language, and that people would rewrite the basic language processor to add extensions to it. However, the macro expansion language he wrote is sufficiently powerful that nearly any desired language extension can be written as a macro. (There are a few exceptions, surrounding things like file input-output and bits of math typesetting that are implemented largely in the second layer, but they are rare.) And, given the choice of writing a macro that works with the existing system, or creating a new system, people have generally chosen to write macros. Even the LaTeX typesetting language, which is sufficiently different from TeX as to be nearly a distinct language, is merely a large set of TeX macros that are treated as a system library.

Another benefit is that it makes experimentation easy, and makes it portable. To write a new bit of language syntax, I simply write a few macro commands and put them at the top of my file — and, so long as those commands go with my file, it will run on anyone else’s TeX system.

These would be good traits to have in a programming language.

Thus, the question is: What is an appropriate set of primitives for a programming language? And what is a good way to structure a TeX-like macro language that will produce them?
