Eek! I got acked!
Ann. Soc. Entomol. (N.S.), 2009, 45 (4), 511-528
Rafael E. Cárdenas, Jaime Buestán & Olivier Dangles.
Diversity and distribution models of horse flies (Diptera: Tabanidae) from Ecuador.
“To the following […] who assisted with compiling the extensive bibliography: […] Jonathan Rees, […]”
The word “module” is used in software engineering by analogy with the way it is applied to physical modules. We speak of modular housing, or modular container systems, or a plug-in module for an oscilloscope, and then turn around and put some bits on a disk or in RAM and call them a “module” too.
To be a module, something has to connect to something else in a general, intentional manner. That is, we have entities A and B that connect successfully, and they do so not because A has properties that make it compatible with B in particular or vice versa, but because they both have known, general properties that make them compatible with one another. In other words, you can change some properties of A or B, or swap A or B for things that have different properties, and they’ll remain compatible because of the compatibility-related properties that didn’t change. When you put together a modular home, the bathroom unit is a module because you might have connected a kitchen unit at that point instead.
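To make this concrete, here is a toy sketch (all names invented for illustration): two interchangeable parts connect to a consumer through a known, general property — here, "has a render(text) method" — rather than through anything specific to either party.

```javascript
// Two implementations that share one general, known property:
// each returns an object with a render(text) method.
function makeConsoleRenderer() {
  return { render: (text) => console.log(text) };
}
function makeBufferRenderer() {
  const lines = [];
  return { render: (text) => lines.push(text), lines };
}

// The consumer depends only on that general property, so the two
// implementations are interchangeable: each is a module in the
// sense above.
function run(renderer) {
  renderer.render("hello");
}

run(makeConsoleRenderer());      // prints "hello"
const b = makeBufferRenderer();
run(b);                          // b.lines is now ["hello"]
```

Swapping one renderer for the other is exactly the bathroom-unit-for-kitchen-unit move: compatibility survives because the property the connection relies on didn't change.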
In the software world we keep our software in files, we copy the files from one place to another, and we process them in various ways to get digital artifacts located either in files or in an address space (where the distinction between these two locations is a waning technical artifact). Now some artifacts you would call “modules” and some you wouldn’t. A software application that can be run on any copy of a particular software platform could be a module, because any copy of the application can run on any instantiation of the platform: Omnigraffle is a module because any copy of Omnigraffle is compatible with any Mac (within certain parameters such as OS version and so on). So my definition is pretty broad. (We could tighten this up if you like, e.g. to require that the thing you’re plugging it into has a specific place to attach it, not just a pile such as the Applications folder.)
But the idea does not apply to all software artifacts. A program that will only run on my computer – say, a script that touches particular files on my computer in a manner that’s difficult to reproduce – is not a module. After I link or load a file into a larger application, e.g. when I run Omnigraffle by copying it into an address space, it stops being a module, because after the linking or loading step it (the in-process version) is no longer compatible with any situation other than the one it’s in.
So when I read a programming language description that speaks of a “module” M, meaning a copy of some original module (from a file, say) that has been processed (linked, loaded) to such an extent that it is no longer compatible with any context except the one in which it is lodged, I bristle. A better word for such a thing would be “structure” (Standard ML, Scheme 48) – just about anything other than “module”. “Interface” works too, at least for the place where you connect to the module-derived-thingy-structure. I can’t remember where I’ve seen the word “interface” used in this way – maybe Mesa.
The confusion arises, as usual, from metonymy. The module name may stand in for the structure name, and this is sort of OK if there is no ambiguity (only one structure per module). The danger with metonymy is the preemptive assumptions it makes: that there will always be only one structure per module, and that no one will ever get confused between the proper meaning and the metonymical one. These assumptions might be true in simple systems, but they become a severe limitation as systems integrate and scale through social space (changing developer community, redeployment) and through time (dynamic loading/linking, change management).
But why does one divide any system into identifiable pieces – e.g. why would one divide program source code into two files? So that you can change the two parts independently, i.e. vary properties of one without varying the other. Suppose you have a working program in files A and B, and you make a change to A and get a new working program. Then old A, new A, and B are all modules if we can convince ourselves that there is something general and intentional about the compatibility of the A’s to B. Because of the vagueness of the word “general”, this requirement could be easy to meet, especially when relaxed to some uninteresting condition that can be tested by machine, such as type compatibility or use/definition satisfaction. But to prove “intentional” you would have to write down what conditions make the files compatible with one another. This piece of writing is often called an “interface” (Java, Scheme 48), which relates metonymically to the definition of “interface” I gave above. “Interface specification” or “interface definition” would be a better term.
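A toy sketch of the idea (names invented): the compatibility condition between the two files is written down separately from either file, and both versions of A satisfy it, which is what makes them interchangeable with respect to B.

```javascript
// Interface specification, written down separately from A and B:
//   A must export a function area(w, h) returning a non-negative number.

// File A, old version.
const oldA = { area: (w, h) => w * h };

// File A, new version -- different internals, same written-down contract.
const newA = { area: (w, h) => Math.abs(w) * Math.abs(h) };

// File B depends only on the specified contract, not on either A
// in particular, so either version will do.
function B(a) {
  return a.area(3, 4) + 1;
}

console.log(B(oldA), B(newA)); // both 13
```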
Exercise: What is a “component”, and what are its risky metonymical uses?
This is an open response to a recent presentation about module systems for Ecmascript, “The state of the load(): A brief progress report on current ES module implementations,” by Ihab Awad. I didn’t attend, but got the pointer from the e-lang list. I am not an Ecmascript programmer, but I think my point here applies to all formal notations.
I’m not too keen on imperative module systems, those based on constructs like ‘load’ that augment the running code base; they are alluring because of their simplicity, but I’ve seen a lot of benefit from a phased approach that permits computation of dependencies, and other ‘static’ analysis, separately from execution of ‘run time’ code, which often can only be run at ‘run time’.
(I put ‘static’ and ‘run time’ in scare quotes because this distinction is not very meaningful in a Web context where all activities are ‘dynamic’, ‘compilation’ happens at the craziest times, and static and dynamic, if anything, are just vague ideas around what kinds of activities happen when.)
The reason you care about phasing is that you do things to programs other than run them. There is of course the usual analysis of individual modules such as validity checking (syntax, variable binding, maybe some typing constraints), optimization, documentation and test case harvesting, and so on. But statements about a program’s module structure are also very useful to know in advance of execution. In the case of modules, early-phase knowledge of dependencies not only permits useful inter-module checks and optimizations, but also system-level operations such as prefetching (transitive closure of module dependencies), search, and packaging.
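For instance, prefetching is a few lines once dependency metadata is available in an early phase; this sketch (module names invented) just takes the transitive closure of a declared dependency graph without executing any module code.

```javascript
// Early-phase dependency metadata, available without running anything.
const deps = {
  app: ["ui", "net"],
  ui: ["util"],
  net: ["util"],
  util: [],
};

// Transitive closure: everything that must be fetched before root can run.
function closure(root, graph) {
  const seen = new Set();
  const stack = [root];
  while (stack.length) {
    const m = stack.pop();
    if (seen.has(m)) continue;
    seen.add(m);
    for (const d of graph[m] || []) stack.push(d);
  }
  return [...seen].sort();
}

console.log(closure("app", deps)); // [ 'app', 'net', 'ui', 'util' ]
```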
If ‘load’ is simply a function like any other then there is no systematic and predictable way to find all possible early-phase uses and distinguish them from the necessarily late-phase ones.
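A toy illustration of the problem (function names invented): when ‘load’ is an ordinary function, its argument may be computed and its call may be conditional, so no prepass can enumerate the dependencies without running the program.

```javascript
// Stand-in for an imperative loader that augments the running code base.
function load(name) {
  return `loaded:${name}`;
}

function maybeLoad(condition, suffix) {
  // The dependency "plug-<suffix>" exists only as a run-time value,
  // and the call happens only on one branch; an early phase cannot
  // list it without executing this code.
  return condition ? load("plug-" + suffix) : null;
}

console.log(maybeLoad(true, "a")); // loaded:plug-a
```

Contrast a declarative import form, which a prepass can read off the program text without evaluating anything.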
Just to name one case, the phase distinction is important for Scheme48, which has whole-system cross-building as a requirement, and of course is essential for any language that has a strong theory of static anything (especially around types and namespace control).
The main criticism of phased systems is the limitations imposed during the first phase. You often want to be able to compute (as opposed to just list) the early-phase information such as types and dependencies, create abstractions for use in writing the early phase information, and so on, and the conventional Modula-like approach won’t support this because the first-phase language is extremely limited. But you don’t have to go to an unphased design with no separation between early- and late-phase code in order to get a rich early-phase language. You can say that there are two programs, one that gets run early to provide the kind of information that early phases like to see (such as dependencies), and another that happens at whatever you consider to be ‘run time.’ Object-capability languages like E and Caja make articulating the abilities and limits of early-phase processing especially easy, of course; the language, API, or program designer gets to articulate exactly what the capabilities of the phase-I language are supposed to be (so to speak).
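The two-program split can be sketched very simply (names invented): an early-phase program that computes, rather than merely lists, the metadata — using whatever abstractions it likes — and the run-time code proper, which tools never need to consult.

```javascript
// Early-phase program: runs before 'run time', and may use
// abstraction and computation to produce the metadata.
function earlyPhase() {
  const common = ["util"]; // an abstraction shared across declarations
  return { name: "app", depends: [...common, "net"] };
}

// Late-phase program: ordinary run-time code.
function latePhase() {
  return "running";
}

console.log(earlyPhase().depends); // [ 'util', 'net' ]
```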
There are many ways to decide what stuff belongs to which phase. In Scheme48 we experimented with the idea that module metadata, especially dependencies, should be separated from the text of the module code. Many languages (Modula-2 and many others, such as R6RS) make this separation through the syntax of file contents, with some special file-level-only syntactic constructs meant for module processing (you can’t put ‘implementation module …’ or ‘from x import y’ inside one branch of an IF). Scheme48 went to the extreme of implementing the data/metadata separation as a file boundary. Common Lisp does it both ways, with both a minimal in-file ‘provide/require’ and a full-featured meta-file ‘defsystem’. Yet another way is to allow early-phase information to go anywhere but to distinguish it syntactically (I don’t have a good example; #include in C is not quite the right idea) so that it can be identified by a prepass. Sometimes one way is better, sometimes another. In any case, how the distinction is made syntactically is much less important than that it be made somehow, without heuristics, so that phased processing is possible and reliable.
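To illustrate the last option with invented syntax: if dependency declarations take a fixed, recognizable form, a prepass can collect them by scanning the text, with no heuristics and no execution.

```javascript
// Invented syntax for illustration: lines of the form
//   //!depends moduleName
// are metadata a prepass can collect without executing anything.
function extractDeps(source) {
  const deps = [];
  for (const line of source.split("\n")) {
    const m = line.match(/^\/\/!depends\s+(\S+)/);
    if (m) deps.push(m[1]);
  }
  return deps;
}

const source = [
  "//!depends util",
  "//!depends net",
  "function main() { /* late-phase code */ }",
].join("\n");

console.log(extractDeps(source)); // [ 'util', 'net' ]
```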
Late-phase loads will always be needed, and I’m not arguing that they should be forbidden (as if that were possible), but if you don’t provide notational encouragement for phase splitting, you are likely to frustrate the efforts of those who later on want to create tools for building and analyzing systems. There is a lot of experience behind us on how phasing can be exploited, so doing some design now is not a total shot in the dark. You could wait to do a phased design until the need is felt, but that will lead to the creation of an unphasable code base, and the price of splitting program text into data and metadata when the time comes that you need that distinction will be very high.