This is an open response to a recent presentation about module systems for ECMAScript, “The state of the load(): A brief progress report on current ES module implementations,” by Ihab Awad. I didn’t attend, but got the pointer from the e-lang list. I am not an ECMAScript programmer, but I think my point here applies to all formal notations.
I’m not too keen on imperative module systems, meaning those based on constructs like ‘load’ that augment the running code base. They are alluring because of their simplicity, but I’ve seen a lot of benefit from a phased approach that permits computation of dependencies, and other ‘static’ analysis, separately from execution of the ‘run time’ code, which often can only be run at ‘run time’.
(I put ‘static’ and ‘run time’ in scare quotes because this distinction is not very meaningful in a Web context where all activities are ‘dynamic’, ‘compilation’ happens at the craziest times, and static and dynamic, if anything, are just vague ideas around what kinds of activities happen when.)
The reason you care about phasing is that you do things to programs other than run them. There is of course the usual analysis of individual modules such as validity checking (syntax, variable binding, maybe some typing constraints), optimization, documentation and test case harvesting, and so on. But statements about a program’s module structure are also very useful to know in advance of execution. In the case of modules, early-phase knowledge of dependencies not only permits useful inter-module checks and optimizations, but also system-level operations such as prefetching (transitive closure of module dependencies), search, and packaging.
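As a rough illustration of the prefetching case, here is a minimal Python sketch. It assumes the dependencies of each module are already available as plain data before anything runs; the module names and the `DEPENDENCIES` table are invented for the example and do not come from any real system.

```python
from collections import deque

# Hypothetical early-phase metadata: each module lists its direct dependencies.
# The names are made up for illustration.
DEPENDENCIES = {
    "app": ["ui", "net"],
    "ui": ["dom", "util"],
    "net": ["util"],
    "dom": [],
    "util": [],
    "unused": ["dom"],
}


def prefetch_set(root, dependencies):
    """Return every module reachable from root, including root itself."""
    seen = {root}
    queue = deque([root])
    while queue:
        module = queue.popleft()
        for dep in dependencies.get(module, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


print(sorted(prefetch_set("app", DEPENDENCIES)))
# prints ['app', 'dom', 'net', 'ui', 'util']
```

Nothing here executes any module code. The whole computation works from the declared dependencies, which is what makes it usable by a packager or a prefetcher.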
If ‘load’ is simply a function like any other, then there is no systematic and predictable way to find all of its possible early-phase uses and distinguish them from the necessarily late-phase ones.
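To make the difficulty concrete, here is a small Python sketch that scans program text for calls to a function named `load`, using the standard `ast` module. The `load` function, the `user_choice` helper, and the sample source are all hypothetical; the point is only what a tool can and cannot learn without running the code.

```python
import ast

# Hypothetical program text. It is parsed but never executed.
SOURCE = '''
load("util")
load("plugin_" + user_choice())
alias = load
alias("hidden")
'''


def classify_loads(source):
    """Split calls to a function named `load` into known targets and opaque ones."""
    known, opaque = [], []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "load"
                and len(node.args) == 1):
            arg = node.args[0]
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                known.append(arg.value)      # target is a string literal
            else:
                opaque.append(node.lineno)   # target is only known when run
    return known, opaque


known, opaque = classify_loads(SOURCE)
print(known)   # prints ['util']
print(opaque)  # prints [3]
```

The scan recovers one dependency, flags one call whose target is computed, and misses the aliased call entirely. That is the kind of heuristic a tool falls back on when early-phase information is not marked as such.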
Just to name one case, the phase distinction is important for Scheme48, which has whole-system cross-building as a requirement, and of course is essential for any language that has a strong theory of static anything (especially around types and namespace control).
The main criticism of phased systems is the limitations imposed during the first phase. You often want to be able to compute (as opposed to just list) the early-phase information such as types and dependencies, create abstractions for use in writing the early-phase information, and so on, and the conventional Modula-like approach won’t support this because the first-phase language is extremely limited. But you don’t have to go to an unphased design with no separation between early- and late-phase code in order to get a rich early-phase language. You can say that there are two programs, one that gets run early to provide the kind of information that early phases like to see (such as dependencies), and another that happens at whatever you consider to be ‘run time.’ Object-capability languages like E and Caja make articulating the abilities and limits of early-phase processing especially easy, of course; the language, API, or program designer gets to articulate exactly what the capabilities of the phase-I language are supposed to be (so to speak).
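The two-program idea might look something like the following Python sketch. The `Module` record, the `describe` and `run` fields, and the `settings` input are assumptions made up for this example. Python does not enforce the restriction on what `describe` may touch; in an object-capability language the early-phase program would simply be handed nothing else.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Module:
    # Phase I: computes dependency names from a small, explicit set of inputs.
    describe: Callable[[Dict[str, str]], List[str]]
    # Phase II: the 'run time' body, given its already-resolved imports.
    run: Callable[[Dict[str, object]], object]


def describe_app(settings):
    # Dependencies are computed here, not just listed.
    return ["core", "render_" + settings["backend"]]


executed = []  # records which phase-II bodies have actually run

app = Module(
    describe=describe_app,
    run=lambda imports: executed.append("app"),
)


def early_phase(module, settings):
    """Run only the phase-I program; the phase-II body is never called."""
    return module.describe(settings)


print(early_phase(app, {"backend": "canvas"}))  # prints ['core', 'render_canvas']
print(executed)                                 # prints []
```

The early phase still gets a real programming language for computing its answer, yet a build or analysis tool can obtain the dependency list without touching the late-phase code.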
There are many ways to decide what stuff belongs to which phase. In Scheme48 we experimented with the idea that module metadata, especially dependencies, should be separated from the text of the module code. Many languages (Modula-2 and many others, such as R6RS) make this separation through the syntax of file contents, with some special file-level-only syntactic constructs meant for module processing (you can’t use ‘implementation module …’ or ‘from x import y’ as one branch of an IF). Scheme48 went to the extreme of implementing the data/metadata separation as a file boundary. Common Lisp does it both ways, with both a minimal in-file ‘provide/require’ and a full-featured meta-file ‘defsystem’. Yet another way is to allow early-phase information to go anywhere but to distinguish it syntactically (I don’t have a good example; #include in C is not quite the right idea) so that it can be identified by a prepass. Sometimes one way is better, sometimes another is. In any case, how the distinction is made syntactically is much less important than that it be made somehow, without heuristics, so that phased processing is possible and reliable.
Late-phase loads will always be needed, and I’m not arguing that they should be forbidden (as if that were possible), but if you don’t provide notational encouragement for phase splitting, you are likely to frustrate the efforts of those who later on want to create tools for building and analyzing systems. There is a lot of experience behind us on how phasing can be exploited, so doing some design now is not a total shot in the dark. You could wait to do a phased design until the need is felt, but that will lead to the creation of an unphasable code base, and the price of splitting program text into data and metadata when the time comes that you need that distinction will be very high.