Our system of interface files is quite complex. Some of the complexity is justified, or at least was justified at the time. Other aspects are almost certainly accidents. The compiler's old representation of parse trees as raw lists of items, with any structure and constraints being implicit rather than explicit, was quite error prone; since the structure and constraints were not expressed in types, violations did not result in type errors, and thus could accumulate undetected.
I (zs) don't believe any design document for this system ever existed outside of Fergus's head. This document is my (zs's) attempt to reconstruct that design document. In the rest of this file, I will try to be careful to explicitly distinguish between what I know to be true, and what I only believe to be true, either because I remember it, or because I deduce it from the code. Note also that I may be mistaken.
The principle of information hiding dictates that only some of the contents of a module should be visible outside the module. The part that is visible outside the module is usually called the interface, while the part that is not visible outside the module is usually called the implementation.
When compiling a module A that imports functionality from module B, the compiler usually wants to read a file containing only the interface of module B. In some languages such as Ada, C and C++, programmers themselves write this file. Having to maintain two files for one module can be a bit of a hassle, so in other languages, such as Haskell, programmers only ever edit one file for each module (its source file). Within that file, they indicate which parts are public and which are not, and the compiler uses this information to generate each module's interface file automatically.
In Mercury, the source code of each module is in a single file, whose suffix is .m, and the generation of interface files is solely the job of the compiler. However, unlike most other similar programming languages, the compiler generates three or four different interface files for each source file, and it generates these in two or three steps.
The steps are as follows.
The different kinds of interface files are as follows.
They were originally intended to record just the names of the types, insts and modes that are defined in the module, to allow references to these names to be disambiguated (i.e. to be fully module qualified) when creating the other kinds of interface files for other modules.
However, we were forced to include information of the form "type A is defined to be equivalent to type B" in .int3 files, because on 32 bit architectures, if type B is "float", then modules that work with values of type A need to know to reserve two words for them, not one.
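The effect of such an equivalence on storage can be illustrated with a minimal sketch. It is not the compiler's code: the dictionary of equivalences, the function names, and the assumptions that floats are 64 bits wide and that every other value fits in one word are all illustrative simplifications.

```python
# Hypothetical model: "type A is equivalent to type B" is stored as
# equivalences[A] = B. All names here are illustrative only.
FLOAT_BITS = 64  # assumption: double-precision floats


def expand_equivalences(type_name, equivalences):
    """Follow equivalence links until reaching a type that has none."""
    seen = set()
    while type_name in equivalences:
        if type_name in seen:
            raise ValueError(f"circular equivalence involving {type_name}")
        seen.add(type_name)
        type_name = equivalences[type_name]
    return type_name


def words_needed(type_name, equivalences, word_bits=32):
    """Return how many words a value of the given type occupies."""
    base = expand_equivalences(type_name, equivalences)
    if base == "float":
        # Ceiling division: a 64-bit float needs two 32-bit words.
        return -(-FLOAT_BITS // word_bits)
    # Simplifying assumption: everything else fits in one word.
    return 1
```

Without the equivalence recorded in the interface file, a reader that sees only the name of type A would have no way to reach "float", and would reserve one word.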
I believe the intention was that A.int0 play the same role for these exported-only-to-submodules parts of A as some other module's .int3 file plays for its exported-to-everyone parts. I believe the A.int0 file should be read only when processing A's submodules: either creating their .int/.int2 files, or generating target code for them.
The .int file plays the traditional role of the interface file; it is an automatically generated analogue of a C header file or Ada package specification. As such, it contains everything in the module's interface section(s), plus some other information from the implementation section that the compiler has found it needed over the years.
The compiler generates .int2 files from .int files by filtering out some items. The original filtering algorithm was the same as the one we applied to the original module to generate .int3 files. I believe the intention was to make each .int2 file a fully module qualified version of the corresponding .int3 file. However, the differences between the two starting points (the unqualified whole module for .int3 files and its fully module qualified .int file) are not restricted to differences in qualification, and the two filtering algorithms have also diverged over time, so the differences between a module's .int2 and .int3 files are not restricted to just qualification either.
Something to keep in mind: while the --make-short-interface compiler option calls for the creation of .int3 files, several predicate and variable names inside the compiler use the term "short interface files" to refer to .int2 files. While sort-of justifiable, in that .int2 files are in fact shorter versions of .int files, it can nevertheless be extremely confusing.
The contents of the .int3 file of a module are derived solely from the contents of the interface sections of that module. Each item in these sections
If the item is included in the .int3 file, whether changed or unchanged, it stays in the interface section, so .int3 files contain no implementation section.
After we decide what items to include in the .int3 file, its contents are module qualified to the maximum extent possible. However, this cannot guarantee full module qualification, for reasons explained below.
The rules for choosing between the above three outcomes of course depend on the item's type.
The overall effect of these rules is that for each type definition, we convey two pieces of information to the reader of the .int3 file.
The first piece of information is just the name of the type, which the readers need for disambiguation. Every name in a .int file must be fully module qualified, so if the interface section of a module refers to type t1, the code that creates the module's .int file needs to know which of the imported modules defines a type named t1. It gets this information from the .int3 files of the modules listed in import_module declarations.
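As a rough illustration of this lookup, the sketch below resolves an unqualified type name against the name sets taken from imported modules' .int3 files. The function name and the data layout are hypothetical, and the sketch ignores arities and types defined in the current module itself.

```python
def qualify_type_name(name, imported_modules, int3_type_names):
    """Return the fully module qualified form of an unqualified type name.

    int3_type_names maps a module name to the set of type names
    listed in that module's .int3 file (a hypothetical layout).
    """
    definers = [
        module for module in imported_modules
        if name in int3_type_names.get(module, set())
    ]
    if not definers:
        raise LookupError(f"type {name} is not defined in any imported module")
    if len(definers) > 1:
        raise LookupError(
            f"type {name} is ambiguous: defined in {sorted(definers)}")
    return f"{definers[0]}.{name}"
```

For example, if only module b's .int3 file lists t1, then a reference to t1 in a module that imports b and c is qualified as b.t1.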
The second piece of information is whether the representation of the given type differs from the standard, and if so, how. XXX The compiler can now put this info into type representation items that are separate from the type definition items. For now, it actually does so only if the value of the --experiment option is set to a special value.
At the moment, such possibly unqualified names may appear in
If the interface contains none of these kinds of items, then none of the interface's import_module declarations will be included in the .int3 file. If it contains any of them, then all of the interface's import_module declarations will be included in the .int3 file.
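This all-or-nothing rule can be sketched as follows. The representation of items as (kind, payload) pairs and the kind names passed in by the caller are assumptions made for illustration; the set of kinds that may carry unqualified names is whatever the text above lists, and is deliberately a parameter here.

```python
def int3_imports(interface_items, interface_imports, maybe_unqualified_kinds):
    """Decide which import_module declarations go into the .int3 file.

    interface_items is a list of (kind, payload) pairs (hypothetical layout).
    Either every interface import is copied, or none is.
    """
    needs_imports = any(
        kind in maybe_unqualified_kinds for kind, _payload in interface_items
    )
    return list(interface_imports) if needs_imports else []
```

The design choice modelled here is that the decision is made once per module, not per import: the sketch never tries to work out which particular import might define the unqualified name.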
XXX While type definition items will never contain references to type constructor names that may require qualification, type representation items for equivalence types may.
XXX If e.g. the constructor t1/0 appears in an abstract instance definition, and the module defines a type constructor t1/0, this arguably should not require us to include all the imports in the .int3 file. The reason has two parts. Either the t1/0 is defined in one or more of the imported modules, or it isn't. If it is not defined in any of them, then causing the readers of this module's .int3 file to read those other modules' .int3 files will just compute the same result, only slower. On the other hand, if some of those modules do define t1/0, then this type constructor is multiply defined, so its appearance in the instance definition is ambiguous. This fact will be reported in an error message when this module is compiled to target code. Reporting it during some other compiler invocation that happens to read this module's .int3 file may be more of a distraction than a help.
XXX We should investigate replacing the copied import_module declaration list with a single item that says "these type, inst, etc names in this .int3 file may be defined in these other modules". This should have two benefits. One, it would allow us to stop the transitive grabbing of .int3 files as soon as we have read the .int3 files that define all the type, inst etc names that are still outstanding. Two, it should allow us to start using the transitively grabbed files only for their intended purpose, which should stop "leaks" of declarations/definitions from these transitively-included-only-for-module-qualification modules to the HLDS of the module being compiled. Right now, it is possible, though rare, to delete an import of module b from module a, and discover that this results in an error: a type being undefined, when that type is defined in module c. This happens only because the import of b dragged with it the definitions inside c as well, so that the import of c was required by the language definition but not by the compiler.
The contents of the .int0 file of a module are derived from the contents of both the interface sections and the implementation sections of that module, after they are fully module qualified. This requires reading in, via grab_unqual_imported_modules_make_int,
Items are never moved between sections: items in the interface section of the module are put into the interface section of the .int0 file, while items in the implementation section of the module are put into the implementation section of the .int0 file.
The contents of the .int file of a module are derived from the contents of both the interface sections and the implementation sections of that module, after they are fully module qualified. This requires reading in, via grab_unqual_imported_modules_make_int,
Items are never moved between sections: items in the interface section of the module are put into the interface section of the .int file, while items in the implementation section of the module are put into the implementation section of the .int file.
Type definitions in the implementation section are transformed in two stages: before and after qualification.
In the first, pre-qualification stage, we transform all such type definitions as follows:
XXX This is done by make_canon_make_du_and_solver_types_abstract, which is relevant for the .int2 section below.
In the second, post-qualification stage, we keep only a subset of the type definitions in the implementation section, and even the ones we keep, we transform further. XXX Actually, we transform them further only if they have only one definition in the implementation section. If they have two or more, we leave them alone. This is almost certainly a bug.
We keep a type definition in the implementation section if the type constructor being defined satisfies any of the following conditions:
Note that a type constructor may have more than one definition either within the rules of Mercury (e.g. an abstract and a non-abstract definition, or a Mercury definition and some foreign definitions), or violating the rules of Mercury, as in the case of buggy code.
The compiler also puts foreign_import_module items into .int files automatically, without explicit action by the programmer. These implicit foreign_import_module items always import the current module.
We put such an implicit self-import into the interface of the .int file for a given target language if the .int file interface also contains either a foreign type definition or a foreign enum pragma for that language. Likewise, we put such an implicit self-import into the implementation of the .int file for a given target language if the .int file implementation also contains either a foreign type definition or a foreign enum pragma for that language.
There is one exception from the above. If the interface section of the .int file contains a foreign_import_module for a given module in a given target language, then the compiler won't include a copy of the same foreign_import_module item in the implementation section, either explicitly or implicitly.
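The placement rules and the exception can be summarized in a small sketch. It models a foreign_import_module item as a (language, module) pair; the function name, the parameters, and the assumption that explicit items stay in the section where the programmer wrote them are illustrative, not taken from the compiler.

```python
def place_fims(module, int_langs, imp_langs, int_explicit=(), imp_explicit=()):
    """Compute the foreign_import_module items of each .int file section.

    int_langs / imp_langs: target languages that have a foreign type
    definition or a foreign enum pragma in the interface / implementation
    section of the .int file.
    int_explicit / imp_explicit: explicit (language, module) items
    (assumed to stay in the section where they were written).
    """
    interface = set(int_explicit) | {(lang, module) for lang in int_langs}
    implementation = set(imp_explicit) | {(lang, module) for lang in imp_langs}
    # The exception: anything already in the interface section is not
    # repeated in the implementation section.
    implementation -= interface
    return interface, implementation
```

With a foreign type for C in both sections of module a and one for Java only in the implementation, this yields a C self-import in the interface and only a Java self-import in the implementation.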
We put a use_module declaration into the implementation section of the .int file for all of the modules that have an import_module and/or use_module declaration in the module's implementation section, provided
The reason why we need use_module declarations for these modules is that the compiler creates entries for e.g. the types we are importing from them only when it sees them defined, not when it sees them used. XXX XXX We should consider fixing that.
The reason why we can turn what were originally import_module declarations into use_module declarations is that we generate the .int file at all only if we can successfully module qualify everything in it (actually, everything in the augmented compilation unit from which we derive the .int file's contents). XXX This is not quite true. A compiler invocation that cannot fully module qualify the augmented compilation unit when trying to generate an interface file will exit with a nonzero exit status, so technically it fails, but it will still generate the interface file. XXX XXX We should consider fixing that. In fact, if we cannot generate an interface file correctly, we should delete the file we would have overwritten (if it exists) as well as its timestamp file (again, if it exists).
The contents of the .int2 file of a module are derived from the contents of the module's .int file. This means that indirectly, it is derived from both the interface sections and the implementation sections of that module, after they are fully module qualified, which requires reading in various interface files of various other modules (see the section on .int files above for details).
We keep a type definition in the implementation section if the type constructor being defined satisfies any of the following conditions: XXX this seems wrong
XXX It should eventually be possible to make abstract any inst and mode definitions that do not refer to any other modules (other than the public or private builtin modules, though the private one does not define any insts or modes). This is because the only thing that code using such an inst or mode can do with it is pass it to a predicate or function defined in the same module. However, until recently the compiler did not handle abstract insts and modes at all, and even now, the only thing they can be used for reliably is module qualification; pretty much all operations on abstract insts and modes will cause a compiler abort.
We copy an explicit foreign_import_module item from the interface or the implementation section of the source file to the implementation section of the .int2 file if we include in the implementation section of the .int2 file a foreign type definition for the target language mentioned in that foreign_import_module item. Likewise, we put an implicit foreign_import_module item for the current module into the implementation section of the .int2 file for every target language mentioned in a foreign_type definition in the implementation section of the .int2 file.
As with .int files, there is one exception from the above. If the interface section of the .int2 file contains a foreign_import_module for a given module in a given target language, then the compiler won't include a copy of the same foreign_import_module item in the implementation section, either explicitly or implicitly.
As mentioned above in the .int file section, the reason why we need use_module declarations for these modules is that the compiler creates entries for e.g. the types we are importing from them only when it sees them defined, not when it sees them used.
And as was also mentioned above in the .int file section, the reason why we can turn what were originally import_module declarations into use_module declarations is that we generate the .int2 file at all only if we can successfully module qualify everything in it (actually, everything in the augmented compilation unit from which we derive the .int2 file's contents).
A module's .int, .int2 and .int3 files are supposed to maintain the property that
This is implicit in the fact that the algorithm we use to read in interface files (in grab_modules.m) never reads in a .int2 file if it has read in that module's .int file, and never reads in a .int3 file if it has read in that module's .int or .int2 file. In fact, making this possible is the reason why it reads in .int files first, .int2 files next, and .int3 files last.
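The reading discipline described above can be modelled with a short sketch. This is not the code of grab_modules.m: the function name and the request format are hypothetical, and the sketch ignores .int0 files and intermodule optimization.

```python
# Most informative first; this is also the order in which files are read.
PRIORITY = (".int", ".int2", ".int3")


def plan_reads(requests):
    """Pick at most one interface file per module and order the reads.

    requests is an iterable of (module, suffix) pairs. For each module
    we keep only the most informative suffix requested, then read all
    .int files first, .int2 files next, and .int3 files last.
    """
    best = {}
    for module, suffix in requests:
        rank = PRIORITY.index(suffix)
        if module not in best or rank < best[module]:
            best[module] = rank
    chosen = [(module, PRIORITY[rank]) for module, rank in best.items()]
    return sorted(chosen, key=lambda pair: (PRIORITY.index(pair[1]), pair[0]))
```

Reading the files in this order means that by the time a less informative file for a module is considered, the reader already knows whether a more informative one for the same module has been read.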
(Note that in the presence of intermodule optimization, grab_modules.m can read in a module's .int file (as an int-for-opt file) after it reads in its .int2 or .int3 file. This will typically lead to many entities defined in the .int2 or .int3 file being defined twice. The compiler must (and does) reject such double definitions silently.)
I (zs) don't see any inclusion requirements being placed on .int0 files. However, grab_modules.m does have to read in its ancestors' .int0 files first, because the set of .int files it needs to read includes not just the .int files of the modules imported by the current module, but also the .int files of the modules imported by its ancestors.
Note that the inclusion property does not apply to import_module and use_module declarations. While .int and .int2 files contain only use_module declarations, .int3 files contain only import_module declarations, which can be considered more expressive, and .int3 files may contain import_module declarations for modules for which the corresponding .int and/or .int2 files do not contain use_module declarations. This is because in the absence of errors, .int and .int2 files are always fully module qualified, which is something we cannot insist on for .int3 files. This difference is
Consumers of .int3 files need import_module declarations because they need to find which module defines an entity, such as a type constructor; consumers of .int and .int2 files need only use_module declarations because they need only to look up the definition of that entity.
That definition should be needed only in two cases. First, in the case of type definitions, we may need to follow chains of type equivalences to the end in order to make decisions about type representations involving that type. Second, in the case of inst and mode definitions, we may likewise need to follow chains of equivalences to the end in order to figure out the exact expansions of named insts and modes, which we may need for mode analysis.