Type inference

The largest single phase of the compiler is the type inference phase. As implemented in the compiler, type inference determines a unique interpretation for every identifier in the source text, and assigns a unique type for every node in the abstract syntax tree. If this is not possible, the reasons are reported.

The algorithm which is responsible for making these assignments is complicated by the following features of the type system:

functions, domains, and categories as first-class values;
parameterized domain and category constructors, param- eterization of functions over domains, categories, other functions, and data values;
dependent function types;
categorical inheritance;
categorical default implementations;
conditional declarations and definitions;
overloaded identifier names;
nested function definitions;
recursion in the types of domains and categories; and
domain and category extension definitions.

A ``domain'' corresponds closely to an abstract type or class in other languages. For an explanation of these features the reader is referred to the language reference [24].

Briefly, the type inference algorithm proceeds in two major passes. When presented with an abstract syntax tree, the first pass collects the possible interpretations for each leaf node in the tree, and propagates a set of possible types, from the bottom up, for each node in the tree. The second pass traverses the tree to propagate type constraints which are inherent in the structure of the program, and uses these constraint types to select a unique interpretation for each leaf node. As the traversal unwinds back toward the root of the tree, a unique type is then computed for each node in the tree.

Interpretations for identifiers which appear in a program may be found from declarations in the current lexical scope in which the identifier appears, or may be found by importing definitions from (possibly parameterized) domains which are defined elsewhere. Definitions available via an import statement are visible throughout the lexical scope in which the import statement appears. Therefore the interpretation of each type which is imported in the current scope must be known before the set of possible meanings for an identifier can be collected. As a result, the bottom-up pass of the type inference phase, upon entry into a given lexical level, recursively invokes the type inference algorithm on the set of types which are used in that level.

The definitions of domains and categories in may involve the inheritance of definitions from other domains and/or categories. As a result it is often the case that the interpretation of one domain D must be completed before the interpretation of another can be determined. These dependencies may be recursive, as in the following declaration:

Ladder(D: with (f: E->E), E: with (g: D->D)): with
        Ladderf: E -> E
        Ladderg: D -> D
== add
        import from D;
        import from E;
        ...

The parameters D and E of Ladder are domains, each of which exports an operation, whose type involves the other parameter. Within the definition of Ladder the definitions of f and g are available, and can be used to define Ladderf and Ladderg. The types of D and E are mutually-recursive. These must be determined by a process of simultaneous type inference which interleaves the usual bottom-up and top-down phases on each set of mutually-recursive types to determine unique interpretations.

The inference algorithm will be described in more detail in another paper.

Stephen Watt
Wed Sep 18 19:39:57 MET DST 1996