%include intro.fmt

\chapter{Introduction}
\label{chapt.introduction}

This thesis investigates the application of attribute grammars to the description of inference algorithms (as implemented in a compiler) that are specified by a collection of inference rules. Such collections are a means to specify the semantics of programming languages. Since there exists no universal inference algorithm for inference rules, the description of inference algorithms is a nontrivial exercise. Our goal is to use attribute grammars to make it easier to write such descriptions.

\section{Overview}
\label{intro.sect.motivation}

%% Introduce source code, compiler, and more stuff.
\paragraph{Language specifications.}
A formal programming language specification describes the language's notation and the notation's meaning, and forms a set of requirements for tools that process programs written in this language. A static semantics of a programming language is often expressed as a relation between properties of programs written in this language, where the relation is described by a set of inference rules. The structure of such a program is the Abstract Syntax Tree (AST) obtained by parsing the source code of the program, and usually plays an important role in the rules. For example, in the case of code generation, a static semantics expresses a relation between the AST and machine code, whereas in the case of type checking, the semantics typically relates the AST to a type for each program fragment and an environment with a type for each identifier.

%% Introduce compilers (implementations)
\paragraph{Implementations.}
A compiler analyzes a program and computes properties of that program.
%% Given a program and a denotational semantics, the question arises if we can
%% implement a compiler.
This analysis is specified by a \emph{static semantics}, which describes a relation between programs and properties. For example, the specification of a compiler that translates source code to machine code describes a relation between a program and machine instructions. The properties computed by a compiler can be seen as evidence that there exists a proof relating the properties to the program. Assuming that the relations are specified with inference rules, such a proof has the form of an attributed tree. Each node represents an application of an inference rule of the semantics, and attributes describe how the rules are instantiated. A semantics is \emph{syntax-directed} if the shapes of the proofs are isomorphic to the AST. A semantics is \emph{algorithmic} if additionally the constraints between attributes in the tree are expressible as computable functions. As a compiler's implementation is usually based on traversals of the AST, an algorithmic semantics is a convenient specification of a compiler. Since the implementation boils down to traversing the tree and applying, in the right order, the computable functions mentioned in the specification, there exists a clear correspondence between the specification and the implementation. An example is a semantics for an assembly language where the relation between the AST and machine code is actually a bijection. However, most languages' semantics are not entirely algorithmic. We come back to this point further below.

%% Introduce the attribute grammars
Attribute Grammars (AGs)~\citep{DBLP:journals/mst/Knuth68} are an attractive domain-specific programming language for the implementation of an algorithmic semantics. Attributes represent inductive properties of the AST.
A context-free grammar describes when an AST is correctly structured by relating productions to nodes in the AST. An AG describes when an AST is correctly decorated with attributes by imposing constraints on the attributes per node of the AST, formalized by rules per production. These rules are computable functions between attributes, denoted in some host language. AGs abstract from the traversal over the AST and from the order of evaluation of the rules. Therefore, AGs are composable, which makes it possible to describe the implementation of a large language as various separate aspects. Moreover, an AG can be compiled to an efficient algorithm that computes the attributes. Thus, an AG serves both as a specification and an implementation of an algorithmic semantics.

%% Now come back to abstraction in the specification
%% Give a reason why semantics are declarative
\paragraph{Declarative specifications.}
As argued above, to implement a static semantics, it is preferable that the semantics is algorithmic because of the close correspondence with an implementation. A declarative semantics is a semantics suitable for formal reasoning and documentation purposes, and is usually more abstract and concise than an algorithmic semantics. For example, consider a static semantics that describes a relation between the AST and a sequence of machine instructions. When the relation is not functional, there may be many related sequences of machine instructions for a given AST. In particular, a shortest sequence of instructions may differ from a sequence of instructions with the lowest expected execution time. The choice of which sequence is computed is left up to the implementation or specified separately. Moreover, declarative features of a programming language can usually be described more concisely as a relation than as a computable function. An example is operator overloading, where the choice of the implementation of an operator depends on the types of the operands. Effectively, the compiler takes over some of the work of the programmer, and does so by modeling the inference of the proof in some way.

\paragraph{From ASTs to proofs.}
An implementation using traversals of the AST is based on an algorithmic semantics, and a transformation from a non-algorithmic semantics to an algorithmic semantics is non-trivial. Consequently, it is difficult to keep the implementation consistent with the declarative language specification.
In this thesis, we approach the implementation of a static semantics from another direction. We consider AGs based on the grammar that underlies the inference rules of the relation instead of the language's grammar. As a particular advantage, the AG is closely related to the declarative specification. However, the AG specifies when an attributed proof is valid, but does not specify how to obtain the proof. Since there exists no general procedure that maps such an AG to an algorithm that infers the proof, we need to augment the AG with additional information to obtain such an algorithm.

%% With higher-order AGs we take one step in the right direction, but are still not there yet.
%% Explain HOAGs.

\paragraph{Thesis.}
In this thesis, we focus on the description of type inference algorithms with attribute grammars. Most type inference algorithms are a complex combination of a small set of inference techniques, such as type variables and unification to calculate with values that are not fully known yet, fixpoint iteration to approximate solutions, constraints to defer the inference of a subtree, and search tree construction to encode alternative solutions. Such techniques are based on gradually building a proof and exploring intermediate candidate solutions. In our approach, we conservatively extend AGs to support these techniques, balancing assumptions about the evaluation algorithm against the preservation of the declarative nature of the description. The central concept that underlies our extensions is that of higher-order, ordered attribute grammars. In such higher-order AGs, the domain of an attribute can be an attributed tree, which allows us to dynamically grow the tree.
The state of a tree is described by a configuration, which specifies the decorations that have been computed. In an ordered AG, the configurations are linearly ordered. A visit, a unit of evaluation for a node, transitions the node from its state to the state described by the next configuration. This concept offers control over the simultaneous evaluation of attributes and exploration of the tree.

\paragraph{Chapter organization.}
This chapter presents background information and a short outline. We assume that the reader has a strong background in the programming language Haskell~\citep{O'Sullivan:2008:RWH:1523280}, is familiar with type systems~\citep{509043}, and knows the basics of attribute grammars~\citep{DBLP:journals/mst/Knuth68}. This chapter gives (rather) informal definitions of relevant concepts and provides pointers to literature. The actual contents of the thesis start with the next chapter, Chapter~\rref{chapt.rulercore}, which gives a detailed summary of our extensions and shows how these fit together. Each following chapter covers an extension in detail. We address concepts and notation of type systems and attribute grammars in Section~\ref{intro.sect.typesystems} and Section~\ref{intro.sect.ags}, respectively. In particular, we recast the notion of ordered attribute grammars. Section~\ref{intro.sect.ruler} addresses previous work, Section~\ref{intro.sect.overview} gives an overview of each of the extensions, and Section~\ref{intro.sect.globalpicture} sketches the overall goal. Finally, Section~\ref{sect.intro.relatedwork} addresses related work.
\section{Background on Type Systems}
\label{intro.sect.typesystems}

%% A potential difficulty is here that these subjects are covering multiple disciplines
In later chapters, we use type systems based on variants of the lambda calculus as examples. In this section, we give a short summary of the lambda calculus, show the evaluation of lambda terms, and give a type system. Furthermore, the discussion of variants of the lambda calculus and their type systems serves as a vehicle to discuss design and implementation challenges of type systems, which we use in the motivation of this thesis in Section~\ref{intro.sect.overview}. Another purpose of this section is to introduce vocabulary and notation for subsequent chapters. We assume that the reader is already familiar with the lambda calculus and type systems. Introductory books on type systems~\citep{509043,harper} provide a more extensive and formal explanation.

\subsection{Specification of Programming Languages}

Programming languages are described by a grammar, which specifies the set of programs that belong to the language. A semantics relates a program to some properties in a given domain. When these properties represent the correctness of the program, or represent a program in (another) programming language, we talk about a static or denotational semantics. When the properties describe the runtime behavior of the program (typically in the form of state transitions), we talk about a dynamic or operational semantics. We assume that the specification of a programming language consists of a context-free grammar (Section~\ref{intro.ag.syntax}) and a static semantics. The specified programming language is called the \emph{object language}, and a program an \emph{object term}. In case of compilation and transformation, the object language is often referred to as the \emph{source language} and object terms as \emph{source terms}.
The language that describes the specification and the language in which the compiler is implemented are called \emph{meta languages}. Finally, in case of a translation to a different language, the language that we translate to is called the \emph{target language} and the translated program a \emph{target term}.

\subsection{The Lambda Calculus}
\label{intro.sect.lambdacalculus}

The lambda calculus is a language that is often used in programming language research, and in research on type systems in particular. Many concepts of programming languages have their roots in variants of the lambda calculus, or have been well-studied in such a context. We discuss the lambda calculus because we use its concepts in object languages, source languages and target languages in the following chapters.

Figure~\ref{fig.intro.lambda} shows the abstract syntax |e| of expressions in an explicitly-typed variant of the simply-typed lambda calculus, which may contain types |tau|. In passing, we also give syntax for environments and evaluation contexts. The structure of |e| is called the Abstract Syntax Tree when represented as a tree (Section~\ref{intro.ag.syntax}).

\begin{figure}[tb]
%format ty = "\tau{}"
%format gam = "\Gamma{}"
%format . = "."
%format emptyset = "\emptyset{}"
%format ty1
%format ty2
%format ty3
%format Ctx = "\mathcal{E}"
%format (InCtx (x)) = Ctx "\left\{" x "\right\}"
%format (InHole (x)) = "\left\{" x "\right\}"
\begin{code}
e    ::=  e1 e2          -- application, with expressions |e1| and |e2|
     |    x              -- variables, e.g. |x,y,z|
     |    v              -- values (head-normal form)
v    ::=  i              -- integer constant, e.g. |3| and |42|
     |    \ x : ty . e   -- abstraction, with param |x| of type |ty| and body |e|
ty   ::=  I              -- integer type
     |    ty1 -> ty2     -- function type
gam  ::=  emptyset       -- empty environment
     |    gam, x : ty    -- environment |gam|, with on top a mapping of |x| to type |ty|
x, y, z                  -- variables
f, g, h, a, e            -- expressions
\end{code}
\caption{Syntax of the explicitly-typed lambda calculus variant.}
\label{fig.intro.lambda}
\end{figure}

In the lambda calculus, a function may be passed as a value |v| in the form of a lambda expression, and is thus \emph{first class}. Such a function is anonymous, although it can be given an explicit name by binding it to an identifier. For example, the following expression denotes the application of a function that applies the identity function it receives as an argument to the value |3|:
\begin{code}
(\ y : I -> I ^^ . ^^ y 3) (\ x : I ^^ . ^^ x)   -- evaluates to |3| of type |I|
\end{code}

%format ~> = "\leadsto{}"
To specify to what value an expression evaluates, we provide an operational semantics below. In this semantics, the simultaneous substitution of all free |x| in |e1| by |e2| is denoted by |[x := e2] e1|. The semantics\footnote{
Such a semantics is called a small-step operational semantics because the inference rules describe a transformation step from an intermediate term to another intermediate term, and the actual transformation is the exhaustive application of these transformation steps.
} consists of the reduction relation |e1 ~> e2|, which is the smallest relation that satisfies the inference rules\footnote{
In Section~\ref{intro.sect.typerules} we consider a notation for inference rules and their interpretation in more detail.
} of Figure~\ref{fig.intro.reduction}, which we explain below.

\begin{figure}[tb]
\begin{mathpar}
\fbox{|e1 ~> e2|}\\
\inferrule*[right=beta]
  {}
  {|(\ x : ty ^^ . ^^ e1) e2 ~> [x := e2] e1|}

\inferrule*[right=left]
  {|e1 ~> e2|}
  {|e1 e ~> e2 e|}
\end{mathpar}
\caption{Operational semantics as a reduction relation on expressions |e|.}
\label{fig.intro.reduction}
\end{figure}

A \emph{beta-redex} is an expression of the form |(\x . e) a| that can be reduced. The above rules describe normal-order reduction: through the rule \TirName{left}, the beta-redex in the head position is identified. Indeed, if we consider the reduction of the above example, we first substitute |y| with the identity function, obtaining |(\ x:I ^^ . ^^ x) 3|, and then substitute |3| for |x|, ending up with the value |3|. When no reductions are possible anymore, the expression is in \emph{head normal form}.

\subsection{Type Rules}
\label{intro.sect.typerules}

The purpose of a static semantics is to exclude programs that incorrectly use their data.
%% When a program is well-formed, then the result of a terminating evaluation is in normal form.
%% When a program is well-typed, then the result of a terminating evaluation has the same type
A type system classifies an expression as well-typed if it can associate a type with it. A type system is sound if it has the subject reduction property, which means that after each reduction step, the resulting term has the same type as the original expression. Type systems typically ensure the absence of certain programming errors, such as passing an integer where a function is expected.

%format Gamma = "\Gamma{}"
As an example, we give a type system specification for the lambda calculus as defined above in Figure~\ref{fig.intro.expltyperules} and explain it below. We allow liberal syntactic sugar in our specifications. Depending on the context, the notation |many x| represents a sequence or set |x1, ..., (sub x n)|.

\begin{figure}[tb]
\begin{code}
r  ::=  forall (many x) ^^ . ^^ d      -- quantified rule over meta variables |many x|
d  ::=  j1 ^^ ... ^^ (sub j n) +:+ j   -- rule with premises |(many j)|, conclusion |j|, |n >= 0|
j  ::=  (sup R n) (many m)             -- judgment with |abs (many m) = n|
(sup R n)                              -- |n|-place relation |R| (|n| often omitted)
m                                      -- argument (meta expression)
x, y                                   -- meta variables for values in the object language
\end{code}
The notation |(inferrule (j1 ^^ ... ^^ (sub j n)) j)| (with optional rule label) is sugar for |j1 ^^ ... ^^ (sub j n) +:+ j|. The explicit equality |x1 ~~ x2| denotes a judgment |subsup Eq tau 2 x1 x2|, with relation |sub Eq tau| representing structural equality of object terms of meta-type |tau|. The quantification of meta variables is usually left implicit.
\caption{Notation for inference rules.}
\label{fig.intro.rules}
\end{figure}

We define a typing relation |gam :- e : ty|, which is notation for a three-place relation over |gam|, |e|, and |ty|. A judgment of this relation states that in environment |gam|, the expression |e| has type |ty|. We typically describe such relations with inference rules (type rules). Figure~\ref{fig.intro.rules} gives the syntax of inference rules. A type rule |r| consists of zero or more premises (judgments |(many j)|) above the line, and a conclusion |j| below the line. We come back to meta expressions |m| later; for now, note that they contain at least the meta variables |x|.
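To give an impression of how this notation maps to an implementation, rules and judgments could be represented by Haskell data types along the following lines. This is a schematic sketch only: all names are hypothetical and chosen for illustration.
\begin{code}
type MetaVar   =  String                        -- meta variables |x|, |y|
type RelName   =  String                        -- name of a relation |R|
data MetaExpr  =  MVar MetaVar                  -- meta expressions |m| contain at least meta
               |  MCon String [MetaExpr]        --   variables, and constructions of object terms
data Judgment  =  Judgment RelName [MetaExpr]   -- judgment |R (many m)|
data Rule      =  Rule [MetaVar] [Judgment] Judgment  -- |forall (many x) ^^ . ^^ (many j) +:+ j|
\end{code}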
\begin{figure}[tb]
%format :- = "\vdash{}"
\begin{mathpar}
\fbox{|gam :- e : ty|}\\
\inferrule*[right=con]
  {}
  {|gam :- i : I|}

\inferrule*[right=var]
  {|(x : ty) `elem` gam|}
  {|gam :- x : ty|}

\inferrule*[right=app]
  {|gam :- f : ty1 -> ty2|\\\\ |gam :- a : ty1|}
  {|gam :- f a : ty2|}

\inferrule*[right=lam]
  {|gam, x : ty1 :- e : ty2|}
  {|gam :- \ (x : ty1) . e : ty1 -> ty2|}
\end{mathpar}
\caption{Type rules of the explicitly typed lambda calculus variant.}
\label{fig.intro.expltyperules}
\end{figure}

Figure~\ref{fig.intro.expltyperules} shows the rules of the lambda calculus as introduced above. The rule \TirName{con} associates the type |I| with any integer constant. The \TirName{var} rule gives an identifier the type to which the identifier is bound in the environment. The \TirName{app} rule requires expression |f| to be a function, and requires the argument |a| to have the same type as |f|'s formal parameter. The type associated with the application itself is the result type of the function. In \TirName{lam}, the body of the lambda may assume that identifier |x| has type |ty1| in the environment. The typing relation is the smallest relation that satisfies a set of rules |r|, which serve as axioms of the relation.

In a judgment, meta expressions |m| are arguments to some relation |R|. The language of meta expressions is a formal language, which is usually left implicit, but facilitates the construction of symbolic and concrete object terms. A relation |R| can be a built-in relation (an \emph{atomic} relation) or a relation described by inference rules. Instead of using expressions |m| to construct symbolic terms, atomic relations for each production in the object language can equivalently be used to constrain symbolic object terms. For example, for the arrow-production in the syntax of types, we assume the existence of an atomic relation |sub(C)(->) tau1 tau2 tau3| that expresses |tau1 == tau2 -> tau3|. Then, instead of a judgment |R (tau2 -> tau3)|, we may write |R tau1| with a fresh |tau1| and |(sub(C)(->)) tau1 tau2 tau3|.

The rules are in canonical form when the arguments of relations in the judgments consist only of meta variables, with the single exception of the judgment denoted |m (sup (~>) (*)) x|, which represents a reduction relation on object terms where |m| is a meta expression. Additionally, each meta variable occurs at most once, not counting its occurrences in (additional) explicit equalities. In this thesis, we assume that meta expressions are written in Haskell. For simplicity, we assume that meta expressions in this chapter are written in the above lambda calculus, so that the definitions of |m| and |e| coincide. To rewrite rules into canonical form, we use the operational semantics of (meta) expressions. For example, we replace a judgment |R e| by |R x| and introduce the additional premise |e (sup (~>) (*)) x|, where |x| is fresh and |sup (~>) (*)| is the exhaustive application of the reduction relation on (meta) expressions. To deal with multiple occurrences of meta variables, we introduce additional explicit equalities and quantified fresh variables. Rules in canonical form are more verbose, but easier to formally reason with.

Given arguments |(many a)| for the typing relation |R| described by inference rules, we can prove that these are a member of the typing relation, denoted by |R (many a)| or |(many a) `elem` R|, by constructing a \emph{derivation tree} using the rules of the relation.
A derivation tree is a proof for |R (many a)| when there is a rule of |R| with a conclusion that matches |R (many a)| (with substitution $\theta$), and for each premise of the rule (with $\theta{}$ applied), there is a subtree that is a proof for that premise. Proofs of atomic relations are leaves of a derivation. A derivation tree for the earlier example is:
%format gam0
%format gam1
%format gam2
\begin{mathpar}
\inferrule*[right=app]
  { \inferrule*[left=lam]
      { \inferrule*[right=app]
          { \inferrule*[left=var]
              {|(y:I->I) `elem` gam1|}
              {|gam1 :- y : I->I|}
            \\
            \inferrule*[right=con]
              {}
              {|gam1 :- 3 : I|}
          }
          {|gam1 :- y 3 : I|}
      }
      { |gam0 :- \ y:I->I ^^ . ^^ y 3 : (I->I)->I| }
    \\
    \inferrule*[right=lam]
      { \inferrule*[right=var]
          { |(x:I) `elem` gam2| }
          { |gam2 :- x : I| }
      }
      { |gam0 :- \ x:I ^^ . ^^ x : I->I| }
  }
  {|gam0 :- (\ y:I->I ^^ . ^^ y 3) (\ x:I ^^ . ^^ x) : I|}
\end{mathpar}
\begin{code}
gam0 = emptyset      -- initial environment
gam1 = gam0, y:I->I  -- extension of |gam0| in the application of \TirName{lam} (left branch)
gam2 = gam0, x:I     -- extension of |gam0| in the application of \TirName{lam} (right branch)
\end{code}

Each node |v| in the derivation tree is associated with a judgment |sub j v = (sub R v) (many a)|, and the subtree rooted at |v| is a proof that |many a `elem` (sub R v)|, where |sub R v| is the associated relation. Furthermore, node |v| is associated with some rule |sub r v|, which determines the structure of |v|. The node is furthermore decorated with values for the meta variables that are bound by |sub r v|.

Type rules are \emph{syntax-directed} when, for each node |v| in a derivation tree, the associated rule |sub r v| is uniquely determined by an argument |a| (which represents the expression for which we try to construct the proof) of the associated judgment |sub j v|. Thus, for syntax-directed rules, the choice of what rule to apply in the construction of the proof is determined uniquely by productions of the object language, and there is a one-to-one mapping between nodes in the abstract syntax tree and nodes of the derivation tree. This is a desirable property because it means that the shape of the derivation tree depends only on the structure of the object term, and is thus given at the start of the proof construction: we know what rules to apply where, and are only left with the problem of how to make their instances match. However, when the correspondence between |sub r v| and the a priori known arguments |many i `subset` many a| of |sub j v| is not functional, we call the rules \emph{declarative}\footnote{
Some authors call declarative rules \emph{nondeterministic} because an inference algorithm may need to choose nondeterministically what rule to apply.
}. Additionally, when |sub r v| also depends on the inferred type |tau|, with |tau `elem` many a|, the rules are declarative and also \emph{type-directed}.

A rule |sub r v| imposes two forms of constraints on node |v| in the derivation tree: constraints on the structure (as discussed above), and constraints on the values for the node's meta variables that decorate |v| (as discussed below). When constructing a proof, we choose rules and values that satisfy these constraints. We call the description of such choices \emph{aspects} of the rules. These aspects are declarative when the decisions are not functionally determined by |many i|. Declarative aspects complicate the construction of a proof, because there may be many choices that seem suitable to complement a partial proof, but turn out later to be inappropriate.
In Section~\ref{intro.sect.inference}, we discuss strategies to resolve such declarative aspects. We determine which meta variables are declaratively defined by attempting to transform the relations into functions: those meta variables that make the function non-deterministic are declaratively defined. For that, each parameter of a relation must have a specification that declares it (conditionally) as either an input or an output. In a judgment |j = R (many x)| of a rule in canonical form, a meta variable |x `elem` (many x)| is at an input position when |j| is a premise and |x| is passed as an output argument, or when |j| is a conclusion and |x| is passed as an input argument. Otherwise, a meta variable is at an output position. In case of explicit equalities |x1 ~~ x2|, |x1| and |x2| are both at an input position. In case of the reduction relation |m ~> x|, the occurrences of meta variables of |m| are at an input position and |x| is at an output position. In case of the construction/destruction relation |sub C P| related to the production |P| of the object language, in the judgment |sub C P x0 x1 ... (sub x n)|, the variables |x1, ..., (sub x n)| are at an input position if and only if |x0| is at an input position. A variable is \emph{declarative} if none of its occurrences are at an output position and the variable does not occur in an explicit equality with a non-declarative variable.

As an example, we add a type rule for an implicitly typed lambda abstraction, and discuss which meta variables are declarative. As preparation we show the canonical forms \TirName{lam-impl'} and \TirName{app'} in Figure~\ref{fig.intro.canonical}. When we assume that types are an output of the relation, the meta variable |ty1| of \TirName{lam-impl} is declarative. When we assume that types are an input of the relation, the meta variable |ty1| of \TirName{app} is declarative.

\begin{figure}[tb]
\begin{mathpar}
\inferrule*[right=lam-impl]
  {|gam, x : ty1 :- e : ty2|}
  {|gam :- \x . e : ty1 -> ty2|}

\inferrule*[right=app]
  {|gam :- f : ty1 -> ty2|\\ |gam :- a : ty1|}
  {|gam :- f a : ty2|}

\inferrule*[right=lam-impl']
  {|gam, x : ty1 :- e : ty2| \\\\ |sub C (->) m2 ty1 ty2| \\ |sub C Lam m1 x e| \\ }
  {|gam :- m1 : m2|}

\inferrule*[right=app']
  {|gam :- f : m2|\\ |gam :- a : ty1| \\\\ |sub C (->) m2 ty1 ty2|\\ |sub C App m1 f a| \\ }
  {|gam :- m1 : ty2|}
\end{mathpar}
\caption{Rules \TirName{lam-impl} and \TirName{app}, and their respective canonical forms.}
\label{fig.intro.canonical}
\end{figure}

The above sketch of an analysis assumes that the atomic relations are computable functions. We call type rules \emph{declarative} if they exhibit declarative aspects, and \emph{algorithmic} otherwise. In the latter case, when the rules are described in Ruler (Section~\ref{intro.sect.ruler}), a type inference algorithm can be generated.

\subsection{Type Inference}
\label{intro.sect.inference}

A type checking or type inference algorithm concerns itself with constructing derivation trees when given the main judgment. In the case of type checking, the type is part of the a priori known arguments. In the case of type inference it is not, and is actually inferred. This distinction between type checking and type inference is rather vague, because a type system may exhibit declarative aspects other than the definition of types, and thus even type checking implies some form of inference. Moreover, for a compiler it is typically the inferred arguments that matter; the derivation tree itself is not of direct interest.
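The difference in a priori known arguments is reflected in the signatures one would expect of the two algorithms. The following is a hypothetical sketch, with |Env|, |Expr|, and |Ty| representing environments, expressions, and types:
\begin{code}
check  ::  Env -> Expr -> Ty -> Bool   -- type checking: the type is an a priori known argument
infer  ::  Env -> Expr -> Maybe Ty     -- type inference: the type, if any exists, is computed
\end{code}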
\paragraph{Inference as a forest of derivation trees.}
Some authors consider type inference as a two-step process: a traversal of the AST to generate a set of constraints, and solving this set of constraints. We shall, however, consider type inference as the incremental construction of a forest of derivation trees, which has a closer connection to declarative specifications. Its intermediate state is a forest of partial derivation trees. It starts with a singleton forest containing an empty derivation tree with the main judgment as pending aspect. A derivation tree in the forest has an associated status: partial, complete, or unsatisfiable. A reduction step in this forest consists of cloning an existing partial derivation, resolving one pending aspect of the clone, and adding the cloned derivation to the forest, such that the number of unique derivation trees in the forest increases. An aspect is either a meta variable or a judgment. To resolve the former, the meta variable is bound to a value in its domain. To resolve the latter, it depends on whether the relation of the judgment is described by rules or is an atomic relation. If the relation is described by rules, either a rule can be applied, which leads to a larger or a complete derivation tree, or the derivation tree is unsatisfiable. If the relation is atomic, then the aspect is solved by running an algorithm that is associated with the relation, which either succeeds or fails. The existence of a complete derivation means that the source program is type correct. When all reductions have been applied exhaustively and there exists no complete derivation, the program is type incorrect. However, there may be infinitely many partial derivations. Also, there may be infinitely many partial derivations of a certain height, for example, when the domain of a meta variable is infinite. We come back to this issue later: in practice, an inference algorithm uses a less general approach.

\begin{figure}[tb]
%format subst = "\theta{}"
%format subst0
%format subst1
%format subst2
%format subst3
%format subst4
%format :~: = "\diamond{}"
%format :> = "\triangleright{}"
\begin{code}
prove :: j -> Inf Tree                  -- |Inf| offers unification and backtracking
prove (x ~~ y) = do                     -- case for explicit equality
  unify x y                             -- equality mapped to unification
  return (x ~ y)                        -- builds equality node
prove (R (many x))                      -- case for judgments
  | atomic R   = extern R (many x)      -- proven externally
  | otherwise  = do                     -- relation specified by rules
      r <- rules R                      -- picks a rule (backtrack point)
      (ps +:+ c) <- instantiate r       -- instantiate rule
      zipWithM unify (args c) (many x)  -- bind args
      ts <- mapM prove ps               -- recursion on premises
      return (ts (sub(:>)(name r)) c)   -- build derivation node
\end{code}
\caption{Sketch of a general inference algorithm.}
\label{fig.intro.prove}
\end{figure}

\paragraph{Naive algorithm.}
The algorithm in Figure~\ref{fig.intro.prove} represents a typical Prolog-style strategy for the construction of derivation trees. We explain some aspects of the example below. In this example, we refer to an inference monad |Inf| which takes care of backtracking and unification. Rules are tried in a predefined order. If a rule cannot be applied, \emph{backtracking} occurs to the next rule. Meta variables are initially represented as symbolic values, and unified with other meta variables or object terms during inference. Unification is derived from an equivalence relation, which can in turn be derived generically from algebraic data type declarations.
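The operations assumed by Figure~\ref{fig.intro.prove} might have the following signatures. This is a sketch that mirrors the figure, rendering the rule body |(many j) +:+ j| as a pair of premises and conclusion; |Rule|, |Judgment|, and |MetaExpr| are the hypothetical representation types sketched earlier, and |Tree| is a type of derivation trees serving as evidence:
\begin{code}
rules        ::  RelName -> Inf Rule                 -- backtrack point: yields candidate rules one by one
instantiate  ::  Rule -> Inf ([Judgment], Judgment)  -- premises and conclusion with fresh meta variables
unify        ::  MetaExpr -> MetaExpr -> Inf ()      -- may fail, causing backtracking
extern       ::  RelName -> [MetaExpr] -> Inf Tree   -- proves an atomic judgment externally
\end{code}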
The operation |rules| introduces a backtrack point. It tries the rules in a given order by feeding them one by one to the continuation; a later unification may fail and cause backtracking to that point. The operation |instantiate| substitutes the quantified meta variables with fresh meta variables. The instruction |extern| proves a judgment externally and returns evidence for it if it succeeds. With the combinators |:>| (internal nodes) and |~| (equality leaves) we construct evidence in the form of a derivation tree.

\paragraph{Undesirable properties of the algorithm.}
The above algorithm is incomplete and does not terminate for all but the simplest type systems. This is necessarily the case because there exist type systems for which inference is undecidable. Via backtracking, only finite and inductively defined object terms can be inferred, and unifications only produce compositions of object terms. The algorithm is also inefficient: the order in which rules are tried may cause poor performance or nontermination. Moreover, this order is not specified, which gives unpredictable results. Although the results are sound, they may not be optimal. In practice, typical algorithms refrain from backtracking or constructing many candidate derivations, because the search space is too large when dealing with ASTs with thousands of nodes. Therefore, actual inference algorithms resemble this overall approach, but select rules and instantiations of meta variables in a more sophisticated way.

\paragraph{Actual inference algorithms.}
Actual inference algorithms apply various strategies (depending on the form of the declarative aspect) to get around the above undesirable properties. Unification can be used to deal with equivalence judgments in type rules. Alternatively, an algorithm may collect several candidate constraints on a meta variable, then pick an object term that is the least solution to all these constraints. When constraints are monotonic, a value can gradually be approximated via fixpoint iteration starting from an initial bottom value. Type and effect systems often require such algorithms. When rule selection is declarative, we typically want the choice to be a function of the syntax of the language (\emph{syntax-directed} rules) or of other object terms (e.g. \emph{type-directed} rules). Alternatively, some rules can be restricted to only be applied in a proof after other rules have been applied, so that the choice between the remaining rules becomes functional in the above sense. This is not always possible: the applicability of a rule may depend on unresolved meta variables and may require the rule selection to be deferred, which is called residuation~\citep{DBLP:journals/jlp/Hanus94}.

\paragraph{Challenge: orchestration of strategies.}
The orchestration of such strategies is a complex undertaking. The order in which strategies are applied may influence the result, and it may not always be clear when to start or stop applying strategies. These are all challenges an implementation must deal with.

\paragraph{Challenge: annotations.}
The holy grail of type system research is to define expressive type systems (for a class of programs) that have sound and complete (decidable) inference algorithms. Given two type systems (that satisfy type-soundness with respect to the operational semantics of the language), one type system is more expressive than the other if it accepts a superset of the programs that the other accepts.
To bypass the strict undecidability boundaries, many languages allow programmers to assist the inference process by providing additional information (e.g. type annotations) in the object program that translates to concrete bindings for otherwise declarative meta variables. There are delicate balances between the expressiveness of the type system, the amount of annotation to be provided by the programmer, and the predictability of inference. For example, the Damas-Hindley-Milner type system and accompanying inference algorithm (Section~\ref{sect.intro.dhm}) do not require any annotations to help the inference process, but disallow functions with parameters of a polymorphic type. System F, the polymorphic lambda calculus, in comparison allows parameters with a polymorphic type, but requires an abundance of type annotations. As a middle way, HML~\citep{DBLP:conf/popl/Leijen09} expresses all of System F, but requires type annotations only for polymorphic lambda parameters. FPH~\citep{DBLP:conf/icfp/VytiniotisWJ08} positions itself in between DHM and HML. It requires a type annotation when applying a function to an argument with a polymorphic type. \citet{UUCS2006051} proposed a global flow analysis that propagates type annotations to locations where (a part of) the annotation is also applicable.

\paragraph{Challenge: type errors.}
As stated before, type inference concerns itself with the construction of derivation trees. If no such derivation exists, it is considered a type error. Type rules only specify under what conditions a type is acceptable. Thus, an implementation may require additional information to produce understandable error messages~\citep{DBLP:conf/icfp/HeerenHS03}.

\paragraph{Challenge: multiple derivation trees.}
A partial order on types specifies when one type is more general than another type. A type is \emph{principal} when it is the most general type of an expression. For reasons of predictability and modularity it is desirable that a principal type exists for an expression and that an algorithm infers a derivation for this type. This notion of principality can be extended to derivations~\citep{Jim:1996:PTT:237721.237728}. For the above type rules, there are infinitely many derivation trees for (|\x . x|), but none of the accompanying types (e.g. |tau -> tau| for any type |tau|) are comparable; thus this expression has no principal type (and thus also no principal derivation) for the given type system. There are several remedies: the type system can be changed such that more type annotations need to be given, or such that choices can be deferred by encoding pending choices in the type language as constraints. Polymorphism and qualified types are an example of the latter: type schemes are introduced to represent a delayed choice of types, as we show in the next section.

\subsection{Parametric Polymorphism}
\label{sect.intro.polymorphism}

For the identity function |\x . x|, there are no constraints on the type of |x|. During inference, no binding arises for the meta variable associated with |x|. We can thus bind a fresh type constant |alpha| to the meta variable, which leads to the type |alpha -> alpha| for |\x . x|. To specify that |alpha| can be any type, we universally quantify over |alpha|, and obtain the type |forall alpha . alpha -> alpha|. This process is called abstraction or \emph{generalization}. A polymorphic type represents many types, which can be obtained by \emph{instantiating} the quantified type constants (\emph{type variables}).
Moreover, we obtain a proof for the instantiated type by instantiating the proof for the generalized type in an analogous way. Polymorphic types enable \emph{parametric polymorphism}, which allows functions to be called with parameters of different but acceptable types. Parametric polymorphism is important for the use of Haskell's many convenient higher-order functions, such as |id|, |flip| and |$$|.

\paragraph{Quantified types.}
The \emph{Damas-Hindley-Milner} (DHM) type system serves as the classical example of a type system with polymorphic types. It forms the basis of the type systems of ML and Haskell, and underlies the type systems of many other languages. DHM's type language is an extension of that of the simply typed lambda calculus. It additionally contains type variables |alpha| and poly types (\emph{type schemes}) |sigma|:
\begin{code}
tau    ::=  alpha                  -- type variable
       |    tau1 -> tau2           -- function type
sigma  ::=  forall alpha . sigma   -- poly type: may have quantifiers
       |    tau                    -- mono type: has no quantifiers
alpha, beta                        -- type variables
\end{code}
A mono type |tau| does not contain universal quantifiers. A poly type can be interpreted as the infinite set of mono types obtained by substituting each quantified type variable by some mono type. The identity function |\x . x| has the type |ty -> ty| for any type |ty|. It can thus be given the poly type |forall alpha . alpha -> alpha|. The type variable |alpha| in such a type is an object term that is not to be confused with a meta variable.

The distinction between mono and poly types is important in the declarative type rules of DHM. Lambda abstractions take parameters of a mono type\footnote{
It has been shown that the inference of a polymorphic type for a parameter of a recursive function is undecidable in general~\citep{Wells1999111}. Many type systems thus impose restrictions on the types of function parameters to make inference feasible.
}, and have a mono type as result. Polymorphic types are introduced in the environment by (non-recursive) let-expressions:
\begin{code}
e ::= ... | let x = e1 in e2   -- generalized type of |e1| is visible as |x| in |e2|
\end{code}
Although a let-expression can be interpreted as |(\x . e2) e1|, the typing derivation may differ depending on the presence and application of a generalization rule.

\begin{figure}[tb]
\begin{mathpar}
\fbox{|gam :- e : sigma|}\\
\inferrule*[right=var]
  {|(x : sigma) `elem` gam|}
  {|gam :- x : sigma|}

\inferrule*[right=app]
  {|gam :- f : tau1 -> tau2|\\\\ |gam :- a : tau1|}
  {|gam :- f a : tau2|}

\inferrule*[right=lam]
  {|gam, x : tau1 :- e : tau2|}
  {|gam :- \x . e : tau1 -> tau2|}

\inferrule*[right=let]
  {|gam :- e : sigma| \\\\ |gam, x : sigma :- b : ty | }
  {|gam :- let x = e in b : ty|}

\inferrule*[right=gen]
  {|gam :- e : sigma| \\\\ |alpha `notelem` ftv gam| }
  {|gam :- e : forall alpha . sigma|}

\inferrule*[right=inst]
  { |gam :- e : forall alpha . sigma| }
  {|gam :- e : sigma [alpha := tau]|}
\end{mathpar}
\caption{The DHM type system.}
\label{fig.intro.dhm}
\end{figure}

The DHM type system consists of the rules in Figure~\ref{fig.intro.dhm}, of which we mention some details below. Via one or more applications of the \TirName{inst} rule, poly types may be instantiated to mono types by replacing a bound type variable with a mono type. Conversely, via rule \TirName{gen} a poly type can be constructed by quantifying over a type variable, provided that this type variable does not occur free in the environment.
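Algorithmically, exhaustive application of \TirName{gen} and a single application of \TirName{inst} correspond to the following sketch. This is an illustration only: the representation |Scheme| and the helpers |ftv| (free type variables), |substTy|, and |TVar| are hypothetical, and we ignore the threading of a substitution that an actual implementation needs (cf. Section~\ref{sect.intro.dhm}):
\begin{code}
type TyVar   =  String             -- names of type variables |alpha|, |beta|
data Scheme  =  Scheme [TyVar] Ty  -- represents |forall (many alpha) ^^ . ^^ tau|

gen :: Env -> Ty -> Scheme         -- rule gen, applied exhaustively
gen env ty = Scheme (ftv ty \\ ftv env) ty   -- quantify over variables not free in the environment

inst :: Scheme -> [TyVar] -> Ty    -- rule inst, given a supply of fresh variables
inst (Scheme alphas ty) fresh = substTy (zip alphas (map TVar fresh)) ty
\end{code}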
\paragraph{Qualified types.}
In general, meta variables may be constrained by judgments that by themselves are insufficient to bind a concrete type to a meta variable. As an example, the following type rules encode overloading of the addition operator on both integers and floats.
\begin{mathpar}
\inferrule*[right=add.int]
  {}
  {|gam :- _ + _ ^^ : I -> I -> I|}

\inferrule*[right=add.float]
  {}
  {|gam :- _ + _ ^^ : F -> F -> F|}
\end{mathpar}
In the expression |\x . x + x|, the type rules for addition constrain |x| to be of a numeric type, but do not dictate whether this type is an integer or a float. At other locations in the derivation tree there may be constraints imposed on the type that make it clear which of the rules applies. During type inference, the type may be insufficiently constrained to resolve the judgment. A typical strategy is to defer the judgment until the end of the inference of the scope in which the judgment arose, which is usually at generalization points. If the type is still not constrained sufficiently, a strategy is to \emph{default} to one of the applicable rules. Another strategy is to encode the judgment as part of the type (if potentially satisfiable) and delay the judgment to all locations where the generalized type is instantiated. Such an encoded judgment is called a qualifier. A qualified type of the expression |\x . x + x| given the above rules is:
\begin{code}
(\x . x + x) :: forall alpha beta . exists gamma . (gamma :- _ + _ ^^ : alpha -> alpha -> beta) => alpha -> beta
\end{code}
With additional information this type can be refined. An equivalence between |beta| and |alpha| is deducible from the rules \TirName{add.int} and \TirName{add.float}. Suppose furthermore that the qualifier |gamma :- _ + _ ^^ : alpha -> alpha -> alpha| can be simplified to a qualifier |Num alpha|; then the type can be refined to:
\begin{code}
(\x . x + x) :: forall a . Num a => a -> a
\end{code}
Haskell's overloading with type classes actually brings such reasoning under the control of the programmer.

Such extensions to a type system do not come for free. When the code generation depends on the proof of a deferred judgment, the generated code needs to be parametrized by information that is derived from the deferred proof. Moreover, a function may be given a type with qualifiers that can never be satisfied, which we may only find out when we try to use an identifier with such a type. Some type systems define a coherence relation on qualifiers to formalize the potential satisfaction of constraints.

\subsection{Damas-Hindley-Milner Inference}
\label{sect.intro.dhm}

In this section we consider type inference for the DHM type system. The naive backtracking approach as presented earlier may easily lead to nontermination, because the \TirName{inst} and \TirName{gen} rules can be alternated indefinitely as each other's inverses. However, the syntactical restrictions on types permit a more appropriate inference strategy. With some effort, we can deduce that generalization only has an effect for the top-level expression, or for the expression |e| in a let-expression. It is also sufficient to perform instantiation only after taking the type of an identifier from the environment in the rule \TirName{var}.
According to the interpretation of poly types, a type quantified over a variable that does not occur free in the type actually describes the same set of mono types as the type without the quantification; thus we only need to consider type variables that occur free in the type as generalization candidates. Moreover, the order of the type variables over which we quantify is irrelevant.

%format inferw = "\mathcal{W}"
Inference algorithm W exploits these properties. It is a sound and complete implementation of the rules, produces most general types, and needs to examine each node of the AST only once. Many actual inference algorithms are based on algorithm W. We describe algorithm W as a monadic function |inferw :: Expr -> Inf Ty| in Figure~\ref{fig.intro.algorithmw}.

\begin{figure}[tb]
In the description of algorithm W we assume monadic versions of the above functions to form an inference monad |Inf| for the implementation of the DHM type system. Essentially, these monadic functions are wrappers around the above functions:
%format ifresh = "\Varid{fresh}_\Varid{I}"
%format iunify = "\Varid{unify}_\Varid{I}"
%format igen = "\Varid{gen}_\Varid{I}"
%format iinst = "\Varid{inst}_\Varid{I}"
\begin{code}
type Inf = RWST Env Errs Subst Identity  -- reader: environment, writer: errors, state: substitution

ifresh  ::  Inf Ty              -- returns a fresh type
iunify  ::  Ty -> Ty -> Inf ()  -- unifies two types
igen    ::  Ty -> Inf Scheme    -- generalizes a type to a scheme
iinst   ::  Scheme -> Inf Ty    -- instantiates a scheme to a type
\end{code}
The threading of the substitution and the collection of error messages are hidden in the monad, as well as the top-down distribution of the environment, so that we define algorithm W as a function |inferw :: Expr -> Inf Ty|:
\begin{multicols}{2}
\begin{code}
inferw ^^ <% x %> =
  asks (lookup x) >>= iinst

inferw ^^ <% f a %> = do
  r <- ifresh
  t <- inferw f
  s <- inferw a
  iunify t <% s -> r %>
  return r
\end{code}
\columnbreak
\begin{code}
inferw ^^ <% \x . e %> = do
  t <- ifresh
  r <- local (insert x t) (inferw e)
  return <% t -> r %>

inferw ^^ <% let x = e in b %> = do
  t <- inferw e >>= igen
  local (insert x t) (inferw b)
\end{code}
\end{multicols}
\vspace{-0.6cm}
The notation |<% e %>| represents the abstract syntax of an object term |e|. We use this notation to conveniently pattern match and construct object terms in the host language. Note that |<% x %>| is an AST of the type |Expr| and |x| is an AST of the type |Identifier|, but that |<% f a %>|, |f| and |a| represent ASTs of the type |Expr|.
\caption{Algorithm W.}
\label{fig.intro.algorithmw}
\end{figure}

Unification plays an important role in algorithm W. In this thesis, we assume the following functions have an efficient implementation~\citep{Dijkstra08efficientfunctional}:
\begin{code}
fresh        ::  Subst -> (Ty, Subst)                -- returns a fresh type var
unify        ::  Ty -> Ty -> Subst -> (Errs, Subst)  -- unify two types (improves subst)
generalize   ::  Env -> Ty -> Subst -> Scheme        -- generalize a type in a certain env
instantiate  ::  Scheme -> Subst -> (Ty, Subst)      -- instantiate a scheme freshly
\end{code}
The type |Subst| represents a substitution (a mapping from meta variables to concrete types) combined with a fresh-variable supply. The types |Ty| and |Scheme| coincide with |tau| and |sigma| respectively. An environment of type |Env| contains bindings from identifiers to types, and |Errs| is the type of a collection of error messages. Similarly, we assume that there are types |Expr| and |Identifier| that coincide with the nonterminals |e| and |x| respectively.
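As an illustration, the monadic wrappers can be defined in terms of the four functions above. The following is a minimal sketch under the assumption that |Errs| is a monoid (for example, a list of error messages), using the standard |RWST| operations |get|, |put|, |tell|, |ask| and |state|:
\begin{code}
ifresh        =  state fresh                  -- draw a fresh type variable from the supply
iunify t1 t2  =  do  s <- get                 -- take the current substitution
                     let (errs, s') = unify t1 t2 s
                     put s'                   -- thread the improved substitution
                     tell errs                -- collect the error messages
igen t        =  do  env  <- ask              -- generalization is relative to the environment
                     s    <- get
                     return (generalize env t s)
iinst scheme  =  state (instantiate scheme)   -- instantiation consumes fresh variables
\end{code}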
Figure~\ref{fig.intro.algorithmw} shows algorithm W, which is a recursive traversal of the |Expr| AST. The combinator |local| applies changes to the environment that are only visible in the monadic computation it encapsulates. The function |insert| adds a binding to the environment, possibly shadowing an already existing binding. The |asks| operator exposes the hidden environment, so that we can use |lookup| to recover the type to which an identifier is bound.

\begin{figure}[tb]
\begin{code}
inferw ^^ <% let x = e in b %> = do
  t <- ifresh                                 -- start with fresh type for binding
  local (insert x t) (inferw e >>= iunify t)  -- infer type and match against binding
  s <- igen t                                 -- generalize over unbound variables
  local (insert x s) (inferw b)               -- infer with generalized scheme in env
\end{code}
\caption{Algorithm W with a recursive let-binding.}
\label{fig.intro.algorithmw.rec}
\end{figure}

The above code assumes a non-recursive let binding. Figure~\ref{fig.intro.algorithmw.rec} shows the algorithm for a recursive let binding. The unification at the end of inference for |e| binds |t| to its actual type. All occurrences of |x| in its own right-hand side must agree with |t|. To prevent having to deal with polymorphic recursion, |x| has a mono type inside its own binding. The generalized type is only available in the body of the let-expression. The positioning of |local| and |igen| is tricky. Since generalization is performed with respect to free type variables in the environment, the generalization needs to be positioned outside the local environments of the |e| and |b| subexpressions, because the type to generalize occurs in these environments.

The monadic formulation of algorithm W is concise because cumbersome flows of environments and substitutions are encapsulated by the monad. However, the abstraction offered by the monad is not always adequate when the object language is more complex. When the object language has pattern-bindings, new bindings arise when visiting the pattern, which means that the flow of the environment is more complex than simply top-down. If multiple derivations are possible, then substitutions may need to be duplicated and merged. The sequencing of operations on the encapsulated state is therefore important.

\subsection{Polymorphic Lambda Calculus}

When we eliminate the distinction between mono and poly types, and thus allow poly types everywhere, we obtain an implicitly typed version of the \emph{System F} type system. It allows expressions such as |\f . g (f (\x . x)) (f 3)|, where a lambda parameter is applied to values of different types, which is not expressible in the DHM type system. Unfortunately, inference for this system is undecidable~\citep{Wells1999111}, and it does not have most general types.

For System F itself, type checking is decidable. However, the syntax is verbose, as type abstraction and type instantiation are explicitly encoded, and types need to be given explicitly for lambda parameters. Figure~\ref{fig.intro.systemf} shows the inference rules of System F, and we discuss some aspects of the rules below. A type application |f sigma| requires |f| to have a universally quantified type, and provides the type |sigma| with which to instantiate it. A type abstraction |/\ alpha . e| quantifies the type of |e| over |alpha|. The syntax of the expression thus dictates when to apply rule \TirName{app.ty} and rule \TirName{abs.ty}.
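To illustrate the explicit style of System F, consider the identity function. The sketch below uses a hypothetical annotation syntax |\ (x : sigma) . e| for typed lambda parameters, and assumes that the literal |3| has the integer type |I| from the addition rules above:
\begin{code}
id     =  /\ alpha . \ (x : alpha) . x  -- by abs.ty and abs.e: |forall alpha . alpha -> alpha|
idInt  =  id I                          -- by app.ty: instantiates |alpha|, giving |I -> I|
three  =  idInt 3                       -- by app.e: |I|
\end{code}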
\begin{figure}[tb]
\begin{mathpar}
\fbox{|gam :- e : sigma|}\\
\inferrule*[right=var]
  {|(x : sigma) `elem` gam|}
  {|gam :- x : sigma|}

\inferrule*[right=app.e]
  {|gam :- f : sigma1 -> sigma2|\\\\ |gam :- a : sigma1|}
  {|gam :- f a : sigma2|}

\inferrule*[right=app.ty]
  {|gam :- f : forall alpha . sigma2|}
  {|gam :- f sigma1 : sigma2 [alpha := sigma1]|}

\inferrule*[right=abs.e]
  {|gam, x : sigma1 :- e : sigma2|}
  {|gam :- \x . e : sigma1 -> sigma2|}

\inferrule*[right=abs.ty]
  {|gam :- e : sigma| \\ |alpha `notelem` ftv gam|}
  {|gam :- /\alpha . e : forall alpha . sigma|}
\end{mathpar}
\caption{System F.}
\label{fig.intro.systemf}
\end{figure}

Variants of System F are typically used for the typed backends of compilers, because of its expressiveness. For example, evidence translation of Haskell type classes may need System F types~\citep{DBLP:journals/jfp/Faxen02}. As a frontend, type systems have been proposed that are more restrictive than System F, but more liberal than the DHM type system. It is desirable that such systems do not require type annotations for programs that are acceptable to DHM, and allow for predictable inference. An inference algorithm then infers a type for such programs, and maps these to System F terms.
%% These inference algorithms are typically complex, because of various strategies
%% involved to determine the structure of derivations trees, and to determine which meta
%% variables can be mapped to type variables.
%% I don't know myself anymore what I wrote here.

\subsection{Discussion}

The DHM inference algorithm in Section~\rref{sect.intro.dhm} shows that the mapping between a declarative type system specification and its accompanying inference algorithm is not straightforward. The gap between a complex type system and a sophisticated inference algorithm is even larger. As mentioned above, one of the reasons is that an inference algorithm requires information that is not present in the type rules. Also, type derivations are typically only treated as a model. Inference algorithms compute types from abstract syntax trees, with the underlying assumption that the combination of type and AST can be turned into a derivation tree.
%% This allows certain judgments to be handled in a different
%% context than the type rules seem to suggest.
Generalization and instantiation, for example, are often dealt with as part of the unification algorithm in order to support on-demand \emph{impredicative instantiation}, which is the instantiation of a type variable with a poly type. In the type rules these are encoded as separate expression rules.

A declarative specification typically covers a minimalistic lambda calculus to explain particular language features, whereas programming languages are much richer in syntax and language features. Algebraic data types are typically not present in declarative specifications, and neither are mutually recursive let bindings, because these are often regarded as syntactic sugar. Actual programming languages, however, require the presence of such features.

We can improve the resemblance between the inference algorithm and the declarative specification when the structure of the inference algorithm can be derived from the declarative specification. In this thesis, we propose the use of attribute grammars as the basis for such algorithms. Section~\ref{intro.sect.ags} gives a short introduction.
\section{Background on Attribute Grammars}
\label{intro.sect.ags}

In Section~\ref{intro.sect.typesystems}, we followed the common practice of specifying the semantics of programming languages via relations between properties, where the relations are defined by inference rules and the properties include a source term. \emph{Attribute Grammars} (AGs)~\citep{DBLP:journals/mst/Knuth68} are an alternative approach. An AG specifies the semantics of a language as attributes on nonterminal symbols in the grammar of the language.

An AG is a context-free grammar extended with \emph{attributes} and \emph{rules}. We give a definition in Section~\ref{intro.ag.syntax}, in which we describe that attributes are associated with nonterminals, and rules with productions. Given a value for each attribute associated with the nonterminal symbols of a production, the rules specify whether these values are correct. On the other hand, the rules can also be used to compute such attributions for a given AST, which we address in Section~\ref{sect.intro.ag.eval}.

In the other subsections we describe common extensions, features and uses of attribute grammars, both to show why AGs are an attractive language for describing the implementation of compilers, and to give some background information to which we refer from later chapters of this thesis. A shallow scan through these subsections may be beneficial to the understanding of the other chapters of this thesis. Section~\ref{intro.ag.recent} gives an overview of various systems that provide AGs in various flavors.

\subsection{Syntax of Context-Free Grammars and Attribute Grammars}
\label{intro.ag.syntax}

%format \/ = "\cup{}"

We introduce a notation for attribute grammars. In this section, we use the term host language to refer to the language in which the algorithm or compiler that we generate from the attribute grammar description is written. As we see later, functions of the host language may appear in grammar descriptions. We are slightly more formal in this section in comparison to the other sections, because in later chapters we introduce various notations for attribute grammars and extensions. Since concepts such as nonterminals, productions, attributes and abstract syntax trees are common to those notations, we introduce them here --- although we actually expect that the reader is already familiar with these definitions.

\paragraph{Context-free grammars.}
Formally, grammars specify languages, but we also use grammars to describe the structure of tree-like data structures. \citet{Chomsky56} described several classes of grammars with increasing expressiveness and implementation complexity. The class of context-free grammars is particularly convenient for the description of the structure of terms, but is usually not expressive enough to describe the desired correctness properties of programs such as well-typedness. With attribute grammars, context-free grammars are combined with a different formalism to permit the description of such properties.

\begin{defi}[Context-free grammar]
A \emph{context-free grammar} is a tuple |(V, N, S, P)| where |V| is a set of terminal symbols (the \emph{alphabet}), |N| is a set of nonterminal symbols, |S| is the start symbol with |S `elem` N|, and |P| is a set of productions (defined below). The sets |V| and |N| must be disjoint.
\end{defi}

\begin{defi}[Production]
A (context-free) \emph{production} |p = n -> (many m)| is a rewrite rule with nonterminal symbol |n `elem` N|, and a sequence of symbols |(many m) `elem` V \/ N|. The sequence |many m| may be empty. The application of |p| to a sequence of symbols |many s `elem` V \/ N| constitutes the rewriting of one occurrence of |n| in |many s| to |many m|. In production |p|, |n| forms the left-hand side of the production and |many m| the right-hand side.
\end{defi}

To summarize, a grammar is a rewriting system where productions specify a rewrite step from a sequence of symbols to a sequence of symbols. In a context-free grammar, a production specifies how to rewrite a single nonterminal symbol to a sequence of symbols. The rewriting terminates when only terminal symbols are left: when successive applications of productions to some singleton sequence |n| (with |n `elem` N|) result in a sequence of symbols, then this sequence is derived from |n|.

\begin{defi}[Sentence]
A \emph{string} is a sequence of symbols. A \emph{sentential form} is a string derivable from the start symbol of the grammar. A \emph{sentence} is a sentential form consisting only of terminal symbols.
\end{defi}

\begin{defi}[Derivation tree]
A derivation tree is a tree |t| that represents how a sentence |many s| is derived from a symbol |m|, and is inductively defined as follows:
\begin{itemize}
\item A leaf |t| represents either the derivation of the empty string from a nonterminal symbol |n| if there exists a production |p = n -> epsilon|, or the trivial derivation of the singleton string |many s = v| from a terminal symbol |v|. (Only) in the former case, we say that the leaf is associated with the nonterminal |n| and the production |p|. In the latter case, the leaf is only associated with the terminal |v|.
\item If trees |t1,...,(sub t k)| represent the respective derivations of sentences |many s1,...,many (sub s k)| from symbols |m1,...,(sub m k)|, then the tree |t|, formed by taking |t1,...,(sub t k)| as the respective children of the root, represents the derivation of the sentence |(many s1) ... many (sub s k)| from symbol |n| if there exists a production |p = n -> m1 ... (sub m k)|. We say that the root of |t| is associated with the nonterminal |n| and the production |p|.
\end{itemize}
\end{defi}

\begin{defi}[Syntax tree]
A \emph{syntax tree} (or \emph{parse tree}) is a derivation tree that is associated with a nonterminal |n|. This definition purposefully excludes singleton trees denoting a terminal symbol.
\end{defi}

\begin{defi}[Abstract syntax tree]
An abstract syntax tree is the result of applying a projection to some syntax tree. Usually, the resulting tree has fewer branches (e.g. due to omission of layout) or branches replaced with a more general representation (e.g. desugared).
\end{defi}

In this thesis, we do not concern ourselves with parsing, which is the process of constructing a syntax tree that represents the derivation of a given sentence. Instead, we assume the syntax tree as a given. In fact, we adopt the common convention to abstract over details in the syntax tree (e.g. whitespace) and work with abstract syntax trees instead.

\begin{defi}[Language]
The \emph{language} |sub L G| specified by a grammar |G| is the set of sentences that can be derived from the start symbol of |G|.
\end{defi}

If |sub L G| is a programming language, then the source code of a program written in |sub L G| is a sentence in |sub L G|.
Moreover, if |sub L G| is a language of algebraic data types, then nonterminals describe type constructors, terminals describe primitive types, and productions describe data constructors. An AST represents a data structure, and the bit sequence in memory can be regarded as the sentence.

\paragraph{Context-free grammar notation.}
In Figure~\ref{fig.intro.cfg} we introduce a language for the \emph{description} of context-free grammars: i.e. terms in this language can be interpreted as a grammar as defined above. We later extend the language to describe attribute grammars and some AG extensions. Figure~\ref{fig.intro.stringcfg} shows an example. We explain some aspects of the notation below.

\begin{defi}[Meta grammar]
When we talk about a grammar for a grammar, we call the former a \emph{meta grammar}.
\end{defi}

In the notation, a grammar is the composition of grammars for individual nonterminals. The sets of terminals |V| and nonterminals |N| are left implicit, and productions |P| are grouped per nonterminal. We reuse these letters for other purposes, such as |N| as an identifier for a nonterminal, and |P| as an identifier for a production.

\begin{figure}[tb]
%format grammar = "\mathbf{grammar}"
%format prod = "\mathbf{prod}"
%format term = "\mathbf{term}"
%format nonterm = "\mathbf{nonterm}"
\begin{code}
g  ::=  grammar N (many p)  -- the grammar for nonterminal |N|
p  ::=  prod P (many s)     -- production |P|, with as RHS the sequence of symbols |(many s)|
s  ::=  term x :: tau       -- a terminal symbol with name |x| and type |tau|
   |    nonterm x : N       -- a nonterminal symbol with name |x| and nonterminal |N|
N, P, x                     -- names of nonterminals, productions, and symbols
\end{code}
\caption{Language to describe context-free grammars.}
\label{fig.intro.cfg}
\end{figure}

The grammar is abstract: instead of terminal symbols, only the \emph{type} of a terminal symbol is given. To stress the difference between terminals and nonterminals, we use a double colon to specify the type of a terminal and a single colon to specify the name of a nonterminal.

\begin{defi}[Children]
Each symbol in the right-hand side of a production has an explicit name, which will be useful later. We call such named symbols the \emph{children} of the production, which stresses the correspondence to the children of AST nodes with which the production is associated.
\end{defi}

In the notation, the nonterminal symbol of the left-hand side of a production is implicit, since we only describe the right-hand sides of productions. We give the symbol on the left-hand side of a production the fixed name |lhs|.

\begin{figure}[tb]
\begin{code}
grammar String   -- a cons-list of characters
  prod Nil       -- empty list
  prod Cons
    term hd :: Char      -- head of the list (|hd| is a terminal)
    nonterm tl : String  -- remainder (|tl| is a nonterminal)
\end{code}
\caption{Example of a context-free grammar.}
\label{fig.intro.stringcfg}
\end{figure}

\paragraph{Attribute grammars.}
We now extend the above formalism to denote context-free grammars with notation for attributes and their associated functions.

\begin{defi}[Attribute grammar]
An attribute grammar is a tuple |(V, N, S, A, I, O, P, F)|, where the set of terminals |V|, the set of nonterminals |N|, and the start symbol |S| are defined as for a context-free grammar. The set |A| consists of attribute names. The map |I| associates a set of names |sub I n `subset` A| with each nonterminal |n| in |N|, which make up the \emph{inherited} attributes of |n|.
Similarly, the map |O| associates a set |sub O n `subset` A| with each nonterminal |n| in |N|, which makes up the \emph{synthesized} attributes of |n|. For each |n|, the sets |sub I n| and |sub O n| must be disjoint. The productions |p `elem` P| are redefined below. The set |F| consists of computable functions |f|, which we call \emph{semantic functions}.
%% The mappings |D| associates a set |subsup tau x n| with each |x| in |sub I n| or |sub O n| for
%% each |n `elem` N|, which denote the values an attribute can represent.
\end{defi}

\begin{defi}[Production]
An (attribute-grammar) production |p = u -> (many w) `cdot` (many r) `cdot` X| consists of an annotated nonterminal symbol |u| and annotated symbols |many w|, rules |many r|, and a set of symbol names |X|.
\end{defi}

\begin{defi}[Annotated symbol]
An \emph{annotated symbol} is either an annotated terminal or nonterminal symbol. An \emph{annotated nonterminal symbol} |u = x:n.(many a)| is a combination of a distinct symbol name |x `elem` X|, a nonterminal symbol |n `elem` N|, and a collection of attribute names |many a `elem` A| so that each |a| is either in |sub I n| or |sub O n|. We say that |n| is associated to |x|. An \emph{annotated terminal symbol} |x:v| is a combination of a distinct symbol name |x `elem` X| and a symbol |v `elem` V|.
\end{defi}

\begin{defi}[Attribute occurrence]
A \emph{reference to an attribute} |x.a| is a combination of a symbol name |x `elem` X| (associated to some nonterminal |n|) with an attribute name |a `elem` A| so that either |a `elem` sub I n| or |a `elem` sub O n|. A \emph{reference to a terminal} |x| is a symbol name |x `elem` X| which is associated to some terminal |v|. An \emph{attribute occurrence} |o| is either a reference to an attribute or a reference to a terminal.
\end{defi}

We also call an occurrence |x.a| an attribute |a| of |x|. Attribute occurrences can be found in rules, which are defined below.

\begin{defi}[Rule]
A rule |(many o1) = f (many o2)| of some production |u -> (many w) `cdot` (many r) `cdot` X| consists of a semantic function |f `elem` F| and attribute occurrences |many o1| and |many o2|. The occurrences |many o1| represent the attributes defined by the rule, which are synthesized attributes of |u| or inherited attributes of the children |many w|. The occurrences |many o2| represent the attributes used by the rule, which are the inherited attributes of |u| or synthesized attributes of the children |many w|. In addition, occurrences in |many o2| may also refer to terminals.
\end{defi}

Due to the restrictions on occurrences it is always clear whether an occurrence references an inherited or a synthesized attribute. In our notation for attribute grammars (further below) we allow the same name to be used for an inherited and a synthesized attribute.

\paragraph{Decorated trees.}
To give a semantics to rules, we consider syntax trees annotated with attributes, and define how attributes of the tree are related to attribute occurrences, which are mentioned in the rules of productions.

\begin{defi}[Attributes of syntax trees]
An \emph{annotated syntax tree} or \emph{semantic tree} is a syntax tree |T| where in addition subtrees are associated with the smallest set |Q| defined as follows. Let |t| be a subtree and |n| be the nonterminal that is associated to |t|. For each attribute |a| in |sub I n \/ sub O n|, let there be a distinct attribute symbol |sub q t `elem` Q|. The set |Q| represents the attributes of |T|.
\end{defi}

The symbols |q| can be seen as instances of the attributes, or as occurrences of the attributes in the tree. This definition states that many trees may be associated with the same nonterminal yet have different instances of the attributes. Moreover, if |t| is an annotated syntax tree, then the attributes of each annotated direct subtree of |t| are a distinct subset of the attributes of |t|.

\begin{defi}[Attribute association]
Given some annotated syntax tree |t| with associated production |p|, an \emph{attribute association} |alpha| is a mapping so that for each attribute occurrence |o| of |p|, either |alpha o = q| for some |q `elem` Q| when |o| is a reference to an attribute (either of |t| or of a direct subtree of |t|), or |alpha o = v| when |o| is a reference to a terminal child |v| of |t|.
\end{defi}

For each (node of an) annotated syntax tree, there exists such an attribute association. We leave open how this straightforward connection between attribute occurrences and attributes of the syntax tree is constructed.

\begin{defi}[Valuation]
A \emph{valuation} |M| is a mapping that associates with each |q `elem` Q| and each |v `elem` V| a value in the host language, which is denoted as |M q| or |M v|. Furthermore, |M v = semb0 v| for |v `elem` V| where |semb0 v| is some encoding of |v| as a value in the host language. These values are called \emph{decorations}.
\end{defi}

\begin{defi}[Attributed syntax tree]
An \emph{attributed} (or \emph{decorated}) syntax tree is an annotated syntax tree combined with a valuation |M|.
\end{defi}

A rule |(many o1) = f (many o2)| encodes the condition |many (M (alpha o1)) = f (many (M (alpha o2)))| for each node the rule is associated with. Alternatively, we may say that |f| functionally defines occurrences |o1| in terms of occurrences |o2|, and thus that inherited attributes of children are defined by the parent, whereas synthesized attributes of the children are defined by the children and may be used by the parent.

\begin{defi}[Correctly attributed syntax tree]
A syntax tree is \emph{correctly attributed} when the conditions imposed by the rules are satisfied.
\end{defi}

\paragraph{Notation for attribute grammars.}
The above definitions introduce concepts that underlie attribute grammar languages and implementations. Figure~\ref{fig.ag.syntax} gives a minimalistic language for the description of AGs, which is an extension of the language for context-free grammars in Figure~\ref{fig.intro.cfg}. We explain some of its aspects below.
\begin{figure}[tb]
\begin{code}
I  ::=  attr N (many i) (many s)      -- attribute declaration for nonterminal |N|
i  ::=  inh y :: tau                  -- declares inherited attribute |y| of type |tau|
s  ::=  syn y :: tau                  -- declares synthesized attribute |y| of type |tau|
S  ::=  sem N (many c)                -- semantics definition for |N|
c  ::=  prod P (many r)               -- semantics in the form of rules for prod |P|
r  ::=  g [many o1] = f [many o2]     -- a rule with pattern |g| and function |f|
o  ::=  t.x.y                         -- attribute occurrence of child |x| and attribute |y|, and kind |t|
t  ::=  ainh | asyn | aloc            -- explicit attribute kind (usually left implicit)
x, y ^^                               -- names of symbols and attributes
f, g ^^                               -- expressions
\end{code}
\caption{Minimalistic language for AGs.}
\label{fig.ag.syntax}
\end{figure}

The notation\footnote{
Note that we reused some letters here which we used before in a different context, such as the map |I| and the start symbol |S|.} in Figure~\ref{fig.ag.syntax} consists of a collection of nonterminal declarations |g|, attribute declarations |I|, and semantics blocks |S|. Attributes are declared separately for each nonterminal and have a type associated with them. The right-hand side of a rule is an \emph{expression} |f [many o]| in some formal language |H| with embedded references to attributes at identifier positions of |H| via attribute occurrences |many o|. The left-hand side of a rule is also an expression, but limited to patterns such as tuples. The use of expressions is slightly more flexible than the use of just a function symbol.

\begin{figure}[tb]
\begin{code}
attr String                                -- declares attributes of |String|
  inh down :: Char                         -- an inherited attribute |down|
  syn sh :: String                         -- a synthesized attribute |sh|

sem String                                 -- rules for each production
  prod Nil   lhs.sh = Nil                  -- rule 1: def. of syn attr of LHS
  prod Cons  tl.down = loc.hd              -- rule 2: def. of inh attr of |tl| using terminal |hd|
             lhs.sh = Cons lhs.down tl.sh  -- rule 3: def. of syn attr of LHS
\end{code}
\caption{Example of an AG that shifts a character in a string.}
\label{fig.ag.example}
\end{figure}

For attribute occurrences |t.x.y|, we take the following notational conventions. The attribute kind |t| distinguishes inherited and synthesized attributes. This kind is always clear from the context, so we usually leave it unspecified in examples, unless we want to stress the difference. To refer to an inherited attribute |y| of a symbol named |x|, we use |ainh.x.y| or simply |x.y|. To refer to a synthesized attribute |y| of a symbol named |x|, we use |asyn.x.y| or simply |x.y|. To refer to a terminal named |x| we use the attribute occurrence |aloc.x.self| or simply |x.self| (see also Section~\ref{intro.sect.incremental}).

Informally, an AG in this notation is well-formed when the description can be translated to an AG as defined above, and the types of the various identifiers and expressions are correct. Figure~\ref{fig.ag.example} shows an example in which we define a transformation on strings that shifts each character one position to the right. We use an inherited attribute |down| to represent the preceding character. Given a value |'d'| as initial |down| value, the result for |"Ag"| is |"dA"|. We show a more complex example in Section~\ref{intro.sect.ag.dhm}.

\paragraph{Local attributes.}
We also use the notation |aloc.aloc.x| to refer to a \emph{local attribute} with the name |x|, which is an attribute defined by a rule of a production and is only in scope of that production. The name must be distinct from the name of a terminal symbol. Local attributes are typically used to represent common subexpressions.
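For example, in the following hypothetical variation on the rules for |Cons| of Figure~\ref{fig.ag.example}, the local attribute |res| names the shifted result once, so that subsequent rules of the production can refer to it via |loc.res|:
\begin{code}
sem String
  prod Cons  loc.res  = Cons lhs.down tl.sh  -- common subexpression bound to local attribute |res|
             lhs.sh   = loc.res              -- referenced via |loc.res|
\end{code}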
\paragraph{Syntactic sugar.}
A terminal |v| can be encoded as a fresh nonterminal (Section~\ref{intro.sect.higherorder}) with a single synthesized attribute and a single |epsilon|-production that defines the attribute with the value |semb0 v|. A local attribute can be encoded as a fresh nonterminal with a single inherited attribute and a single synthesized attribute and a single |epsilon|-production that contains a rule that copies the value of the inherited attribute to the synthesized attribute. As a convenient simplification, we therefore assume in the remainder of this chapter that terminal leaves and local attributes are not present in AGs and ASTs, although we use terminals and local attributes in examples. In later chapters we take local attributes and terminals into account explicitly.

\paragraph{Interfacing with the AG.}
The evaluation of a tree described by the AG (which we discuss in Section~\ref{sect.intro.ag.eval}) takes as input a record of values for the inherited attributes and results in a record with values of synthesized attributes.
%if False
\begin{code}
ast   = sem_Cons 'A' $ sem_Cons 'g' $ sem_Nil  -- construct tree |ast|
inhs  = Inh_String { down_Inh_String = 'd' }   -- define inh attrs of the root
syns  = wrap_String ast inhs                   -- evaluate tree, obtain syn attrs
repl  = sh_Syn_String syns                     -- extract syn attr |sh|
\end{code}
The Haskell functions shown in the above example, which we call \emph{wrapper functions}, are generated from the AG description. The functions |sem_Cons| and |sem_Nil| are constructor functions for the respective productions. Furthermore, |Inh_String| and |Syn_String| are record types with a field for each attribute, such as |down_Inh_String| and |sh_Syn_String|, using a notational convention that guarantees that each field is uniquely labelled. The function |wrap_String| takes the tree and the record for the inherited attributes, and produces a record with synthesized attributes. The use of records is a convenience to interface with the values of attributes independent of their order of appearance.
%endif

%if False
\subsection{Semantics}

An AG description states that a decorated tree is correct if the tree is correctly structured according to the context-free grammar and the tree is correctly decorated. For the latter part, note that each node has an associated production, thus there is an association between a node and the production's rules, and an association between a node's decorations and attribute occurrences in a rule. The tree is correctly decorated if for any rule |(many o1) = f (many o2)| that is associated with a node in the decorated tree, |f (many o2)| equals |many o1|, where |many o2| and |many o1| are the associated decorations.

An AG description does not specify how the attributes are computed. This is a good property, because it provides the opportunity to define various evaluation algorithms with different properties, and to tune such algorithms, as we will be doing later in this thesis. The following operational semantics provides a simple yet inefficient approach to compute the attributes. Given an AST as initial tree, an evaluator for attribute grammars nondeterministically selects a node and a rule, and applies the rule if the attributes of the right-hand side have been computed. Application of the rule decorates the node with attributes of the left-hand side of the rule. The decorated tree is in normal form when each node is fully decorated.
An AG is well-defined if evaluation can reach a normal form for any tree. Section~\ref{sect.intro.ag.eval} shows two common AG evaluation algorithms that follow the above approach in an efficient way. Other evaluation algorithms are based on either one of these.
%endif

\subsection{Dependency Graphs}
\label{sect.intro.ag.depgraphs}

To describe how the values of attributes are actually computed, we consider the data dependencies induced by rules. The trees that we consider in this section are derivation trees generated by some AG |(V, N, S, A, I, O, P, F)|.

\paragraph{Graphs of a production.}
A rule |(many o1) = f (many o2)| represents a data dependency of occurrences |many o1| on occurrences |many o2|, or equivalently, a flow of data from |many o2| via |f| into |many o1|. These dependencies form a graph.

\begin{figure}[tb]
\begin{code}
d  ::=  o        -- attribute occurrence vertex
   |    rule x   -- rule vertex with some distinct identifier |x|
   |    child x  -- child vertex with |x| the name of the child
x                -- identifiers
\end{code}
\caption{Syntax of vertices in a PDG.}
\label{fig.intro.ag.designators}
\end{figure}

\begin{figure}[tb]
\begin{center}
\begin{tikzpicture}
  [ nd/.style={ellipse, draw, minimum size=4mm, node distance=12mm,font=\footnotesize}
  , dep/.style={->,thick}
  , dep2/.style={->,dashed, very thick}
  , lab/.style={font=\footnotesize}
  ]
  \node[nd](e){|aloc.aloc.hd|};
  \node[nd, right of=e, xshift=8mm](f){|rule r2|};
  \node[nd, below of=f](a){|ainh.lhs.down|};
  \node[nd, right of=f, xshift=8mm](b){|ainh.tl.down|};
  \node[nd, right of=b, xshift=20mm](c){|asyn.tl.sh|};
  \node[nd, right of=c, xshift=6mm](g){|rule r3|};
  \node[nd, right of=g, xshift=8mm](d){|asyn.lhs.sh|};
  \node[nd, above of=f](h){|child tl|};
  \draw[dep](f) to (e);
  \draw[dep](b) to (f);
  %% \draw[dep2](c) to (b);
  \draw[dep](g) to (c);
  \draw[dep](g) to [in=0,out=-145] (a);
  \draw[dep](c) to [in=0,out=145] (h);
  \draw[dep](d) to (g);
\end{tikzpicture}
\end{center}
\caption{Exemplary PDG of production |Cons|.}
\label{fig.intro.ag.pdg}
\end{figure}

\begin{defi}[Production dependency graph]
A \emph{Production Dependency Graph} (PDG) is a directed graph |(V,E)| associated with some production |p|. There is a one-to-one mapping between the vertices |d `elem` V| (Figure~\ref{fig.intro.ag.designators}) and the nonterminal children of |p|, the rules of |p|, and the attribute occurrences in rules of |p|. The edges |E| consist of:
\begin{itemize}
\item For each vertex |asyn.x.y| an edge to a vertex |child x|.
\item For each rule |[many o1] = f [many o2]| (represented as vertex |rule r|) an edge from |rule r| to vertex |o| for each |o `elem` many o2|, and an edge from vertex |o| to |rule r| for each |o `elem` many o1|.
\end{itemize}
Similarly, a \emph{production data-flow graph} is a production dependency graph with the edges reversed.
\end{defi}

Figure~\ref{fig.intro.ag.pdg} shows the PDG of the production |Cons| of Figure~\ref{fig.ag.example}. For simplicity, we modelled the terminal |hd| as a local attribute |aloc.aloc.hd|.

\paragraph{Graph of a tree.}
These graphs can be projected on each node of a tree and then combined to form a dependency graph for a tree, or a data-flow graph for a tree. The general idea is that we take the PDG of the root of the tree and then add the graph for each child of the root, which describes the dependencies of synthesized attributes of the child on inherited attributes of the child.
\begin{defi}[Tree dependency graph]
For some annotated syntax tree |t| with associated production |p|, attribute association |alpha|, and annotated subtrees |t1,...,sub t k| with corresponding nonterminal children |c1,...,sub c k| of |p|, the \emph{tree dependency graph} is inductively defined as the union of the PDGs of |t1,...,sub t k| and the instantiation of the PDG of |p| by transforming (with preservation of edges) rule and child vertices to fresh vertices and each occurrence vertex |o| to vertex |alpha o|. Similarly, a \emph{tree data-flow graph} is a tree dependency graph with the edges reversed.
\end{defi}

Given a tree, evaluation algorithms of AGs (Section~\ref{sect.intro.ag.eval}) are traversals over the tree dependency graph expressible as tree traversals\footnote{
The shape of a tree dependency graph ensures that a description with a tree traversal is possible. The dependencies are properly nested: on a path from a synthesized attribute of a child to some (indirect) dependency, an inherited attribute of the child occurs before any attribute of a sibling or parent.
} over the tree that are described by a nondeterministic tree-walking automaton (Section~\ref{sect.treewalkautomaton}). Traditionally, such traversals may be demand-driven (traversing the dependency graph based on which attributes are needed) or may be described by a deterministic tree-walking automaton. We come back to evaluation algorithms in Section~\ref{sect.intro.ag.eval}. We first consider static approximations of the dependency graphs, which can be used to prove that none of the attributes has a cyclic definition, and which are an essential ingredient for a description of the evaluation with a \emph{deterministic} tree-walking automaton.

\paragraph{Approximations.}
Below, we consider abstract interpretations of AGs that construct dependency graphs that are a static approximation of the tree dependency graphs of collections of trees in certain contexts:

\begin{defi}[Context]
A \emph{context} is a symbol |C| of some fixed set of symbols |Omega| given per application and grammar. A context represents an additional set of invariants imposed on a tree.
\end{defi}

Concretely, the invariants represented by a context may include that:
\begin{itemize}
\item The tree is associated with a certain nonterminal or production;
\item The tree occurs as a subtree at a certain position of a parent;
\item The tree has attributes that are used according to some protocol~\citep{Farrow:1984:SAG:502949.502881}.
\end{itemize}

A collection of trees in a context shares a common structure. By distinguishing contexts, we may consider projections of the tree dependency graphs of a collection of trees on the common attributes (of the root) that each tree in the collection has, so that the edges are a superset of the projected edges of each individual tree. The projection operation distributes over graph union, which ensures that we can work with approximations of projections of graphs of subtrees.

\begin{defi}[I/O graph]
An \emph{I/O graph} of some nonterminal |n `elem` N| is a directed graph where the vertices consist of the attributes |a `elem` sub I n \/ sub O n| and the edges represent either (indirect) dependencies or data flow between attributes of some trees that have |n| as root. An \emph{I/O dependency graph} is an \emph{I/O graph} where the edges represent data dependencies, and in an \emph{I/O flow graph} the edges represent data flow.
\end{defi}

Note that the above definition does not specify which edges are included in an I/O graph, but only specifies what the edges represent. We later give a consistency condition that specifies which edges must minimally be present.

\begin{defi}[Nonterminal dependency graph]
The \emph{Nonterminal Dependency Graph} (NDG) of nonterminal |n| is the I/O dependency graph of |n| that approximates the dependencies between attributes of any \emph{tree} associated with |n|.
\end{defi}

An I/O graph of some nonterminal |n| serves as an approximation of the dependencies between attributes of \emph{some} trees (context dependent) that are associated with |n|. In contrast, an NDG is a single, context-independent approximation of the dependencies between attributes of some nonterminal.

\begin{defi}[Augmented PDG]
The \emph{augmented PDG} of a PDG |G| of some production |p| and I/O graphs of the nonterminal children of the production, is |G| with edges added between attributes of children so that if there exists an edge between attributes of the given I/O graph of some child, then this edge also exists between the attributes of the child in the augmented PDG, and vice versa\footnote{We specify here that the dependencies between attributes of a child in an augmented PDG match exactly the dependencies between the attributes of the corresponding I/O graph that the augmented PDG is parameterized with, thus that edges of the PDG may impose constraints on the I/O graphs of the children.}.
%% In actual implementations,
%% we usually require only that the I/O graph is a subgraph such that an augmented PDG is the
%% instantiation of a PDG with I/O graphs.
\end{defi}

An augmented PDG of some production |p| serves as an approximation of the dependencies between attributes of \emph{some} trees that are associated with |p|. It is a PDG parameterized with the dependencies induced by the subtrees.

\paragraph{Collections of graphs.}
To specify which edges are part of the I/O graphs and augmented PDGs, we consider some properties of collections of these graphs.

\begin{defi}[Consistent approximations]
A collection of I/O graphs and augmented PDGs is \emph{consistent} when each I/O graph of some nonterminal |n| in some context $\mathcal{C}$ is a projection of the union of the augmented PDGs in context $\mathcal{C}$ of the productions associated with |n|.
\end{defi}

\begin{defi}[Complete approximations]
A collection of I/O graphs and augmented PDGs is \emph{complete} when for each production |p| the collection includes an augmented PDG and corresponding I/O graphs for each combination of contexts that the children of |p| can occur in.
\end{defi}

\begin{defi}[Smaller approximations]
Graph |A| is \emph{smaller} than graph |B| if |sup A (*)| is a subgraph of |sup B (*)|, where |sup A (*)| and |sup B (*)| are the respective transitively closed graphs of |A| and |B|.
\end{defi}

\begin{figure}[tb]
\begin{center}
\begin{tikzpicture}
  [ nd/.style={ellipse, draw, minimum size=4mm, node distance=12mm,font=\footnotesize}
  , dep/.style={->,thick}
  , dep2/.style={->,dashed, very thick}
  , lab/.style={font=\footnotesize}
  ]
  \node[nd](dn){|ainh.down|};
  \node[nd, right of=dn, xshift=12mm](sh){|asyn.sh|};
  \draw[dep2](sh) to [in=-30,out=-150] (dn);
\end{tikzpicture}
\end{center}
\caption{An I/O graph of nonterminal |String|.}
\label{fig.intro.iograph}
\end{figure}

Figure~\ref{fig.intro.iograph} gives an example of an I/O graph that is part of some collection that satisfies the above properties, based on the AG in Figure~\ref{fig.ag.example}.
It is an I/O graph of the nonterminal |String| in the context of being the child |tl| of production |Cons|. The dependency of |asyn.sh| on |ainh.down| is induced by the production |Cons|. If Figure~\ref{fig.intro.ag.pdg} included an edge from |asyn.tl.sh| to |ainh.tl.down|, it would be an augmented PDG parameterized with the I/O graph.

\paragraph{Cycle analysis.}
Cycle analysis of AGs is an abstract interpretation~\citep{Nielson:1999:PPA:555142} that approximates the tree dependency graph of any tree by constructing a complete and consistent collection of I/O graphs and augmented PDGs. The consistency and completeness requirements lead to a set of mutually recursive equations for which we want to compute a least solution. Such a solution is obtained with fixpoint iteration. In Chapter~\tref{chapt.warren} we consider a concrete cycle analysis; here we look only at the general structure of such analyses. Various approaches distinguish different contexts, which influence the accuracy and complexity of the approximations:
\begin{description}
\item[Uniform AG.] \citet{DBLP:journals/mst/Knuth68} distinguishes as contexts the nonterminals to which trees are associated, and the different positions in the right-hand sides of productions where nonterminals can occur. This approach\footnote{
The actual approach does not distinguish different positions but instead instantiates augmented PDGs with I/O graphs (thus copies them).
} leads to one I/O graph per nonterminal and one augmented PDG per production. The AG is \emph{uniform} or \emph{absolutely non-circular} if the augmented PDGs in this flavor are acyclic.
\item[Well-defined AG.] \citet{springerlink:10.1007/BF01702865} additionally distinguishes the production to which the tree is associated. This approach leads to one I/O graph per nonterminal and production, and an exponential number of augmented PDGs, one for each combination of children with productions. An AG is \emph{well-defined} if for any tree the tree dependency graph is cycle-free, which is the case when the augmented PDGs in this flavor are acyclic.
\item[Ordered AG.] \citet{DBLP:journals/acta/Kastens80} distinguishes as contexts only the nonterminals to which trees are associated, but does not differentiate the occurrences of nonterminals in the right-hand sides of productions.
\end{description}
In our experience with UUAG, and in agreement with observations by \citet{DBLP:journals/acta/RaihaS82}, an AG that is well-defined is in practice also ordered.

\subsection{Tree-Walking Automata}
\label{sect.treewalkautomaton}

Tree-walking automata arose from tree language theory and were introduced by \citet{Aho:1969:TCF:800169.805425}. A tree-walking automaton (TWA) is a device that walks over a tree in a contiguous manner and is accompanied by a state machine that describes how the nodes of the tree change their state upon each visit, and whether the device goes up to the parent or down to one of the children as the next step. Section~\ref{sect.intro.ag.eval} uses TWAs to describe evaluation algorithms of AGs. We keep the presentation here simple; there exist many extensions that increase the expressiveness of these automata, such as pebbles~\citep{Engelfriet99tree-walkingpebble}.
\begin{defi}[Tree-walking automaton]
A TWA for some AST is a tuple |(V,Q,I,F,delta)|, with an alphabet |V| of node labels, a finite set of states |Q|, an initial state |I `elem` Q|, a set of final states |F|, and a transition relation |delta `subset` (V `times` Q `times` Q `times` C `times` Q)|, where |C = { up, sub down 0, ..., sub down k }| is a set of commands and |k| is the maximum branching factor of nodes in the AST. There exists a one-to-one relation between productions and symbols in |V|, so that each node is labelled with a |v `elem` V| depending on the production associated with the node.
\end{defi}

A tuple |(v,q,q0,c,q') `elem` delta| represents a transition from a node labelled |v| in state |q| to a state |q'|, combined with a move to another node according to |c|. The automaton keeps track of a bit of history: |q0| is the state of the node that caused the transition to the current node.

\begin{defi}[Deterministic tree-walking automaton]
A \emph{deterministic TWA} is a TWA where the transition relation is a \emph{function} |delta :: V -> Q -> Q -> (C, Q)|. Otherwise the TWA is \emph{nondeterministic}.
\end{defi}

\paragraph{Acceptance.}
Initially, each node in the AST is associated with the initial state |I|. The automaton starts at the root of the tree and stops if no step can be taken anymore. The tree is accepted if the automaton ends with the root having an associated state in |F|. With each step, the automaton visits a node. If the automaton is at a node with label |v| and associated state |q|, and previously visited a node in state |q0|, then the automaton chooses a step |c| and new state |q'| so that |(v,q,q0,c,q') `elem` delta|, or the automaton stops if no such step exists. In the former case, the automaton updates the state of the node to |q'| and visits the parent if |c=up| or visits child |i| if |c = sub down i|.

\paragraph{Evaluation of rules.}
An actual AG evaluation algorithm does not only traverse the tree but also needs to apply rules to compute attributes. Thus, in an actual implementation, the automaton also applies a subset |gamma(v, q)| of the production's rules upon making a transition to state |q| at a node with label |v|.

\begin{defi}[Visit]
The TWA \emph{visits} a node |n| if it arrives at |n| and executes |gamma(v, q)|, where |v| is the label of |n| and |q| is the state of |n|.
\end{defi}

\paragraph{Implementation with the Zipper.}
Various forms of deterministic TWAs can be implemented in a purely functional programming language using \emph{zippers}~\citep{Huet:1997:ZIP:969867.969872}. Such an approach models the imperative updates that the automaton makes to the state.
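The ingredients of the definitions above can be rendered concretely. The following Haskell fragment is a minimal sketch of the types involved in a deterministic TWA; the names |Label|, |State|, |Cmd| and |Delta| are our own choices and not part of the formal definition:
\begin{code}
type Label  =  Int            -- one label per production (the alphabet |V|)
type State  =  Int            -- the finite set of states |Q|

data Cmd    =  Up | Down Int  -- the commands |C|: go to the parent, or down to child |i|

-- The transition function |delta :: V -> Q -> Q -> (C, Q)| of a deterministic TWA,
-- taking the node label, the node's state, and the state |q0| of the previous node:
type Delta  =  Label -> State -> State -> (Cmd, State)
\end{code}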
\begin{figure}[tb]
\begin{code}
p  ::=  plans P : N (many v)                -- execution plan for production |P| of nonterminal |N|
v  ::=  visit C (many i) (many s) (many b)  -- a visit in context |C| that may need |many i| and can produce |many s|
b  ::=  r                                   -- evaluation rule
   |    invoke x C (many i) (many s)        -- a |down x| in context |C|, which can provide |many i| to |x| and expects |many s|
x                                           -- child name
C                                           -- context identifier
i,s,r                                       -- as defined in Figure~\ref{fig.ag.syntax}
\end{code}
\caption{The execution plan language.}
\label{fig.ag.executionplan}
\end{figure}

\begin{figure}[tb]
\begin{code}
plans Nil : String
  visit ^^^ AnyCtx ^^^ inh down :: Char ^^^ syn sh :: String
    lhs.sh = Nil

plans Cons : String
  visit ^^^ AnyCtx ^^^ inh down :: Char ^^^ syn sh :: String
    tl.down = loc.hd
    invoke tl AnyCtx ^^^ inh down :: Char ^^^ syn sh :: String
    lhs.sh = Cons lhs.down tl.sh
\end{code}
\caption{Exemplary execution plans of nonterminal |String|.}
\label{fig.ag.execplanexample}
\end{figure}

\paragraph{Execution plans.}
Figure~\ref{fig.ag.executionplan} introduces a language of execution plans |many p| for the description of the transition relation of TWAs, of which we explain some aspects below. Figure~\ref{fig.ag.execplanexample} shows an example. The language is not expressive enough to describe all transition relations, but it suffices for a description of an AG's evaluation.

A collection of plans-blocks represents the transition \emph{relation} |delta|. Since a context is an agreement between parent and child, a context models the |q0| parameter of |delta|. A plans-block is associated with a unique production |P| and consists of a number of visit-blocks. A visit-block |v| describes visits to the node in context |C|, and thus represents a subset |subsup delta (sub P C) v| of |sub delta (sub P C)|. Let |many r| be the rules that can be evaluated as a consequence of the tree walk taking transitions from |subsup delta P v|. Then the inherited attributes |many i| of the current node may be needed in the evaluation of |many r|, and the synthesized attributes |many s| can be computed by |many r|. An invoke-rule represents possible transitions to some child |x| of |P| in some context |C|, so that values of inherited attributes |many i| of |x| can be provided, and values of synthesized attributes |many s| of |x| may be needed by the current node.

Given an acyclic PDG of |P|, the relation |sub delta P| and the accompanying subset of |gamma| can be generated. This procedure is left as an exercise to the reader; we note that the number of states is possibly exponential in the size of the productions and their children, and the order of appearance of rules is irrelevant for the translation.

A collection of plans-blocks may also represent a transition \emph{function} |delta|. In this case, each visit-block |v| represents a distinct |subsup delta (sub P C) v `subset` sub delta (sub P C)| with precisely |1+n| elements (of which element |i+1| can be thought of as the continuation after visiting child number |i|), where |n| is the number of invoke-rules in the visit-block. The rules must occur in define-before-use order. The example in Figure~\ref{fig.ag.execplanexample} satisfies these constraints.

\paragraph{Implementations of execution plans.}
In comparison to the Zipper, there are less `imperative' and more efficient encodings of TWAs in functional languages.
If we consider TWAs with transition relations that do not make use of the |q0| parameter, and thus do not distinguish contexts, \citet{DBLP:conf/ifip2-4/SwierstraA98} presented an approach that exploits laziness (for nondeterministic TWAs), which we sketch in Section~\ref{sect.intro.ag.eval}, and \citet{Saraiva99} presented visit functions, which are coroutines encoded as continuations (for deterministic TWAs with a total order imposed on visits), which we sketch in Section~\ref{sect.ag.eval.ordered}.

\paragraph{Implementations described in this thesis.}
In Chapter~\tref{chapt.side-effect} we build upon visit functions. In Chapter~\tref{chapt.warren} we show that we can represent transition relations that use contexts. In later chapters we look at extensions to AGs that have accompanying evaluation algorithms that cannot be described with a deterministic TWA. For example, in Chapter~\tref{chapt.iterative} we allow tree walks directed by values of attributes, and in Chapter~\tref{chapt.breadthfirst} we allow tree walks to jump back and forth between saved positions.

\subsection{Demand-driven Attribute Evaluation}
\label{sect.intro.ag.eval}

%if False
%% ON-DEMAND EVAL
On-demand evaluation. Trivial mapping to an execution plan for a nondeterministic
tree-walking automaton. Show that a straightforward mapping to a lazy functional
language is possible by encoding the graph as a cyclic functional program, and letting
the evaluator of the functional language act as tree walking automaton
%endif

We can express demand-driven attribute evaluation by mapping an AG onto an execution plan that describes a nondeterministic TWA (Section~\ref{sect.treewalkautomaton}). This translation is straightforward:
\begin{itemize}
\item From each production an execution plan is derived with a single visit that lists all attributes of the production's left-hand side.
\item The visit-block contains the rules of the production, and an invoke-rule per child which lists all the attributes of the child.
\end{itemize}
The attributes are then computed by running the described TWA.

An execution plan in the above form has a straightforward translation to algebras in Haskell. We sketch this translation. It is optional background material and not required for the understanding of later chapters.

\paragraph{Catamorphisms.}
Let |F| be an endofunctor so that the data constructors describing the AST |A| form the initial |F|-algebra. The execution plans can be mapped straightforwardly to an |F|-algebra |phi| so that |cata phi A| is a function |g| (which we call the \emph{semantic result} or \emph{semantic tree}) that takes an argument for each inherited attribute of the root of |A|, and provides a result for each synthesized attribute of the root. Function |g| encodes the tree dependency graph of |A| (Section~\ref{sect.intro.ag.depgraphs}), and lazy evaluation acts as a nondeterministic TWA, with the additional feature that each rule in the execution plan is executed at most once. Effectively, the functional program is a term-graph representation of the dependency graph, and evaluation (rewriting of this term graph~\citep{DBLP:conf/csl/EekelenSP96}) results in the values of synthesized attributes.

\paragraph{Haskell translation.}
\citet{DBLP:conf/ifip2-4/SwierstraA98} showed how to express the algebra in Haskell as a function (called a \emph{semantic function}\footnote{
Depending on the context, a semantic function may be the function in a production's rule or a function corresponding to some data structure in an algebra.
}).
We demonstrate this approach based on the example in Figure~\ref{fig.ag.example}. Figure~\ref{fig.ag.algebrasketch} gives a sketch of the translation to Haskell code. We explain some aspects of this example below.

\begin{figure}[tb]
\begin{code}
type T_String = Char -> String            -- type of the node's semantics

cata_String :: String -> T_String         -- maps AST to semantic AST
cata_String Nil           =  sem_Nil
cata_String (Cons hd tl)  =  sem_Cons hd (cata_String tl)

sem_Nil :: T_String                       -- production without children
sem_Nil lhs_down = (lhs_sh) where ...     -- note that the |T|-type is a function

sem_Cons :: Char -> T_String -> T_String  -- production with two children
sem_Cons loc_hd loc_tl lhs_down = (lhs_sh) where ...
\end{code}
\caption{Sketch of the algebra.}
\label{fig.ag.algebrasketch}
\end{figure}

For each nonterminal |N| a type |T_(brackb N)| for the semantic result of a tree associated with |N| is generated, which is |T_String| in the example. Nonterminal |String| has an inherited attribute of type |Char| and a synthesized attribute of type |String|, hence |T_String| is |Char -> String|. The |cata|-function associates a semantics with each constructor of the AST. It replaces a constructor |P| with its semantic variant |sem_(brackb P)|.

\begin{figure}[tb]
\begin{code}
sem_Cons loc_hd loc_tl = \ lhs_down ->
  let  tl_down  = loc_hd               -- transcription of the first rule
       (tl_sh)  = loc_tl tl_down       -- recursive call to child |tl|
       lhs_sh   = Cons lhs_down tl_sh  -- transcription of the second rule
  in   ( lhs_sh )
\end{code}
\caption{The body of |sem_Cons|.}
\label{fig.ag.bodyofcons}
\end{figure}

\paragraph{Semantic functions.}
Figure~\ref{fig.ag.bodyofcons} sketches the generated semantic function |sem_Cons|. The body of a semantic function encodes the production's rules and the invoke-rules as given in the execution plan. The encoding of the rules is straightforward. An invoke-rule with some child |tl| is translated to a recursive call to the parameter that represents the semantic tree of |tl|. The call is parameterized with values of the inherited attributes of |tl|, and a pattern match against the results extracts the values of the synthesized attributes of |tl|.

\paragraph{Remarks.}
Demand-driven evaluation of AGs is popular in current AG systems. An advantage of on-demand evaluation is that it does not require an abstract interpretation as part of its implementation, and it works in combination with extensions such as remote reference attributes~\citep{DBLP:journals/scp/MagnussonH07}. Another advantage is that attributes are not computed when their values are not needed at runtime. Demand-driven evaluation may produce values of attributes even in the presence of cyclic attribute dependencies. A disadvantage of demand-driven evaluation is the potentially high space requirements, also known as space leaks.

The approach based on lazy evaluation goes further than demand-driven attribute evaluation, because the set of required attributes depends on how the values of attributes are inspected. In fact, using lazy lists it is possible to associate countably infinitely many attributes with a nonterminal.

\subsection{Statically Ordered Attribute Evaluation}
\label{sect.ag.eval.ordered}

%if False
%% GREEDY EVAL
Greedy evaluation. Explain what it is: evaluation described by a deterministic
automaton which may evaluate more than strictly required.
Kastens: deterministic TWA without contexts.
Kennedy-warren: deterministic TWA with contexts.
%endif

We can express statically ordered evaluation of AGs by mapping the AG onto a deterministic TWA (Section~\ref{sect.treewalkautomaton}). Depending on which contexts are distinguished and on the strictness properties of the rules, values may be computed for attributes that are not needed for the result, or that are only needed later. Therefore, we also call this eager, greedy, or strict evaluation of AGs.

\begin{defi}[Multi-visit AG]
A \emph{multi-visit AG} is an AG for which a statically ordered evaluation strategy is possible.
\end{defi}

Multi-visit AGs play an important role in this thesis because they make the notion of phasing explicit, which is useful for reasoning about what parts of the tree have been investigated and constructed so far.

\paragraph{Attribute scheduling.}
To map an AG to an execution plan of a deterministic TWA, we need to determine for each production |P| how trees associated with |P| are visited. For each visit, we need to determine which rules to apply and how the children of |P| are visited. This process requires acyclic augmented PDGs.

\paragraph{Scheduling algorithms.}
\citet{DBLP:journals/acta/Kastens80} presented an approach that attempts to derive a single smallest sequence |sub Delta N| of visits per nonterminal |N|, so that the attributes of a child with some nonterminal |N| can be computed by visiting the child according to some prefix of |sub Delta N|. Effectively, this approach does not distinguish any contexts. The approach entails adding edges to the I/O graphs of |N| so that they are equal and form a total order on the attributes. This is possible for most AGs in practice, but the approach of \citeauthor{DBLP:journals/acta/Kastens80} sometimes needs help in the form of additional attribute dependencies to accomplish this.

\citet{DBLP:conf/popl/KennedyW76} presented an approach that works for any absolutely non-circular AG. Their approach distinguishes protocols as contexts, which are the possible orders in which the parent provides inherited attributes and demands synthesized attributes. In Chapter~\tref{chapt.warren} we investigate this approach in depth and present a translation to Haskell.

\paragraph{Coroutines.}
A deterministic TWA can be implemented with coroutines~\citep{Warren76}.

\begin{defi}[Coroutine]
A \emph{coroutine}~\citep{DBLP:books/sp/Marlin80} is a function that can pause during its execution and return results to the caller. It may be parameterized with additional arguments when resumed by the caller. A \emph{generator} is a coroutine that does not take additional arguments.
\end{defi}

When no contexts are distinguished, such coroutines can be encoded in a purely functional language as \emph{visit functions}~\citep{Saraiva99}. The body of such a function builds a continuation that is used for a subsequent call and returns it as part of the result. From the caller's perspective:
%format f_visit1
%format f_visit2
%format f_visit3
%format lhs_visit1
%format lhs_visit2
%format lhs_visit3
%format s_child1
%format s_child2
%format asyn1
%format asyn2
%format asyn3
%format asyn4
%format ainh1
%format ainh2
%format ainh3
%format ainh4
%format f_syn1
%format f_syn2
%format f_syn3
%format f_syn4
%format f_inh1
%format f_inh2
%format f_inh3
%format f_inh4
\begin{code}
...
f_visit1 = s_f  -- assuming |s_f| is the coroutine of a child |f|
(f_syn1, f_syn2, f_visit2) = f_visit1 f_inh1 f_inh2
(f_syn3, f_syn4, f_visit3) = f_visit2 f_inh3 f_inh4
...
\end{code}
The callee has the following structure, where the dots represent the usual encoding of the rules that are scheduled to a particular visit:
\begin{code}
sem_(brackb P)_1 s_child1 s_child2 = lhs_visit1
  where
    lhs_visit1 ainh1 ... ainh2 = (asyn1, ..., asyn2, lhs_visit2)
      where ...
    lhs_visit2 ainh3 ... ainh4 = (asyn3, ..., asyn4, lhs_visit3)
      where ...
\end{code}
We explain this encoding in great detail throughout this thesis. Many chapters of this thesis present variants of this encoding.

\paragraph{Remarks.}
Historically, statically ordered attribute evaluation results in faster code and less memory usage. Also, recent developments in multi-core computing may renew interest in visit sequences with respect to parallel evaluation~\citep{springerlink:10.1007/BF02948394}.

\subsection{Incremental Descriptions}
\label{intro.sect.incremental}

Our formalism allows us to write the various declarations that together form an AG in any order. This is a consequence of the purely functional relation between attributes. It is an important property of AGs, because it allows us to describe AGs incrementally and separately. In this thesis, we repeatedly make use of this feature to eliminate common patterns from examples. The separate descriptions are simply merged by string concatenation.

\paragraph{Incremental notation.}
An AG description is incremental: nonterminals, productions, children, attributes and rules may be declared separately. At the same time, we may declare productions and attributes for multiple nonterminals, and children and rules for multiple productions.

\begin{figure}[tb]
\begin{code}
grammar Expr                     -- productions for |Expr|
  prod App      term impred :: Bool  -- with terminal for |App|
  prod Var      term nm :: Name      -- with terminal for |Var|
  prod App Var  term uid :: Int      -- extra terminal for |App| and |Var|

grammar Expr Type                -- productions for multiple nonterminals
  prod App  nonterm f,a : self       -- multiple children of the same type

attr Expr       inh x,y :: Int       -- multiple attributes of the same type
attr Expr Type  syn output :: self   -- attributes for multiple nonterminals

sem Expr Type                    -- rules for multiple nonterminals
  prod App  lhs.output = f.output `mkApp` a.output
sem Expr
  prod App Var  lhs.x = 1        -- rule for multiple productions
\end{code}
\caption{Examples of incremental notation.}
\label{fig.ag.incrementalexample}
\end{figure}

Figure~\ref{fig.ag.incrementalexample} gives a number of examples. A declaration of a nonterminal or production may appear multiple times, each occurrence providing additional declarations. For example, the contents of production |App| of nonterminal |Expr| are determined by several separate declarations. The type |self| is special and represents the type of the actual nonterminal of the description in which the |self| appears. The function |mkApp| must thus be an overloaded function that works both on |Expr|s and on |Type|s. The merging process is straightforward\footnote{
The interested reader may take a look at @Transform.ag@ in the @uuagc@ project for a merge algorithm. This algorithm also allows some declarations to overwrite previous declarations.
}. A duplicate declaration of an attribute of the same nonterminal is not allowed and is considered a static error. Similarly, after merging, attributes may not be defined by more than one rule, and each child of a production must be defined exactly once.

\paragraph{Nonterminal sets.}
To aid the definition of attributes on many nonterminals, we may use nonterminal sets. We use nonterminal sets often in actual code, but only sporadically in this thesis.
\begin{defi}[Nonterminal set]
A \emph{nonterminal set} is a nonterminal name that represents one or more other nonterminals or sets.
\end{defi}

For example, we define a nonterminal name |AllExpr|, which actually stands for |Expr| and |Decl|. When we declare attributes on |AllExpr|, these are actually declared for |Expr| and |Decl|:
\begin{code}
set AllExpr : Expr Decl  -- |AllExpr| includes |Expr `union` Decl|.
\end{code}
Nonterminal sets are extensible: a set declaration of some set |N| may appear multiple times in an AG description. Additionally, notation for set union and set difference may be used to define sets. Determining the sets is a straightforward fixpoint computation.

\paragraph{Nonterminal inheritance.}
A nonterminal may masquerade as a set. If, due to a set declaration, such a set includes other sets and nonterminals, the nonterminal inherits their attributes, productions, and rules.

\subsection{Higher-Order Children and Attributes}
\label{intro.sect.higherorder}

\begin{defi}[Semantics of a child]
The \emph{semantics of a child} with a nonterminal |N| is a semantic tree associated with nonterminal |N|. The decorations still have to be given.
\end{defi}

\begin{defi}[Higher-order child]
A \emph{higher-order child} is a child whose semantics is determined by the value of an attribute.
\end{defi}

A conventional child is determined by syntax, whereas a higher-order child is determined by an attribute. Higher-order children are also known as higher-order attributes or nonterminal attributes. The notion `higher order' originates from being able to pass the semantics of children around as first-class values.

\emph{Higher-Order} AGs (HOAGs)~\citep{DBLP:conf/pldi/VogtSK89} support higher-order children. As part of this thesis, we implemented this feature in UUAG~\citep{uuagc}\footnote{
The syntax that we use here deviates slightly from the actual syntax of higher-order children in UUAG. A type signature |inst.x : N| declares a child |x|, and a conventional rule must define the attribute |inst.x| with the semantics for |x|. In addition, when |x| already exists, its definition is a function that transforms the original semantics of |x|.
}. Higher-order children play an essential role in this thesis: with such children we can dynamically grow the tree (e.g. a proof tree) instead of being limited to a fixed tree (e.g. the parse tree).

\paragraph{Children defined by rules.}
In a conventional AG, the semantics of a child of a production is determined prior to attribute evaluation. In a HOAG, additional higher-order children may be declared for a production. Their semantics is the value of an attribute, or alternatively, the outcome of evaluating a rule:
\begin{code}
child x : N = f [many a]  -- rule that introduces a child |x|
\end{code}
The expression |f [many a]| evaluates to the semantics for |x|, as the following example demonstrates:
\begin{code}
child x : String = sem_Nil  -- declares a child |x| that is defined by |sem_Nil|
x.down = 'z'                -- inherited attr of |x|
\end{code}

\paragraph{Implementation.}
We see in later chapters how child-rules can be implemented. In the translation to Haskell as sketched in Section~\ref{sect.intro.ag.eval}, the semantics of a child is a function from inherited to synthesized attributes, and each child is translated to a function call.
In this translation, a child-rule becomes a conventional evaluation rule that defines some local attribute, and the local attribute is used as the function to call:
\begin{code}
loc_x   = sem_Nil       -- local attr determines the semantics of |x|
x_down  = 'z'           -- defines inherited attr of |x|
x_sh    = loc_x x_down  -- call to child |x|
\end{code}

\paragraph{Desugaring.}
HOAGs can be used to desugar an AST. As an example of a HOAG, suppose that |String| has a special production |Single| for single-character strings. Instead of defining the semantics directly for |Single|, we add a child |repl| that represents the string in terms of |Cons| and |Nil|:
\begin{code}
grammar String
  prod Single ^^^ term x :: Char  -- the |Single| production

sem String prod Single            -- and its semantics
  child repl : String = sem_Cons loc.x sem_Nil  -- higher-order child
  repl.down  = lhs.down           -- inh attr of child |repl|
  lhs.sh     = repl.sh            -- syn attr of child |repl|
\end{code}

\paragraph{Multi-visit AGs as HOAGs.}
%format P1
%format N1
In a multi-visit AG, a child may be visited multiple times to compute some of the attributes. Such an AG can be encoded as a HOAG, which we show below. As we see in later chapters, we worked on a core language that can represent such AGs, and the question arose whether to use HOAGs as a target language. We did not do this because of other requirements, but we present the translation anyway, since it gives some insight into how we organized the visit functions earlier.

We may encode multiple visits to some child |c| as a single visit to child |c| that only requires the inherited attributes of the first visit and only provides the synthesized attributes of the first visit. Additionally, it produces an attribute |c.cont| that represents the semantics of |c| after the visit. For the second visit, we use a higher-order child |c2| with |c.cont| as semantics. We then visit |c2| to provide and obtain the attributes of the second visit. For the next visit we use |c2.cont|, etc.

This approach requires the introduction of a potentially large number of new nonterminals. For a nonterminal |N| with |m| visits, we introduce the nonterminals |N1, ..., (sub N m)|, such that each nonterminal |(sub N i)| has the inherited and synthesized attributes associated with visit |i| of nonterminal |N|. In addition, |(sub N i)| (for |i < m|) has an extra synthesized \emph{continuation} attribute |cont| that contains the semantics of |(sub N (i+1))|.

For a production |P| we introduce the productions |P1, ..., (sub P m)|. The terminals of production |sub P i| represent the decorations available prior to visit |i|. Thus, |P1| consists of the original terminals and nonterminals of |P|, and |(sub P i)| (for |i > 1|) consists of the terminals of |sub P (i-1)|, extended with terminals that encode the attributes computed in visit |i-1|. As an optimization, the terminals that are not needed in later visits can be omitted from |sub P i|. The semantics for |(sub P i)| consists of the rules in |P|'s plan for visit |i| (modulo renaming of attribute references). Additionally, a rule is added which computes the semantics of |sub P (i+1)| and stores it in attribute |lhs.cont|. In this translation, visit |i| to a child (for |i > 1|) is thus represented as a higher-order child that is instantiated with the continuation attribute produced by the previous visit.
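To give a flavour of this encoding in the Haskell translation, consider a hypothetical nonterminal with two visits: the first visit takes an environment and yields a type together with the continuation, and the second visit takes a substitution and yields error messages. The following minimal sketch uses invented attribute types and names; it is illustrative only:

\begin{code}
type Env    = [(String, Ty)]        -- assumed attribute types
type Subst  = [(Int, Ty)]
data Ty     = TyVar Int | TyArr Ty Ty
type Errs   = [String]

type T_N2  = Subst -> Errs          -- semantics of the second visit
type T_N1  = Env -> (Ty, T_N2)      -- first visit also yields the |cont|

-- A parent performs both visits, threading the continuation:
twoVisits :: T_N1 -> Env -> Subst -> (Ty, Errs)
twoVisits child env subst =
  let  (ty, cont)  = child env      -- visit 1: provide |env|, obtain |ty|
       errs        = cont subst     -- visit 2: via the continuation
  in   (ty, errs)
\end{code}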
\subsection{Circular Reference Attributes}
\label{intro.sect.referenceattrs}

Reference Attributed Grammars (RAGs) are an extension of attribute grammars with (remote) reference attributes~\citep{DBLP:journals/scp/MagnussonH07}. This is a common extension of AGs that utilize a demand-driven evaluation algorithm. The extension allows subtrees to be passed around in attributes. Normally, the attributes of such a subtree |T| can be used and defined only by the direct parent of |T|. However, with the extension, the attributes of |T| may be used and defined by any rule holding a reference to |T|. Such an attribute is said to be accessed \emph{remotely}.

\paragraph{Graph structure.}
With this extension, the nodes are organized in a graph structure instead of a tree. Calculations over graphs often require some form of repetition, which can be encoded with cyclic attributes. Values of cyclic attributes can be computed if the demand-driven evaluation is extended with fixpoint iteration, and the attributes are given an initial value. The computation terminates if the rules are monotonic and each ascending chain of attribute values stabilizes.

\paragraph{Advantages and disadvantages.}
Reference attributes provide a convenient way of transporting information from one location in a tree to another location. Also, the extension allows more analyses to be modelled with AGs, such as abstract interpretations, which are typically fixpoint iterations over graphs. This expressive power comes at a price: the well-definedness of an AG cannot be statically verified in general (Section~\ref{sect.intro.ag.depgraphs}). Moreover, if values of \emph{inherited} attributes can be defined remotely, well-formedness of the AG cannot be checked statically, which has as a consequence that a straightforward mapping (such as in Section~\ref{sect.intro.ag.eval}) to a purely functional language is not possible\footnote{
We implemented synthesized reference attributes in UUAG for lazily evaluated grammars.
}.

\subsection{Correspondences between AGs, HOAGs, Monads, and Arrows}
\label{intro.sect.correspondences}

%if False
In a functional language, we may represent AGs as functions from inherited attributes to synthesized attributes in the target language. The attributed AST is implicitly stored on the call stack and in function closures. Alternatively, we obtain more explicit representations by defunctionalizing~\citep{DBLP:journals/lisp/Reynolds98a} the functions.
%endif

In later chapters, we translate AGs to a monadic target language~\citep{meijerjeuring95} and also consider AGs translated to Arrows~\citep{DBLP:conf/afp/Hughes04}. Being able to structure a computation as a monad or arrow allows reflection on the structure of the computation. We use such introspection in Chapter~\rref{chapt.breadthfirst} to implement a step-wise evaluation strategy. For example, consider the code of production |Cons| of the example in Section~\ref{intro.ag.syntax}.
In its present formulation, it can be evaluated in a strict fashion:
\begin{code}
sem_Cons loc_hd loc_tl lhs_down = (lhs_sh)
  where
    tl_down  = loc_hd
    (tl_sh)  = loc_tl tl_down
    lhs_sh   = Cons lhs_down tl_sh
\end{code}
We rewrite this code using arrow notation~\citep{DBLP:conf/icfp/Paterson01} and call the result an execution plan:
\begin{code}
sem_Cons loc_hd loc_tl = proc lhs_down -> do
  tl_down  <- (sub f copy)  -< loc_hd             -- transcription of the first rule
  tl_sh    <- loc_tl        -< tl_down            -- invoke arrow of child |tl|
  lhs_sh   <- (sub f cons)  -< (lhs_down, tl_sh)  -- transcription of the second rule
  returnA -< lhs_sh                               -- output
  where
    (sub f copy)  = id
    (sub f cons)  = uncurry Cons
\end{code}
The translation into arrow notation essentially desugars the above code into point-free style, that is, into a linear composition of the rule functions, interspersed with combinators that rearrange the intermediate attribute values.

Can we represent any AG in this way? When we limit the expressiveness of the language of the rules to tree constructions only, conventional AGs can express primitive recursive functions, whereas HOAGs can express all computable functions. This difference can be observed when translating to arrows. In this setting, the semantics of a tree is an arrow that takes a tuple of inherited attributes as input and produces a tuple of synthesized attributes as output. A conventional AG is expressible as a plain arrow, but a HOAG requires the generalization to a monad, and AGs that are not statically ordered may require a feedback loop.

Introspection of the arrow is possible using defunctionalization~\citep{DBLP:journals/lisp/Reynolds98a}. Such an arrow actually encodes the tree dependency graph (Section~\ref{sect.intro.ag.depgraphs}), thus introspection on this structure allows a runtime optimization of this graph (e.g. elimination of identity functions), which may be useful when traversing parts of the graph several times (Chapter~\rref{chapt.iterative}).

%if False
Suppose that we iterate the evaluation of such attributes for a node to compute a fixpoint (Chapter~\rref{chapt.iterative}). We want to dynamically fuse rules to improve the performance of each iteration. For example, the identity function is a unit of composition and can be eliminated. An AG is likely to have many rules with identity functions (Section~\ref{intro.sect.copyrules}). We can accomplish this by allowing introspection of the node's computation. To allow reflection on the computation, we represent the arrow as a first-class value with typed abstract syntax and GADTs.
The arrow combinators are expressible in terms of the constructors |Arr|, |Comp|, |First|, and the fused variant |First2|:
\begin{code}
data Sem :: * -> * -> * ^^ where  -- child's semantics as arrow
  Arr     :: Rule i s -> Sem i s                 -- embed rule
  Comp    :: Sem i z -> Sem z s -> Sem i s       -- compose two arrows
  First   :: Sem i s -> Sem (i,z) (s,z)          -- bypass some intermediate attrs |z|
  First2  :: Sem i s -> Sem ((i,c),d) ((s,c),d)  -- fused variant of nested |First|s

data Rule :: * -> * -> * ^^ where  -- reflection on rules
  Fcopy  :: Rule a a                   -- identity rule |sub f copy|
  Fswap  :: Rule (a,b) (b,a)           -- swap rule
  Fdup   :: Rule a (a,a)               -- clone rule
  Fcons  :: Rule (Char,String) String  -- the |sub f cons| rule
\end{code}
The following evaluation function defines the expected meaning of the reflected rules:
\begin{code}
evalRule :: Rule a b -> a -> b
evalRule Fcopy  = id                     -- |sub f copy|
evalRule Fswap  = \ (a,b) -> (b,a)       -- needed for combinator |second|
evalRule Fdup   = \ a -> (a,a)           -- needed for combinator |&&&|
evalRule Fcons  = \ (c,cs) -> Cons c cs  -- |sub f cons|
\end{code}
These constructors are essentially indices into the actual code of the rules. We can use knowledge about compositions of rules to replace such compositions with more efficient implementations. As a simple example, we eliminate left-skewed copy chains from such an arrow, and replace nested |First|s with the more efficient variant |First2|:
%format First2
\begin{code}
optimize :: Sem i s -> Sem i s
optimize (Comp (Arr Fcopy) g)  = optimize g
optimize (Comp f g)            = Comp (optimize f) (optimize g)
optimize (First (First t))     = First2 (optimize t)
optimize (First t)             = First (optimize t)
optimize a                     = a
\end{code}
The interpretation of the arrow combinators is the standard arrow instance for functions:
\begin{code}
evalTree :: Sem i s -> i -> s
evalTree (Arr r)     = evalRule r
evalTree (Comp f g)  = evalTree g . evalTree f
evalTree (First t)   = \ (a,c) -> (evalTree t a, c)
evalTree (First2 t)  = \ ((a,c),d) -> ((evalTree t a, c), d)
\end{code}
Note that the arrow approach has overhead related to the management of the intermediate state that connects the rules. More advanced rule fusion can reduce this overhead. For example, we implemented some flow analyses on ActionScript bytecode\footnote{
Source code of the AG-based ActionScript bytecode manipulation suite: \url{https://svn.science.uu.nl/repos/staff.ariem.fittest-uu/tools/asic/}}. For one analysis, we essentially added a delta to a chained integer attribute. The bytecode typically consists of long sequences of instructions with only little branching. Such sequences can be compressed by adding up the deltas.
%endif

\subsection{Specification of Typing Relations}
\label{sect.intro.agtypespec}

In the preceding sections, we discussed several AG features. We now make the connection with type systems. A reason why attribute grammars are a convenient formalism for describing type systems is that we can see a specification with type rules as an attribute grammar: the typing relations are attributed nonterminals, the type rules form the productions, and the judgments form the symbols, with equations between the attributes. The AG thus describes the structure of derivation trees.

To map an AG to type rules, we need to translate nonterminals, productions and rules. A nonterminal |N| with inherited attributes |sub I N| and synthesized attributes |sub O N| translates to a typing relation of the form |(sub I N) (annotTurnstyle N) (sub O N)|.
Note that the distinction between inherited and synthesized gets lost in the translation, although we use the notational convention that supposed inputs are to the left of the turnstile and supposed outputs are to the right. Without going into details, a production |p = u -> (many w) `cdot` (many r) `cdot` X| translates to a type rule with a judgment for each symbol in |many w|, equality judgments for each rule in |many r|, and a conclusion for |u|. The actual translation is quite similar to the translation to a lazy functional program, which we already showed in Section~\ref{sect.intro.ag.eval}.

If we take the inverse of the above translation, we arrive at a translation from type rules to AGs. This process, however, is non-trivial, which is not surprising because there exists no general inference algorithm. To perform it, we need to identify which attributes are inherited and which are synthesized, and with which computable functions to represent equality judgments. However, as we come back to in Section~\ref{intro.sect.globalpicture}, we can use such an improper AG as a starting point for an inference algorithm, and add information to make it a proper AG. Such an AG then has the desirable property that its basis directly corresponds to the specification. In the next sections, we give an AG implementation of a DHM type inferencer, and show some attribute grammar features that are useful when using AGs to describe derivation trees.

\subsection{Damas-Hindley-Milner Inference}
\label{intro.sect.ag.dhm}

As an example of a type inference algorithm written with attribute grammars, we give an AG implementation of the DHM algorithm (Section~\ref{sect.intro.dhm}), using the same approach as presented by \citet{DBLP:conf/afp/DijkstraS04}.

\begin{defi}[Chained attribute]
A \emph{threaded attribute} or \emph{chained attribute} stands for both an inherited and a synthesized attribute with the same name.
\end{defi}

The environment is modeled as an inherited attribute, errors as a synthesized attribute, and the substitution as a chained attribute.

\begin{figure}[tbp]
%format aenv = "\Varid{env}"
%format aty = "\Varid{ty}"
%format asubst = "\Varid{subst}"
%format ainh = "\Varid{inh}"
%format asyn = "\Varid{syn}"
\begin{code}
grammar Expr  -- abstract grammar for expressions
  prod Var  term x :: Name       -- the identifier |x|
  prod App  nonterm f : Expr     -- left-hand side of application
            nonterm a : Expr     -- right-hand side of application
  prod Lam  term x :: Name       -- the identifier |x| bound by the lambda
            nonterm b : Expr     -- the body of the lambda
  prod Let  term x :: Name       -- the name of the binding
            nonterm e, b : Expr  -- the binding and body

attr Expr ^^^  inh aenv :: Env       -- inherited environment
               chn asubst :: Subst   -- chained attr: both inh and syn
               syn aty :: Ty         -- synthesized type
               syn errs :: Errs      -- collection of type errors
\end{code}
\caption{DHM grammar and attributes.}
\label{fig.intro.dhmgramattr}
\end{figure}

For the type we have two options. Either the parent passes an expected type to the child, which is further constrained by the child, or the child passes up an inferred type, which is further constrained by the parent. When type annotations are part of the language, the former approach detects type errors earlier: the information from these type annotations can then be given to a child, instead of verifying the resulting type after the fact. However, for this example it does not matter, so we take the latter approach, which we also used for the monadic implementation of DHM in Section~\rref{sect.intro.dhm}.
Figure~\ref{fig.intro.dhmgramattr} shows the grammar and attributes of the example. The chained attribute |asubst| is shorthand for an inherited and a synthesized attribute, both named |asubst|. To make it clear which of the two attributes we intend, we explicitly prepend |ainh| or |asyn| to the name in the rules.

\begin{figure}[tbp]
\begin{code}
sem Expr
  prod Var  loc.scheme = lookup loc.x lhs.aenv
            (lhs.aty, asyn.lhs.asubst) = instantiate loc.scheme ainh.lhs.asubst
            lhs.errs = emptyset
  prod App  a.aenv = lhs.aenv
            f.aenv = lhs.aenv
            (loc.res, ainh.a.asubst) = fresh ainh.lhs.asubst
            ainh.f.asubst = asyn.a.asubst
            (loc.errs, asyn.lhs.asubst) = unify f.aty (a.aty -> loc.res) asyn.f.asubst
            lhs.errs = f.errs ++ a.errs ++ loc.errs
  prod Lam  b.aenv = insert loc.x loc.argty lhs.aenv
            (loc.argty, ainh.b.asubst) = fresh ainh.lhs.asubst
            lhs.aty = loc.argty -> b.aty
            asyn.lhs.asubst = asyn.b.asubst
            lhs.errs = b.errs
  prod Let  e.aenv = lhs.aenv
            b.aenv = insert loc.x loc.scheme lhs.aenv
            loc.scheme = generalize lhs.aenv e.aty asyn.e.asubst
            ainh.e.asubst = ainh.lhs.asubst
            ainh.b.asubst = asyn.e.asubst
            asyn.lhs.asubst = asyn.b.asubst
            lhs.errs = e.errs ++ b.errs
\end{code}
\caption{DHM with AG rules.}
\label{fig.ag.dhmimpl}
\end{figure}

The rules in Figure~\ref{fig.ag.dhmimpl} describe how the environment is passed top-down through the tree, how the substitution is threaded in-order through the tree, and how errors are collected bottom-up. Since functions such as |instantiate| and |unify| produce an updated substitution, it is the threading of the substitution that determines in what order the effects of these operations become visible in the substitution. The substitution needs to be threaded carefully in order not to lose any constraints on type variables.

In contrast to the monadic approach in Section~\ref{sect.intro.dhm}, the rules are compositional. The distribution of the environment, the threading of the substitution, and the collection of error messages can be described separately and relatively independently. On the other hand, the rules are also more verbose, because the environment, substitution and error messages are not hidden. Furthermore, rules that employ |generalize| are crosscutting, as they deal with types, substitutions and environments. This has a negative effect on the degree of separation in the descriptions of these individual attributes. We address both issues in Section~\ref{intro.sect.copyrules}.

\subsection{Copy Rules and Collection Attributes}
\label{intro.sect.copyrules}

We often pass values in standard top-down, bottom-up, and in-order patterns between attributes of the tree. The rules that encode these patterns are trivial: essentially identity functions between attributes. For example, to pass an environment top-down to the children of an application we use the following rules:
\begin{code}
sem Expr prod App
  ainh.left.env   = id ainh.lhs.env  -- copy down left
  ainh.right.env  = id ainh.lhs.env  -- copy down right
\end{code}
To thread\footnote{
When we write that we thread an attribute |x| through the tree, we actually mean that we thread a value through the tree via a sequence of attributes that are all named |x|, and that are chained together by (mostly) copy rules.
In a similar way, we talk about passing attributes top-down and bottom-up.} a counter |uid| (unique identifier) through the tree, we use the following rules:
\begin{code}
sem Expr prod App
  ainh.left.uid   = id ainh.lhs.uid    -- copy down to left
  ainh.right.uid  = id asyn.left.uid   -- from left to right
  asyn.lhs.uid    = id asyn.right.uid  -- copy up from right
\end{code}
Similarly, if a production has one child, we may pass the value of an attribute of that child bottom-up as the value for the same attribute of the parent:
\begin{code}
sem Expr prod Lam ^^^ asyn.lhs.errs = asyn.b.errs  -- copy error messages up
\end{code}
Such \emph{copy rules}~\citep{10.1109/SCAM.2007.13} are so common\footnote{
The AGs of UHC have more than twice as many copy rules as explicitly written rules. The inference of copy rules thus saves a lot of manual labor.
} that we allow these rules to be omitted. To make such an AG well-formed, the following algorithm augments an AG with copy rules. If a rule is missing for an inherited attribute |a| of a child |c|, we insert a rule that takes an attribute with the same name |a| from the local attributes, the synthesized attributes of the children to the left of |c|, or the inherited attributes of the parent. The last occurrence in the following order is taken: inherited attributes of the parent, synthesized attributes of the children (from left to right), local attributes. In a similar way, we treat omitted copy rules for synthesized attributes. These copy rules can be considered to provide generic behavior for AGs that is not unlike the abstraction offered by reader and state monads.

Furthermore, we often collect attribute values in a bottom-up fashion:
\begin{code}
sem Expr prod App    asyn.lhs.errs = asyn.left.errs ++ asyn.right.errs
sem Expr prod Const  asyn.lhs.errs = []
\end{code}
Such \emph{collection rules}~\citep{10.1109/SCAM.2007.13} can also be inferred automatically when we specify a combination operator and an initial value:
\begin{code}
attr Expr ^^^ syn errs use (++) []
\end{code}
This approach captures the abstraction provided by writer monads.

\paragraph{Intermediate nodes and copy rules.}
It is often convenient to have intermediate nodes in the AST structure. An important benefit of copy rules is that they allow us to transparently add intermediate nodes to the AST. As an example, consider function application in the lambda calculus. It is usually expressed as a binary expression in the abstract syntax:
\begin{code}
grammar Expr ^^^ prod App ^^^ nonterm f,a : Expr
\end{code}
A function call with multiple arguments (e.g. |f a1 a2|) is thus encoded as a sequence of applications (|(f a1) a2|) using |App|. Suppose that we want to add an inherited attribute that describes at which argument position an expression occurs:
\begin{code}
attr Expr ^^^ inh index :: Int
sem Expr ^^^ prod App ^^^ ainh.f.index = ainh.lhs.index ^^^ ainh.a.index = 1 + ainh.lhs.index
\end{code}
This definition is incorrect for nested function calls (e.g. |f a1 (g a2)|), because the first argument |a2| of the nested call receives |2| as the value of its |index| attribute. We get the expected behavior by using an inherited attribute to distinguish whether an expression occurs to the left or to the right of an application. This, however, requires us to specify this attribute for each occurrence of an expression nonterminal.
We obtain a more concise solution if we assume that there is always a special top node above a sequence of applications:
\begin{code}
grammar Expr ^^^ prod AppTop ^^^ nonterm e : Expr
sem Expr ^^^ prod AppTop ^^^ ainh.e.index = 0
\end{code}
The copy rules transparently connect the remaining attributes of |e| with attributes of |lhs|. For these attributes, the existence of the intermediate node is not visible. A typical place to add these intermediate nodes is in the parser or with a tree transformation. If we furthermore ensure that a special root node occurs above expression trees, then we also easily define an initial value for the |index| attribute:
\begin{code}
grammar ExprTop ^^^ prod Top ^^^ nonterm e : Expr
sem ExprTop ^^^ prod Top ^^^ ainh.e.index = 0
\end{code}
With nonterminal sets (Section~\ref{intro.sect.incremental}) we can define attributes that are common to |Expr| and |ExprTop| without code duplication.

\paragraph{Higher-order children and copy rules.}
\label{sect.intro.hochildrencopyrules}
To further improve the effectiveness of copy rules, and in general to improve the separation of concerns, we can encode crosscutting rules as higher-order children (Section~\ref{intro.sect.higherorder}). By encoding a rule |r| as a child, we abstract |r| from how it is combined with other rules. This idea embodies the spirit of this thesis. A higher-order child can be used to declaratively specify a task (for instance, to ensure that two terms are equal), and the underlying implementation (the unification function) reflects the effects of performing the task in terms of attribute values. The orchestration of these tasks is determined by how the rules weave the attributes together.

\begin{figure}[tb]
%format aty1
%format aty2
\begin{code}
grammar Unify prod Unify  -- nonterminal that represents unification

attr Unify ^^^  inh aty1,aty2 :: Ty  -- the two input types
                inh aenv :: Env      -- the input environment
                chn asubst :: Subst  -- the input/output substitution
                syn errs :: Errs     -- the output error messages

sem Unify prod Unify      -- essentially a wrapper around |unify|
  (lhs.errs, asyn.lhs.asubst) = unify lhs.aenv lhs.aty1 lhs.aty2 ainh.lhs.asubst
\end{code}
\caption{Encoding of |unify| as a higher-order child.}
\label{fig.ag.higherorderunify}
\end{figure}

In general, we can represent any function as a higher-order child. In Figure~\ref{fig.ag.higherorderunify} we introduce a nonterminal for unification that has an inherited attribute for each input of unification and a synthesized attribute for each output of unification. The nonterminal has only one production, which contains only one rule: the rule we abstract over.

\begin{figure}[tb]
\begin{code}
sem Expr prod App
  child u : Unify = sem_Unify            -- higher-order child
  u.aty1 = f.aty                         -- input type
  u.aty2 = (a.aty -> loc.res)            -- input type
  ainh.u.asubst = asyn.a.asubst          -- copy rule
  asyn.lhs.asubst = asyn.u.asubst        -- copy rule
  u.aenv = lhs.aenv                      -- copy rule
  lhs.errs = f.errs ++ a.errs ++ u.errs  -- copy rule
\end{code}
\caption{The DHM rules for |App| with copy rules.}
\label{fig.ag.dhmcopyrules}
\end{figure}

In a similar way, we can encode the |fresh| and |instantiate| functions as higher-order children (Section~\tref{visit.sect.typingexpressions}). In Figure~\ref{fig.ag.dhmcopyrules} we show how this is done for production |App|: we add the unify-nonterminal as a higher-order child |u| to the production, and add rules for the attributes of |u|. The rules marked as copy rules are implied by the copy-rule mechanism and can be omitted.
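As an aside, the copy-rule search described in Section~\ref{intro.sect.copyrules} is easily captured in code. The following Haskell sketch is illustrative only: the names are invented, and it assumes the preference order described there (local attributes first, then the nearest child to the left, then the parent's inherited attributes), which is not necessarily UUAG's actual implementation.

\begin{code}
import Data.List (find)

-- Possible sources for an omitted copy rule that defines an
-- inherited attribute of a child.
data Source  =  Local String          -- a local attribute
             |  SynOf String String   -- a synthesized attr of a left sibling
             |  InhOfLhs String       -- an inherited attr of the parent
  deriving Show

copySource ::  String                -- attribute name to resolve
           ->  [String]              -- local attributes of the production
           ->  [(String, [String])]  -- children to the left of the child,
                                     -- leftmost first, with their syn attrs
           ->  [String]              -- inherited attributes of the parent
           ->  Maybe Source
copySource a locals lefts lhsInh
  | a `elem` locals                                     = Just (Local a)
  | Just (c, _) <- find (elem a . snd) (reverse lefts)  = Just (SynOf c a)
  | a `elem` lhsInh                                     = Just (InhOfLhs a)
  | otherwise                                           = Nothing
\end{code}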
\paragraph{Remarks.}
The automatic completion of an AG with copy rules is a double-edged sword. The mechanism saves a lot of boilerplate code, but the automatic behavior may not always be intended. If we accidentally forget to define an attribute explicitly, and that attribute can be given a default definition via a copy rule, then we are not warned that a rule is missing. In Chapter~\rref{chapt.iterative} we provide a way to specify copy rules per production, which gives more control over where and which copy rules are applied.

The copy-rule mechanism uses the order of appearance of children for threading and bottom-up collections. This is sometimes not the appropriate order for a given application. For example, higher-order children follow conventional children in the order of appearance, and are thus always at the end of copy-rule chains. In such situations we can override the copy-rule behavior by giving explicit rules for attributes. However, this reduces the convenience of copy rules, and undermines the underlying idea that we specify patterns rather than individual rules. As a solution, in Chapter~\rref{chapt.side-effect} we allow the visit order of children to be explicitly specified for eagerly evaluated AGs; copy rules can then use the visit order instead of the order of appearance. Moreover, in Chapter~\rref{chapt.warren} we define commutable (copy) rules, which are rules that can be ordered independently of their value dependencies.

\subsection{Advantages and Disadvantages}

The greatest advantage offered by AGs is that specifications are composable. Productions, attributes and rules can all be specified separately and automatically combined into a monolithic specification (Section~\ref{intro.sect.incremental}). This easily allows new attributes and behavior to be added and combined with already existing rules. In particular, extra attributes can be used to specify additional administration for type inference strategies, and to specify what a compiler does with the inferred types.

In general, AGs offer modularity~\citep{143210} and extensibility~\citep{DBLP:conf/icfp/VieraSS09}, which can be realized via generic programming and meta-programming approaches. In this thesis, we make use of such facilities, although these facilities themselves are not the focus of this thesis. Instead, we focus on combining AGs with algorithms that implement the functionality specified by declarative aspects of type rules.

\section{Background on Ruler}
\label{intro.sect.ruler}

This thesis complements previous work by \citet{DBLP:conf/flops/DijkstraS06} on the Ruler language and tool suite. Ruler gives a semantics to type rule descriptions in the form of an implementation with conventional attribute grammars and Haskell. Consequently, Ruler in its current state provides only an implementation for syntax-directed type rules. The purpose of this thesis is to provide a core language, \emph{RulerCore}, for Ruler, which allows more complex inference strategies to be described. RulerCore extends attribute grammars in various ways. Therefore, we focus more on attribute grammars than on type rule descriptions, although type systems play a prominent role in our work. The work on Ruler provides a notation for type rules and shows how this notation translates to attribute grammars. From this work, a notation that maps to RulerCore can be designed; hence we describe Ruler in this section.
\subsection{Ruler Features}

Ruler aims to generate both a type system specification and a type inference implementation from a single description of the type system with type rules. The generated specification consists of type rules formatted as \LaTeX{} figures, and can be used as documentation and for formal reasoning. The generated implementation consists of attribute grammars that can be included verbatim in the source code of a compiler. This approach guarantees consistency between the specification and the implementation.

Ruler provides notation for the incremental description of type rules. It features the addition of parameters to relations, type rules to relations, and judgments to type rules, in a similar way as we can add attributes, productions, and rules to an AG description. When type rules are declarative, only well-formedness checks and generation of the specification are possible. However, Ruler also provides notation to describe algorithmic rules. Using the facilities for incremental descriptions, a direction can be given to the parameters of typing relations, and type rules can be associated with productions of an accompanying attribute grammar, which turns the typing relation into a deterministic function for which AG code can be generated.

\subsection{Ruler Concepts}

We take the explicitly typed lambda calculus as described in Section~\ref{intro.sect.typerules} as a basis to show the main concepts and syntax\footnote{
We have taken the liberty of deviating slightly from Ruler's actual syntax in order to have a closer correspondence with the notation that we use in this thesis.
} of Ruler. In Ruler, a type system description is a composition of views on relations and their rules.

\begin{defi}[View]
In Ruler, a \emph{view} is a named subset of the declared relations, holes, judgments, rules, etc. Each view describes a type system.
\end{defi}

We start with a declarative view, which we give the name |D|. Later, we also provide an algorithmic view with the name |A|.

\begin{figure}[tb]
\begin{code}
relation expr view D ^^    -- declares the typing relation |expr| for |D|
  holes g :: Gam ^^^ e :: Expr ^^^ t :: Ty ^^  -- specifies parameters of |expr|
  judgespec g :- e : t     -- notation for judgments of |expr|

relation member view D ^^  -- declares the relation |member|
  holes g :: Gam ^^^ x :: String ^^^ t :: Ty ^^  -- specifies parameters of |member|
  judgespec (x,t) `elem` g ^^  -- notation for judgments of |member|
\end{code}
\caption{Examples of declaring relations in Ruler.}
\label{fig.ruler.defrelations}
\end{figure}

\begin{defi}[Hole]
In Ruler, the parameters of a relation are called \emph{holes}. The parameters are explicitly named and explicitly typed.
\end{defi}

Figure~\ref{fig.ruler.defrelations} shows how to declare the 3-place relations |expr| and |member|. The line with \emph{judgespec} specifies a custom notation for a relation in terms of a meta-grammar production. Each hole must occur exactly once as a meta-nonterminal in this meta-production.
\begin{figure}[tb]
\begin{code}
ruleset theRules relation expr  -- a set of rules for |expr|
  rule e.var view D             -- rule for the var-case
    judge L : member ^^^ (x,t) `elem` g   -- premise with the name |L|
    ~~~
    judge R : expr ^^^ g :- x : t         -- conclusion with the name |R|

  rule e.app view D             -- rule for the app-case
    judge F : expr ^^^ g :- f : t1 -> t2  -- premise with the name |F|
    judge A : expr ^^^ g :- a : t1        -- premise with the name |A|
    ~~~
    judge R : expr   -- alternative syntax for judgment
      | g = g        -- binds expression |g| to hole |g|
      | e = f a      -- binds |f a| to hole |e|
      | t = t2       -- binds |t2| to hole |t|
\end{code}
\caption{Example of Ruler rules.}
\label{fig.ruler.examplerules}
\end{figure}

\begin{defi}[Ruleset]
A \emph{ruleset} is a group of rules in combination with a collection of meta-information, such as a name for the group.
\end{defi}

A relation is either a foreign relation, or it is specified by the rules of some ruleset. Figure~\ref{fig.ruler.examplerules} shows the ruleset |theRules|. A rule is given an explicit name, zero or more judgments as premises above the line, and one conclusion below the line. A judgment has an explicit name and corresponds to a relation. The arguments are bound to the corresponding holes either via the custom syntax in a nameless way, or via a generic syntax where each binding is given explicitly.

The language of Ruler expressions consists of meta-variables (such as |f| and |a|), and external constants and operators (such as the function arrow). The interpretation of such terms depends on the target language. Denotations can be given for individual symbols and constants, as well as for (saturated) applications:
\begin{code}
rewrite ag ^^ ((t1 :: Ty) -> (t2 :: Ty)) = (TyArr t1 t2) :: Ty  -- denotation of application
external TyArr  -- identity denotation
rewrite tex ^^ ((t1 :: Ty) -> (t2 :: Ty)) = (t1 -> t2) :: Ty  -- fully saturated appl only
format tex ^^ -> ^^ = "\rightarrow"  -- denotation of a symbol
\end{code}
Rewrite rules are applied in a bottom-up fashion. The LHS of a rewrite rule specifies a typed pattern to match against a Ruler expression. The RHS must again be a Ruler expression, which is then assumed to have the given explicit type. These types allow the notation to be overloaded. A rewrite rule applies if both the syntax and the types of the LHS match the actual Ruler expression.

The distinction between conventional relations and foreign relations is that the latter are not defined by rules. Instead, an explicit denotation in the host language must be given for a foreign relation:
\begin{code}
relation member view D  -- |member| is not specified by rules
  judgeuse tex ^^ (x,t) `elem` g        -- denotation for \LaTeX{}
  judgeuse ag ^^ (Just t) = lookup x g  -- denotation for AGs with Haskell
  format tex ^^ `elem` ^^^ = "\in"      -- denotation of a symbol
  external Just lookup                  -- identity denotation
\end{code}
The above declarative specification can be typeset to \LaTeX{}. To describe an algorithmic version, additional information needs to be added to the relations and rules. We can describe these additions separately by defining a view |A| that extends |D|:
%format viewhierarchy = "\mathbf{viewhierarchy}"
\begin{code}
viewhierarchy D < A  -- partial order on views
\end{code}
In particular, we need to define how the relations are mapped onto nonterminals of an AG, and turn the relations into deterministic functions such that the parameters can be mapped to attributes. Ruler code does not stand on its own.
The generated code for type inferencing is supposed to be used in conjunction with other AG code. An association is specified between relations (schemes) in Ruler and nonterminals in the AG. For example, the relation |expr| is associated with the nonterminal |Expr|. Its type rules are associated with productions of |Expr|. This correspondence is made explicit by annotating the nonterminal declaration with the name of the relation, and the productions with the names of the corresponding rules\footnote{
Ruler allows productions to be defined by combining rules. Within the square brackets there may be not only the name of a rule, but also an expression that denotes a composition of rules. An example is the left-biased union of two rules.
}:
\begin{code}
grammar Expr [expr] view A  -- relation |expr| mapped to nonterminal |Expr|
  prod Var [e.var]          -- rule |e.var| mapped to production |Var|
    term nm :: String
  prod App [e.app]          -- rule |e.app| mapped to production |App|
    nonterm f : Expr
    nonterm a : Expr
\end{code}
Moreover, the holes are mapped to attributes. Judgments represent the semantics of a production. Judgments of a foreign relation are translated to AG rules. The other judgments correspond to children of the production, and are mapped to rules that define the inherited attributes of the children, or that pattern match against the synthesized attributes.

In view |A|, we refine the declaration of |expr| to include additional information. In this example, we only provide additional information about the attributes. In general, additional attributes and syntax can also be added. We map the holes of the |expr| relation to inherited or synthesized attributes. The hole that corresponds to the AST gets the special \emph{node} designation:
\begin{code}
relation expr view A     -- algorithmic view on |expr|
  holes  node e :: Expr  -- AST node
         inh g :: Gam    -- input param is inherited attribute
         syn t :: Ty     -- output param is synthesized attribute
\end{code}
A Ruler expression is at a defining position if it is bound to an inherited hole of a premise, or bound to a synthesized hole of a conclusion judgment. Otherwise, it is at a using position. The generated algorithm constructs a value at defining positions, and matches against values at using positions.

\begin{figure}[tb]
\begin{code}
ruleset theRules relation expr  -- extensions to rules for |expr|
  rule e.var view A             -- rule for the var-case
    ~~~
    judge R : expr ^^^ g :- (node nm = x) : t  -- association between term |nm| and |x|

  rule e.app view A             -- rule for the app-case
    ~~~
    judge R : expr                     -- convenient for extensions
      | e = (node f = f) (node a = a)  -- assoc child |f| to |f| and child |a| to |a|
\end{code}
\caption{Rules demonstrating node-holes.}
\label{fig.ruler.nodehole}
\end{figure}

A node-hole takes the AST as its value. In the judgments of rules, node holes are treated differently from normal holes, in order to define the correspondence between judgments of the type rule and children of the production. The node hole of a premise judgment may only be an identifier and corresponds uniquely to a child of the associated production. These identifiers must occur in the expression bound to the node hole of the conclusion, and we annotate them there with the names of the corresponding children of the production. Figure~\ref{fig.ruler.nodehole} shows an example. The syntax for explicitly named bindings of holes is convenient for extensions, as only the bindings that are redefined need to be mentioned. When a Ruler description is well-formed, an AG can be generated from it.
This AG contains attributes and semantics for type inference, and can be combined with other attributes and Haskell infrastructure into a compiler.

\subsection{Damas-Hindley-Milner Inference}

Analogously to Section~\ref{sect.intro.dhm} and Section~\ref{intro.sect.ag.dhm}, we show in this section a DHM inference algorithm in Ruler. In the DHM view, we add lambda abstractions without an explicit type annotation, and a let-expression. However, we only show the code for the app-rule, since we already explained Ruler's concepts in the previous section.

\begin{figure}[tb]
\begin{code}
viewhierarchy D < A < H  -- DHM view |H|

relation expr view H     -- DHM view on |expr|
  holes chn s :: Subst   -- substitution as threaded attribute

relation tyFresh view H  -- wrapper for |fresh|
  holes  chn s :: Subst  -- threaded subst
         syn t :: Ty     -- fresh type
  judgespec ^^ t `fresh`                     -- syntax for the judgment
  judgeuse ag ^^ (t, asyn.s) = fresh ainh.s  -- denotation for AGs with Haskell

relation tyUnify view H  -- wrapper for |unify|
  holes  chn s :: Subst  -- threaded subst
         inh t1 :: Ty    -- left type
         inh t2 :: Ty    -- right type
  judgespec ^^ t1 == t2  -- syntax for the judgment
  judgeuse ag ^^ (retain errs, asyn.s) = unify t1 t2 ainh.s
\end{code}
\caption{DHM relations in Ruler.}
\label{fig.dhm.ruler.relations}
\end{figure}

In Figure~\ref{fig.dhm.ruler.relations} we add a DHM view on top of the algorithmic view, and define external relations to obtain fresh types and to unify two types. The retain-keyword declares that the left component of the output tuple of |unify| is mapped to a local AG attribute |loc.errs|, which can then be collected by conventional AG rules. This mechanism allows values to be exposed as attributes to the encapsulating AG.

\begin{figure}[tb]
\begin{code}
ruleset theRules relation expr
  rule e.app view H
    judge T : tyFresh ^^ r `fresh` | ainh.s = s1 ^^ | asyn.s = s2
    judge F : expr | t = t1 ^^ | ainh.s = s2 ^^ | asyn.s = s3
    judge A : expr | t = t2 ^^ | ainh.s = s3 ^^ | asyn.s = s4
    judge U : tyUnify ^^ t1 == (t2 -> r) ^^ | ainh.s = s4 ^^ | asyn.s = s5
    ~~~
    judge R : expr ^^ | ainh.s = s1 ^^ | asyn.s = s5
\end{code}
\caption{Ruler rules for |e.app|.}
\label{fig.ruler.dhm.rules}
\end{figure}

Figure~\ref{fig.ruler.dhm.rules} shows the rules for |e.app| in view |H|. The hole-bindings for judgments essentially specify the threading of the substitution. The notation for hole-binding can be used to supply bindings for holes that are not present in the judgment's special syntax. To connect two judgments, we introduce intermediate meta-variables |s1|, |s2|, etc. This threading has to be done manually, as Ruler does not have a concept of copy rules.

\subsection{Discussion}
\label{intro.sect.ruler.discussion}

Ruler provides notation and composition mechanisms for the description of type rules. Rules may inherit from other rules, and a rule in a view inherits from the same rule in preceding views. These mechanisms enhance modularity and reuse. Effectively, a Ruler fragment is a partial specification. The meaning of a Ruler specification is only defined for a complete composition in combination with the associated attribute grammar. It would aid formal reasoning if a meaning could be attached to individual fragments.

The Ruler compiler generates an inference algorithm only for algorithmic specifications. As a consequence, the code generation is limited to syntax-directed type rules.
Syntax-directedness does not hold for many declarative type systems, which have relations with overlapping rules, or rules that dispatch on more than one argument of the conclusion judgment. Also, the premises must be functional. The transformation of a relation to a function is by itself non-trivial, and common techniques such as fixpoint iteration or search strategies are not directly supported by Ruler. We address these complications in this thesis (Section~\ref{intro.sect.overview}).

Moreover, Ruler's features are actually not specific to the domain of type systems. Ruler provides a rudimentary composition mechanism, syntactic sugar, and typesetting support for a formalism that is not unlike AGs. These features would equally well benefit AGs in general\footnote{
An example of a feature that benefits AGs is Ruler's automatic unique-numbering mechanism. We generalized and implemented a similar mechanism for AGs.
}.

\section{Thesis Overview}
\label{intro.sect.overview}

Our ultimate goal is to semi-automatically generate a type inference algorithm from a declarative type system specification. In particular, we focus on type systems described as a collection of type rules, and on an implementation based on attribute grammars. In this thesis, we present RulerCore, a language that extends attribute grammars. It facilitates the composable description of inference algorithms that are typically used to implement declarative aspects of type rules. RulerCore can thus be used to give an executable semantics to a set of type rules. In this section, we give an overview of RulerCore's extensions to attribute grammars. In Section~\rref{intro.sect.globalpicture} we position this thesis with respect to the larger goal.

\subsection{Inference Algorithms as an Attribute Grammar}

We give an abstract description of how we structure inference algorithms as an AG; Chapter~\rref{chapt.iterative} shows a concrete example.

We write AGs over typing derivations, instead of AGs over the abstract syntax of a language. In the following chapters of this thesis, we use the term \emph{abstract syntax tree} to refer either to typing derivations or to the result of parsing\footnote{
The typing derivation is typically an extension of the AST, thus the difference is usually irrelevant and can be determined from the context. Similarly, we refer to the AST and the AST decorated with attributes interchangeably.
}. Nonterminals thus correspond loosely to typing relations, and productions to type rules (Section~\ref{sect.intro.agtypespec}). The grammar for typing derivations may have more structure than is present in the type rules. For example, we may add nodes that each represent a choice between alternatives, so that we effectively describe a forest of typing derivations.

We map each declarative aspect of a type rule to a higher-order child (Section~\rref{intro.sect.higherorder}), as shown in Section~\rref{intro.sect.copyrules}. Extra attributes, such as a substitution, provide contextual information for these children. For example, an equality premise between two types in the type rules corresponds to a unification-child in the AG. The structure of the unification child serves as a proof that the two types can be made equal, and the resulting substitution attribute reflects the effect of unifying the two types. We treat meta-variables in type rules as conventional attribute values (Section~\ref{intro.sect.ag.dhm}). Type inference amounts to determining the structure of these children.
Choosing a particular tree for a child may require an exploration of candidate trees. Instead of constructing a single derivation tree, we actually construct and choose from a forest of derivation trees. Through value dependencies between attributes, we effectively define in which order the structure of these children is determined, and in which order the effects of determining this structure become visible in the substitution attribute. This approach allows us to encode Algorithm W (Section~\ref{sect.intro.dhm}).

For more complex inference algorithms, such as constraint-based algorithms and algorithms that require fixpoint iteration, the shape of the derivation tree and the values of attributes are mutually dependent. Inference algorithms therefore analyze, explore, and extend (intermediate) partial derivation trees. This process does not have a straightforward mapping to an AG, since an AG specifies when the resulting derivation tree is correct, but not how the intermediate trees are obtained. We introduce extensions to AGs to make such intermediate trees visible in the AG description.

\subsection{Attribute Grammar Extensions}

Chapter~\rref{chapt.rulercore} gives a detailed outline of each extension. The following chapters work out each extension individually.

\paragraph{Visits.}
RulerCore is a language for the description of higher-order, ordered attribute grammars. We extend this basis with explicit specifications of visits; in comparison, visits are implicit in ordered attribute grammars. Chapter~\rref{chapt.side-effect} introduces the language and the notation.

For ordered attribute grammars, there exists an evaluation algorithm that starts with an initially undecorated tree and ends in a correctly decorated tree. The \emph{state} of a tree is the collection of decorations present in the tree. During the evaluation, the state of the tree thus changes. A \emph{configuration} is a set of attribute names. A configuration describes the state of a tree when the set of decorations of the root contains exactly the attributes mentioned in the configuration. In an ordered attribute grammar, a linear order exists between configurations. A \emph{visit}, a unit of evaluation for a node, transitions the state of a tree to a state described by the next configuration.

We treat a node whose state is described by a certain configuration as a \emph{first-class value}, which we can store in attributes, obtain from attributes, inspect, and to which we can programmatically apply state transitions. This approach offers us sufficient control over the AG evaluation to combine AGs with \emph{monadic operations}, such as the unification monad shown in that chapter.

We provide notation to declare a totally ordered sequence of visits per nonterminal. Each attribute declaration must be associated with one visit declaration, which has consequences for the scheduling of rules. During attribute evaluation, attributes that are associated with an earlier visit are defined before attributes of a later visit are computed; in this context, \emph{defined} means that a reference to the attribute value is available. Similarly, rules of productions of a nonterminal may be associated with a visit of that nonterminal, which restricts the scheduling of such rules to either that visit or a later visit. Furthermore, we may restrict a rule to a particular visit, which ensures that the rule is scheduled before any rules of subsequent visits.
With this approach, rules may make assumptions about the configuration of the tree prior to the rule's evaluation. Moreover, the notation allows us to define customizations of evaluation strategies for rules of a particular visit or for visits to particular subtrees.

A disadvantage of our approach with respect to attribute grammars is that we need to specify a visit for each attribute. This requires additional effort and makes attribute declarations less composable. We typically declare an attribute for a set of nonterminals instead of a single nonterminal. In our approach, this is only possible if all the nonterminals in the set have a common visit to which the attribute can be scheduled. We show in Chapter~\rref{chapt.warren} how to solve this issue.

\paragraph{Fixpoint iteration, clauses and constraints.}
In Chapter~\rref{chapt.iterative} we exploit the notion of visits. We show how to conditionally repeat the evaluation of a visit to a child, which allows the encoding of \emph{fixpoint iteration}. Using visit-local attributes, state can be kept between iterations. Moreover, we allow one or more clauses to be defined for a visit. Each clause provides an alternative set of rules for the visit. With special match-rules we specify constraints on clauses. With clauses in combination with higher-order children, we can define the structure of the derivation tree in terms of attributes, and thus deal with type-directed inference algorithms.

We treat intermediate derivation trees as first-class values. With the specification of visits, we can reason about the configuration a tree is in. A tree that is in a certain configuration can be detached, transferred via attributes to another location, and attached there. With this mechanism we can represent \emph{constraints} or \emph{deferred judgments} as trees with access to their context via attributes.

\paragraph{Exploration of alternatives.}
The above extensions describe algorithms that conservatively approximate the derivation tree. For some type systems it is necessary to explore a forest of candidate derivation trees. Such a forest can be represented with a decision tree, which contains choice nodes that branch to various \emph{alternatives}. In Chapter~\rref{chapt.breadthfirst} we show how to describe explorations of such alternatives with AGs. We present a technique that allows a spectrum of depth-first and breadth-first search strategies to be described. In a statically scheduled AG, we can evaluate the AG in a step-by-step fashion. By intertwining the evaluation of alternatives, we obtain a breadth-first search. After each step, some intermediate values may be available, which can be used to direct the search process.

\paragraph{Phases and commuting rules.}
In Chapter~\rref{chapt.warren} we generalize visits to \emph{phases}. A phase may consist of one or more implicitly defined visits, which are determined by the static scheduling of the AG. As a consequence, an attribute does not need to be explicitly assigned to a visit, and its scheduling may optionally be constrained by a phase. With this approach, our extensions conservatively extend AGs.

For some chained attributes, the order induced by the value dependencies of their rules may be too strict when these rules encode commuting operations. We present \emph{commuting rules}: rules that are connected via a chained attribute, but that do not depend on previous rules in the chain. Such rules give us more freedom in scheduling, as the sketch below illustrates.
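As an illustration, consider drawing fresh labels from a unique number supply. The following Haskell sketch (all names hypothetical, using the standard value-supply technique) shows how splitting the supply removes the value dependency between consecutive rules in the chain:

\begin{code}
-- An infinite binary tree of unique labels.
data Supply = Supply Int Supply Supply

mkSupply :: Supply
mkSupply = go 1
  where go n = Supply n (go (2 * n)) (go (2 * n + 1))

split :: Supply -> (Supply, Supply)
split (Supply _ l r) = (l, r)

fresh :: Supply -> Int
fresh (Supply n _ _) = n

-- Because each rule draws labels from its own half of the supply,
-- the rules no longer depend on each other's results: their order
-- of evaluation is irrelevant up to the renaming of labels, which
-- is the (liberal) commutativity we require.
example :: (Int, Int)
example =  let (s1, s2) = split mkSupply
           in  (fresh s1, fresh s2)
\end{code}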
Typical examples are the threading of a unique number supply, as sketched above, and the threading of substitutions. To preserve referential transparency, the commuting rules must satisfy a liberal commutativity law. With such rules we can functionally encode the behavior of a rule with side effects that is scheduled across different implicit visits.

\paragraph{Dependent AGs.}
In Chapter~\rref{chapt.dependent-ags} we apply dependent types to AGs. In a dependently typed AG, the type of an attribute may refer to values of attributes. The type of an attribute is an invariant, and the value of an attribute is a proof of that invariant. Thus, with dependent AGs we can prove properties of our compiler. Additionally, this chapter serves as a showcase for visits and clauses.

\subsection{Contextual Chapters}

In the extended edition of this thesis\footnote{
Extended edition: \url{https://svn.science.uu.nl/repos/project.ruler.papers/archive/thesis-extended.pdf}
}, we place the above extensions in a wider context.

\paragraph{Graph traversals.}
Many computations in a compiler take control-flow or data-flow graphs into account. We show that the mechanism to attach and detach children can be used to interface AGs with graph traversals~\citep{ariem-ext-graph}.

\paragraph{GADTs.}
In the extended edition~\citep{ariem-ext-gadt}, we show an example of a type system for Generalized Algebraic Data Types (GADTs). An inference algorithm for this system requires an exploration of alternatives (Chapter~\tref{chapt.breadthfirst}). We formulate our specification so that it is orthogonal to specifications of ADTs:
\begin{itemize}
\item We present our specification as System F augmented with first-class equality proofs.
\item We exploit the Church encoding of data types to describe GADT matches in terms of conventional lambda abstractions.
\end{itemize}
Such orthogonal designs are important in order to compose type systems, and thus ultimately also to compose type inference algorithms.

\section{The Context of this Thesis}
\label{intro.sect.globalpicture}

Types play an increasingly important role in the design of programming languages. Type systems specify a relation between programs and types, which facilitates (formal) reasoning about typed programs. Moreover, type systems form a partial specification for type checking and type inference algorithms. As we discussed in Section~\ref{intro.sect.motivation}, our ultimate goal is to semi-automatically derive type inference algorithms from declarative type system specifications. We mentioned in Section~\ref{intro.sect.typesystems} that a set of type rules alone is not a complete description; hence we develop Ruler (Section~\ref{intro.sect.ruler}), a domain-specific programming language in which we write an inference algorithm as an extension of the declarative type rules.

\subsection{Challenges}

Several challenges need to be overcome to reach this goal. We identify two main challenges; this thesis is situated within the second one.

The first challenge is related to type system composition. Language features and their declarative type systems are typically defined as extensions of a bare lambda calculus. Some language features are mutually conflicting (e.g., their combination invalidates type soundness). However, many language features compose in standard ways. For example, features described for a lambda calculus that supports fix-expressions can be translated to a description for a language with recursive let bindings.
To meet this challenge, we wish to describe language features in isolation, and to describe a composition of these features for the actual source language. As an illustration of what is missing, the type rule formalism lacks the expressiveness that higher-order functions offer to functional programs, such as the ability to abstract over common patterns and to instantiate these abstractions with minimal syntax. For the small type system descriptions that appear in type system theory, such expressiveness is not needed. However, type system descriptions of actual languages are large and hard to maintain.

The second challenge is related to declarative type rules. Declarative type rules abstract from evaluation strategies. However, a general inference algorithm does not exist, and a naive algorithm such as the one mentioned in Section~\ref{intro.sect.inference} is either incomplete or inefficient. Ultimately, inference boils down to resolving \emph{declarative aspects}: determining the structure of the derivation tree, and computing with values that may not yet be fully determined. In practice, inference algorithms are intricate compositions of common algorithms that treat such declarative aspects in a predictable and deterministic way.

These algorithms are hard to combine. If one declarative aspect requires a constraint-based algorithm for its resolution, and another requires some form of search, then the order in which the aspects are resolved is likely to be relevant. Also, it is hard to describe the flow of information between different solving techniques. To allow these techniques to cooperate, we need a language for the description of a composition of such techniques. Hence, this thesis.

\subsection{Additional Challenges}

Aside from the general motivation of our research, another source of motivation is that we intend our results to benefit the implementation of our Haskell compiler UHC~\citep{uhc}. Therefore, we impose additional demands on solutions to the above challenges.

Firstly, since UHC's implementation is based on Haskell itself, we require that solutions integrate seamlessly with Haskell. This restriction effectively rules out the direct use of (functional) logic languages, due to differences in the evaluation model and the representation of data structures\footnote{
\citet{Hanus10transformingfunctional} show that an embedding of functional logic programs is possible in Haskell, but it affects all data representations and forces all computations into monadic style. However, we use techniques and ideas from logic programming that do integrate seamlessly, such as backtracking in a monad~\citep{Hinze:2000:DBM:351240.351258,DBLP:conf/icfp/KiselyovSFS05}.
}. In addition, we desire that our research can also be exploited in compiler suites that are implemented in languages without lazy evaluation or strict typing disciplines.

Secondly, along similar lines, we refrain from the use of dependently typed languages, since an extraction to Haskell is a one-way process that also affects data-type representations. Our goal is to be able to generate an implementation. Advances in dependently typed languages seem promising, but a formally certified implementation of a compiler such as UHC is currently infeasible.

Finally, the resulting implementation should be reasonably efficient in order to process the ASTs of large programs. In our experience with UHC, we noticed that memory usage is an issue when using demand-driven evaluation of AGs.
We experimentally verified that the time spent on traversing abstract syntax trees in UHC is negligible in comparison to the computations that are performed on each node of the AST. Thus, while traversal overhead is rarely a problem, memory usage is a concern, which we address by using statically ordered evaluation of AGs.

\subsection{Solutions}
\label{sect.intro.solutions}

A partial solution to the first challenge is given by \citet{dijkstra05phd}, as demonstrated by the UHC project, and by the initial development of Ruler (Section~\ref{intro.sect.ruler}) in particular. The composition mechanisms and syntax extensions provided by Ruler would also be beneficial to AGs. For example, many dense translation schemes in this thesis are manually derived from actual AG descriptions. These AG descriptions focus on attributes in isolation and are easier to understand, but they are too verbose for inclusion in this thesis. Solutions for AGs would also work for type rule descriptions, and vice versa.
%% For example, many translation schemes in this thesis are written with
Indeed, composition facilities for AGs receive ongoing attention~\citep{DBLP:conf/icfp/VieraSS09,DBLP:conf/gpce/Saraiva02}.

%% Claim that this first challenge is not something we want to work on.
There is still a long way to go with respect to the first challenge. Since declarative rules abstract from an evaluation algorithm, the data structures and administration that are involved in the algorithm are chosen for convenience and notational conciseness. In an actual implementation, we may be able to represent certain administration more efficiently using specialized data structures. However, we do not address these issues in this thesis, and only mention some of them in passing in \citet{ariem-ext-gadt}. We assume that the declarative rules are specially crafted to make them more suitable for an actual implementation.

%% I want to express here the reasons why we choose for AGs, and not
%% for other techniques such as constraints, monadic traversals, etc.
There is thus some open work for the first challenge. However, we focus on the second challenge, which is more pressing: we need basic building blocks before we can compose them in clever ways.

We propose to tackle the second challenge with attribute grammars. Inference algorithms that are specified by declarative type rules are sensitive to context---non-inductive properties of the AST---and attribute grammars excel in providing such contextual information with attributes, as is proven by UHC's implementation. Also, from a practical perspective, since UHC's implementation is based on AGs, an inference algorithm based on AGs interfaces conveniently with attributes of other components in the compiler, as shown by previous work on Ruler.

However, attribute grammars in their current form are not well suited for the description of inference algorithms for complex type systems: inference algorithms make explicit assumptions about the intermediate states of the derivation trees during their construction. In a conventional AG, we cannot express such assumptions, because AG descriptions are defined in terms of the final state of the derivation tree. In order to make assumptions about the intermediate state, we extend AG evaluation, and hence arrive at the extensions of Section~\rref{intro.sect.overview}.

Our approach applies to the description of algorithms that are recursive functions over tree-like data structures.
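As a minimal illustration (hypothetical and much simplified), an attribute computation over a tree can be written as a fold, with an inherited attribute modeled by folding to a function:

\begin{code}
data Tree = Leaf | Node Tree Tree

-- The fold (catamorphism) over 'Tree'.
cata :: r -> (r -> r -> r) -> Tree -> r
cata leaf node = go
  where  go Leaf        = leaf
         go (Node l r)  = node (go l) (go r)

-- A synthesized attribute: the number of nodes.
size :: Tree -> Int
size = cata 1 (\l r -> 1 + l + r)

-- An inherited attribute (the depth passed down from the root) is
-- modeled by folding to a function that consumes the inherited value.
depths :: Tree -> Int -> [Int]
depths = cata (\d -> [d]) (\l r d -> d : l (d + 1) ++ r (d + 1))
\end{code}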
In particular, our approach applies to catamorphisms, which is not surprising, because attribute grammars can be considered a domain-specific language for the description of catamorphisms. On the other hand, algorithms based on graph rewriting, for example, are not straightforwardly expressed in our approach. An inherent difference is that we traverse a structure, whereas the rewrite rules used by graph rewriting access the structure in irregular ways. Also, algorithms that involve matrix operations to efficiently solve linear constraints cannot be described straightforwardly. However, \citet{ariem-ext-graph} shows how to mix attribute evaluation with external solvers.

\section{Related Work}
\label{sect.intro.relatedwork}

%include ../related-work/related-work.lhs

%if False
\section{Common Notation}

We often refer to keywords which are typeset in bold in code blocks. To prevent typesetting them in bold in the running text, we use the notational convention that some keyword |visit| is called the visit-keyword. We trade type setting for additional hyphens.
%endif

\section{Conclusion}
\label{sect.intro.conclusion}

In the following chapters, we present several extensions to attribute grammars that facilitate the description of complex type inference algorithms. The central concept in these chapters is that we exploit the explicit notion of visits to control and manipulate chunks of AG evaluation. It allows us to transform the tree during attribute evaluation---precisely what we need to express type inference algorithms. Many classic AG approaches use a notion of visits in their intermediate languages (Section~\ref{sect.intro.ag.eval}). In this thesis, we instead propose to use visits as a \emph{programming model}. With this programming model, we express resolution strategies for declarative aspects of type rules.

Our extensions offer flexible ways to \emph{tune} conventional attribute grammar evaluation, and are conservative extensions of (ordered) attribute grammars. We offer a delicate balance between, on the one hand, the implicit evaluation strategy of attribute grammars and, on the other hand, the need to make evaluation explicit for more complex strategies. Underlying our extensions are well-defined concepts from higher-order AGs (first-class children) and ordered AGs (visits). Underlying these, in turn, are well-established concepts from functional programming languages (first-class functions, coroutines, and referential transparency). These concepts form a solid theoretical basis to build upon.

The extensions that we present are not limited to type inference. In fact, type inference is a use case that poses challenges whose solutions improve the abstraction facilities that are available to structure compilers.

%% Maybe mention that this thesis sheds some light on:
%%   AGs and monads
%%   AGs and imperative languages with side effects
%%   AGs and dependently typed programming

\paragraph{Thesis organization.}
Chapter~\tref{chapt.rulercore} gives a detailed summary of the thesis. Subsequent chapters focus on individual extensions, and reintroduce relevant terminology. For background information related to type systems and attribute grammars, Section~\ref{intro.sect.typesystems} and Section~\ref{intro.sect.ags} can be used as a reference.
\paragraph{Publications.}
The chapters of this thesis are based on the following publications:
\begin{itemize}
\item We presented an earlier version of Chapter~\rref{chapt.side-effect} at the Workshop on Generative Technologies (WGT '10) at ETAPS in 2010~\citep{middelkoop09wgt10}. An extended version appeared in the journal Higher-Order and Symbolic Computation~\citep{Middelkoop:2011:VAG:2007516.2008132}.
\item Some of the work of Chapter~\rref{chapt.warren} is to appear at PADL '12.
\item Chapter~\rref{chapt.iterative} is an extended version of the paper that we presented at the conference on Generative Programming and Component Engineering (GPCE '10) in 2010~\citep{Middelkoop10gpce}.
\item An earlier version of the chapter about GADTs (in the extended edition of this thesis) appeared in the post-proceedings of Trends in Functional Programming (TFP '08)~\citep{mypaper}, and a later version appeared in the journal Higher-Order and Symbolic Computation~\citep{springerlink:10.1007/s10990-011-9065-0}.
\item Chapter~\rref{chapt.dependent-ags} is to appear in the post-proceedings of the symposium on Implementation and Application of Functional Languages.
\item We presented Chapter~\rref{chapt.breadthfirst} at the workshop on Language Descriptions, Tools and Applications (LDTA '11) in 2011.
\end{itemize}
As a formal detail, the Association for Computing Machinery (ACM) holds the copyright on the paper versions of Chapter~\rref{chapt.iterative} and Chapter~\rref{chapt.breadthfirst}.
%% Springer Verlag has copyright on the paper version of Chapter~\rref{chapt.case-study}.
Elsevier holds the copyright on the paper version of Chapter~\rref{chapt.side-effect}.