%include intro.fmt

\chapter{Introduction}
\label{chapt.introduction}

This thesis investigates the application of attribute grammars to the description of inference algorithms (as implemented in a compiler) that are specified by a collection of inference rules. Such collections are a means to specify the semantics of programming languages. Since there exists no universal inference algorithm for inference rules, the description of inference algorithms is a nontrivial exercise. Our goal is to use attribute grammars to make it easier to write such descriptions.

\section{Overview}
\label{intro.sect.motivation}

%% Introduce source code, compiler, and more stuff.
\paragraph{Language specifications.}
A formal programming language specification describes the language's notation and the notation's meaning, and forms a set of requirements for tools that process programs written in this language. A static semantics of a programming language is often expressed as a relation between properties of programs written in this language, where the relation is described by a set of inference rules. The structure of such a program is the Abstract Syntax Tree (AST) obtained by parsing the source code of the program, and usually plays an important role in the rules. For example, in the case of code generation, a static semantics expresses a relation between the AST and machine code, whereas in the case of type checking, the semantics typically relates the AST to a type for each program fragment and an environment with a type for each identifier.

%% Introduce compilers (implementations)
\paragraph{Implementations.}
A compiler analyzes a program and computes properties of that program.
%% Given a program and a denotational semantics, the question arises if we can
%% implement a compiler.
This analysis is specified by a \emph{static semantics}, which describes a relation between programs and properties. For example, the specification of a compiler that translates source code to machine code describes a relation between a program and machine instructions. The properties computed by a compiler can be seen as evidence that there exists a proof relating the properties to the program. Assuming that the relations are specified with inference rules, such a proof has the form of an attributed tree. Each node represents an application of an inference rule of the semantics, and attributes describe how the rules are instantiated. A semantics is \emph{syntax-directed} if the shapes of the proofs are isomorphic to the AST. A semantics is \emph{algorithmic} if additionally the constraints between attributes in the tree are expressible as computable functions. As a compiler's implementation is usually based on traversals of the AST, an algorithmic semantics is a convenient specification of a compiler. Since the implementation boils down to traversing the tree and applying, in the right order, the computable functions mentioned in the specification, there exists a clear correspondence between the specification and the implementation. An example is a semantics for an assembly language where the relation between the AST and machine code is actually a bijection. However, most languages' semantics are not entirely algorithmic. We come back to this point further below.

%% Introduce the attribute grammars
Attribute Grammars (AGs)~\citep{DBLP:journals/mst/Knuth68} are an attractive domain-specific programming language for the implementation of an algorithmic semantics. Attributes represent inductive properties of the AST.
A context-free grammar describes when an AST is correctly structured by relating productions to nodes in the AST. An AG describes when an AST is correctly decorated with attributes by imposing constraints on the attributes per node of the AST, formalized by rules per production. These rules are computable functions between attributes, denoted in some host language. AGs abstract from the traversal over the AST and from the order of evaluation of the rules. Therefore, AGs are composable, which makes it possible to describe the implementation of a large language as various separate aspects. Moreover, an AG can be compiled to an efficient algorithm that computes the attributes. Thus, an AG serves both as a specification and an implementation of an algorithmic semantics.

%% Now come back to abstraction in the specification
%% Give a reason why semantics are declarative
\paragraph{Declarative specifications.}
As argued above, to implement a static semantics, it is preferable that the semantics is algorithmic because of the close correspondence with an implementation. A declarative semantics is a semantics suitable for formal reasoning and documentation purposes, and is usually more abstract and concise than an algorithmic semantics. For example, consider a static semantics that describes a relation between the AST and a sequence of machine instructions. When the relation is not functional, there may be many related sequences of machine instructions for a given AST. In particular, a shortest sequence of instructions may differ from a sequence of instructions with the lowest expected execution time. The choice of which sequence is computed is left up to the implementation or specified separately. Moreover, declarative features of a programming language can usually be described more concisely as a relation than as a computable function. An example is operator overloading, where the choice of the implementation of an operator depends on the types of the operands. Effectively, the compiler takes over some of the work of the programmer, and does so by modeling the inference of the proof in some way.

\paragraph{From ASTs to proofs.}
An implementation using traversals of the AST is based on an algorithmic semantics, and a transformation from a non-algorithmic semantics to an algorithmic semantics is non-trivial. Consequently, it is difficult to keep the implementation consistent with the declarative language specification.
In this thesis, we approach the implementation of a static semantics from another direction. We consider AGs based on the grammar that underlies the inference rules of the relation instead of the language's grammar. As a particular advantage, the AG is closely related to the declarative specification. However, the AG specifies when an attributed proof is valid, but does not specify how to obtain the proof. Since there exists no general procedure that maps such an AG to an algorithm that infers the proof, we need to augment the AG with additional information to obtain such an algorithm.

%% With higher-order AGs we take one step in the right direction, but are still not there yet.
%% Explain HOAGs.

\paragraph{Thesis.}
In this thesis, we focus on the description of type inference algorithms with attribute grammars. Most type inference algorithms are a complex combination of a small set of inference techniques, such as type variables and unification to calculate with values that are not fully known yet, fixpoint iteration to approximate solutions, constraints to defer the inference of a subtree, and search tree construction to encode alternative solutions. Such techniques are based on gradually building a proof and exploring intermediate candidate solutions. In our approach, we conservatively extend AGs to support these techniques, balancing assumptions about the evaluation algorithm against the preservation of the declarative nature of the description. The central concept that underlies our extensions is that of higher-order, ordered attribute grammars. In such higher-order AGs, the domain of an attribute can be an attributed tree, which allows us to dynamically grow the tree.
The state of a tree is described by a configuration, which specifies the decorations that have been computed. In an ordered AG, the configurations are linearly ordered. A visit, a unit of evaluation for a node, transitions the node from its state to the state described by the next configuration. This concept offers control over the simultaneous evaluation of attributes and exploration of the tree.

\paragraph{Chapter organization.}
This chapter presents background information and a short outline. We assume that the reader has a strong background in the programming language Haskell~\citep{O'Sullivan:2008:RWH:1523280}, is familiar with type systems~\citep{509043}, and knows the basics of attribute grammars~\citep{DBLP:journals/mst/Knuth68}. This chapter gives (rather) informal definitions of relevant concepts and provides pointers to literature. The actual contents of the thesis start with the next chapter, Chapter~\rref{chapt.rulercore}, which gives a detailed summary of our extensions and shows how these fit together. Each following chapter covers an extension in detail. We address concepts and notation of type systems and attribute grammars in Section~\ref{intro.sect.typesystems} and Section~\ref{intro.sect.ags}, respectively. In particular, we recast the notion of ordered attribute grammars. Section~\ref{intro.sect.ruler} addresses previous work, Section~\ref{intro.sect.overview} gives an overview of each of the extensions, and Section~\ref{intro.sect.globalpicture} sketches the overall goal. Finally, Section~\ref{sect.intro.relatedwork} addresses related work.
\section{Background on Type Systems}
\label{intro.sect.typesystems}

%% A potential difficulty is here that these subjects are covering multiple disciplines
In later chapters, we use type systems based on variants of the lambda calculus as examples. In this section, we give a short summary of the lambda calculus, show the evaluation of lambda terms, and give a type system. Furthermore, the discussion of variants of the lambda calculus and their type systems serves as a vehicle to discuss design and implementation challenges of type systems, which we use in the motivation of this thesis in Section~\ref{intro.sect.overview}. Another purpose of this section is to introduce vocabulary and notation for subsequent chapters. We assume that the reader is already familiar with the lambda calculus and type systems. Introductory books on type systems~\citep{509043,harper} provide a more extensive and formal explanation.

\subsection{Specification of Programming Languages}

Programming languages are described by a grammar, which specifies the set of programs that belong to the language. A semantics relates a program to some properties in a given domain. When these properties represent the correctness of the program, or represent a program in (another) programming language, we talk about a static or denotational semantics. When the properties describe the runtime behavior of the program (typically in the form of state transitions), we talk about a dynamic or operational semantics. We assume that the specification of a programming language consists of a context-free grammar (Section~\ref{intro.ag.syntax}) and a static semantics. The specified programming language is called the \emph{object language}, and a program an \emph{object term}. In case of compilation and transformation, the object language is often referred to as the \emph{source language} and object terms as \emph{source terms}.
The language that describes the specification and the language in which the compiler is implemented are called \emph{meta languages}. Finally, in case of a translation to a different language, the language that we translate to is called the \emph{target language} and the translated program a \emph{target term}.

\subsection{The Lambda Calculus}
\label{intro.sect.lambdacalculus}

The lambda calculus is a language that is often used in programming language research, and in research on type systems in particular. Many concepts of programming languages have their roots in variants of the lambda calculus, or have been well-studied in such a context. We discuss the lambda calculus because we use its concepts in object languages, source languages and target languages in the following chapters.

Figure~\ref{fig.intro.lambda} shows the abstract syntax |e| of expressions in an explicitly-typed variant of the simply-typed lambda calculus, which may contain types |tau|. In passing, we also give syntax for environments and evaluation contexts. The structure of |e| is called the Abstract Syntax Tree when represented as a tree (Section~\ref{intro.ag.syntax}).

\begin{figure}[tb]
%format ty = "\tau{}"
%format gam = "\Gamma{}"
%format . = "."
%format emptyset = "\emptyset{}"
%format ty1
%format ty2
%format ty3
%format Ctx = "\mathcal{E}"
%format (InCtx (x)) = Ctx "\left\{" x "\right\}"
%format (InHole (x)) = "\left\{" x "\right\}"
\begin{code}
e    ::=  e1 e2          -- application, with expressions |e1| and |e2|
     |    x              -- variables, e.g. |x,y,z|
     |    v              -- values (head-normal form)
v    ::=  i              -- integer constant, e.g. |3| and |42|
     |    \ x : ty . e   -- abstraction, with param |x| of type |ty| and body |e|
ty   ::=  I              -- integer type
     |    ty1 -> ty2     -- function type
gam  ::=  emptyset       -- empty environment
     |    gam, x : ty    -- environment |gam|, with on top a mapping of |x| to type |ty|
x, y, z                  -- variables
f, g, h, a, e            -- expressions
\end{code}
\caption{Syntax of the explicitly-typed lambda calculus variant.}
\label{fig.intro.lambda}
\end{figure}

In the lambda calculus, a function may be passed as a value |v| in the form of a lambda expression, and is thus \emph{first class}. Such a function is anonymous, although it can be given an explicit name by binding it to an identifier. For example, the following expression denotes the application of a function that applies the identity function it receives as an argument to the value |3|:
\begin{code}
(\ y : I -> I ^^ . ^^ y 3) (\ x : I ^^ . ^^ x)   -- evaluates to |3| of type |I|
\end{code}

%format ~> = "\leadsto{}"
To specify to what value an expression evaluates, we provide an operational semantics below. In this semantics, the simultaneous substitution of all free |x| in |e1| by |e2| is denoted by |[x := e2] e1|. The semantics\footnote{
Such a semantics is called a small-step operational semantics because the inference rules describe a transformation step from an intermediate term to another intermediate term, and the actual transformation is the exhaustive application of these transformation steps.
} consists of the reduction relation |e1 ~> e2|, which is the smallest relation that satisfies the inference rules\footnote{
In Section~\ref{intro.sect.typerules} we consider a notation for inference rules and their interpretation in more detail.
} of Figure~\ref{fig.intro.reduction}, which we explain below.

\begin{figure}[tb]
\begin{mathpar}
\fbox{|e1 ~> e2|}\\
\inferrule*[right=beta]
  {}
  {|(\ x : ty ^^ . ^^ e1) e2 ~> [x := e2] e1|}

\inferrule*[right=left]
  {|e1 ~> e2|}
  {|e1 e ~> e2 e|}
\end{mathpar}
\caption{Operational semantics as a reduction relation on expressions |e|.}
\label{fig.intro.reduction}
\end{figure}

A \emph{beta-redex} is an expression of the form |(\x . e) a| that can be reduced. The above rules describe normal-order reduction: through the rule \TirName{left}, the beta-redex in the head position is identified. Indeed, if we consider the reduction of the above example, we first substitute |y| with the identity function, obtaining |(\ x:I ^^ . ^^ x) 3|, and then substitute |3| for |x|, ending up with the value |3|. When no reductions are possible anymore, the expression is in \emph{head normal form}.

\subsection{Type Rules}
\label{intro.sect.typerules}

The purpose of a static semantics is to exclude programs that incorrectly use their data.
%% When a program is well-formed, then the result of a terminating evaluation is in normal form.
%% When a program is well-typed, then the result of a terminating evaluation has the same type
A type system classifies an expression as well-typed if it can associate a type with it. A type system is sound if it has the subject reduction property, which means that after each reduction step, the resulting term has the same type as the original expression. Type systems typically ensure the absence of certain programming errors, such as passing an integer where a function is expected.

%format Gamma = "\Gamma{}"
As an example, we give a type system specification for the lambda calculus as defined above in Figure~\ref{fig.intro.expltyperules} and explain it below. We allow liberal syntactic sugar in our specifications. Depending on the context, the notation |many x| represents a sequence or set |x1, ..., (sub x n)|.

\begin{figure}[tb]
\begin{code}
r  ::=  forall (many x) ^^ . ^^ d      -- quantified rule over meta variables |many x|
d  ::=  j1 ^^ ... ^^ (sub j n) +:+ j   -- rule with premises |(many j)|, conclusion |j|, |n >= 0|
j  ::=  (sup R n) (many m)             -- judgment with |abs (many m) = n|
(sup R n)                              -- |n|-place relation |R| (|n| often omitted)
m                                      -- argument (meta expression)
x, y                                   -- meta variables for values in the object language
\end{code}
The notation |(inferrule (j1 ^^ ... ^^ (sub j n)) j)| (with optional rule label) is sugar for |j1 ^^ ... ^^ (sub j n) +:+ j|. The explicit equality |x1 ~~ x2| denotes a judgment |subsup Eq tau 2 x1 x2|, with relation |sub Eq tau| representing structural equality of object terms of meta-type |tau|. The quantification of meta variables is usually left implicit.
\caption{Notation for inference rules.}
\label{fig.intro.rules}
\end{figure}

We define a typing relation |gam :- e : ty|, which is notation for a three-place relation over |gam|, |e|, and |ty|. A judgment of this relation states that in environment |gam|, the expression |e| has type |ty|. We typically describe such relations with inference rules (type rules). Figure~\ref{fig.intro.rules} gives the syntax of inference rules. A type rule |r| consists of zero or more premises (judgments |(many j)|) above the line, and a conclusion |j| below the line. We come back to meta expressions |m| later; for now, note that they contain at least the meta variables |x|.
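To give an impression of how this notation maps to an implementation, rules and judgments could be represented by Haskell data types along the following lines. This is a schematic sketch only: all names are hypothetical and chosen for illustration.
\begin{code}
type MetaVar   =  String                        -- meta variables |x|, |y|
type RelName   =  String                        -- name of a relation |R|
data MetaExpr  =  MVar MetaVar                  -- meta expressions |m| contain at least meta
               |  MCon String [MetaExpr]        --   variables, and constructions of object terms
data Judgment  =  Judgment RelName [MetaExpr]   -- judgment |R (many m)|
data Rule      =  Rule [MetaVar] [Judgment] Judgment  -- |forall (many x) ^^ . ^^ (many j) +:+ j|
\end{code}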
\begin{figure}[tb]
%format :- = "\vdash{}"
\begin{mathpar}
\fbox{|gam :- e : ty|}\\
\inferrule*[right=con]
  {}
  {|gam :- i : I|}

\inferrule*[right=var]
  {|(x : ty) `elem` gam|}
  {|gam :- x : ty|}

\inferrule*[right=app]
  {|gam :- f : ty1 -> ty2|\\\\ |gam :- a : ty1|}
  {|gam :- f a : ty2|}

\inferrule*[right=lam]
  {|gam, x : ty1 :- e : ty2|}
  {|gam :- \ (x : ty1) . e : ty1 -> ty2|}
\end{mathpar}
\caption{Type rules of the explicitly typed lambda calculus variant.}
\label{fig.intro.expltyperules}
\end{figure}

Figure~\ref{fig.intro.expltyperules} shows the rules of the lambda calculus as introduced above. The rule \TirName{con} associates the type |I| with any integer constant. The \TirName{var} rule gives an identifier the type to which the identifier is bound in the environment. The \TirName{app} rule requires expression |f| to be a function, and requires the argument |a| to have the same type as |f|'s formal parameter. The type associated with the application itself is the result type of the function. In \TirName{lam}, the body of the lambda may assume that identifier |x| has type |ty1| in the environment. The typing relation is the smallest relation that satisfies a set of rules |r|, which serve as axioms of the relation.

In a judgment, meta expressions |m| are arguments to some relation |R|. The language of meta expressions is a formal language, which is usually left implicit, but facilitates the construction of symbolic and concrete object terms. A relation |R| can be a built-in relation (an \emph{atomic} relation) or a relation described by inference rules. Instead of using expressions |m| to construct symbolic terms, atomic relations for each production in the object language can equivalently be used to constrain symbolic object terms. For example, for the arrow-production in the syntax of types, we assume the existence of an atomic relation |sub(C)(->) tau1 tau2 tau3| that expresses |tau1 == tau2 -> tau3|. Then, instead of a judgment |R (tau2 -> tau3)|, we may write |R tau1| with a fresh |tau1| and |(sub(C)(->)) tau1 tau2 tau3|.

The rules are in canonical form when the arguments of relations in the judgments consist only of meta variables, with the single exception of the judgment denoted |m (sup (~>) (*)) x|, which represents a reduction relation on object terms where |m| is a meta expression. Additionally, each meta variable occurs at most once, not counting its occurrences in (additional) explicit equalities. In this thesis, we assume that meta expressions are written in Haskell. For simplicity, we assume that meta expressions in this chapter are written in the above lambda calculus, so that the definitions of |m| and |e| coincide. To rewrite rules into canonical form, we use the operational semantics of (meta) expressions. For example, we replace a judgment |R e| by |R x| and introduce the additional premise |e (sup (~>) (*)) x|, where |x| is fresh and |sup (~>) (*)| is the exhaustive application of the reduction relation on (meta) expressions. To deal with multiple occurrences of meta variables, we introduce additional explicit equalities and quantified fresh variables. Rules in canonical form are more verbose, but easier to formally reason with.

Given arguments |(many a)| for the typing relation |R| described by inference rules, we can prove that these are a member of the typing relation, denoted by |R (many a)| or |(many a) `elem` R|, by constructing a \emph{derivation tree} using the rules of the relation.
A derivation tree is a proof for |R (many a)| when there is a rule of |R| with a conclusion that matches |R (many a)| (with substitution $\theta$), and for each premise of the rule (with $\theta{}$ applied), there is a subtree that is a proof for that premise. Proofs of atomic relations are leaves of a derivation. A derivation tree for the earlier example is:
%format gam0
%format gam1
%format gam2
\begin{mathpar}
\inferrule*[right=app]
  { \inferrule*[left=lam]
      { \inferrule*[right=app]
          { \inferrule*[left=var]
              {|(y:I->I) `elem` gam1|}
              {|gam1 :- y : I->I|}
            \\
            \inferrule*[right=con]
              {}
              {|gam1 :- 3 : I|}
          }
          {|gam1 :- y 3 : I|}
      }
      { |gam0 :- \ y:I->I ^^ . ^^ y 3 : (I->I)->I| }
    \\
    \inferrule*[right=lam]
      { \inferrule*[right=var]
          { |(x:I) `elem` gam2| }
          { |gam2 :- x : I| }
      }
      { |gam0 :- \ x:I ^^ . ^^ x : I->I| }
  }
  {|gam0 :- (\ y:I->I ^^ . ^^ y 3) (\ x:I ^^ . ^^ x) : I|}
\end{mathpar}
\begin{code}
gam0 = emptyset      -- initial environment
gam1 = gam0, y:I->I  -- extension of |gam0| in the application of \TirName{lam} (left branch)
gam2 = gam0, x:I     -- extension of |gam0| in the application of \TirName{lam} (right branch)
\end{code}

Each node |v| in the derivation tree is associated with a judgment |sub j v = (sub R v) (many a)|, and the subtree rooted at |v| is a proof that |many a `elem` (sub R v)|, where |sub R v| is the associated relation. Furthermore, node |v| is associated with some rule |sub r v|, which determines the structure of |v|. The node is furthermore decorated with values for the meta variables that are bound by |sub r v|.

Type rules are \emph{syntax-directed} when, for each node |v| in a derivation tree, the associated rule |sub r v| is uniquely determined by an argument |a| (which represents the expression for which we try to construct the proof) of the associated judgment |sub j v|. Thus, for syntax-directed rules, the choice of what rule to apply in the construction of the proof is determined uniquely by productions of the object language, and there is a one-to-one mapping between nodes in the abstract syntax tree and nodes of the derivation tree. This is a desirable property because it means that the shape of the derivation tree depends only on the structure of the object term, and is thus given at the start of the proof construction: we know what rules to apply where, and are only left with the problem of how to make their instances match. However, when the correspondence between |sub r v| and the a priori known arguments |many i `subset` many a| of |sub j v| is not functional, we call the rules \emph{declarative}\footnote{
Some authors call declarative rules \emph{nondeterministic} because an inference algorithm may need to choose nondeterministically what rule to apply.
}. Additionally, when |sub r v| also depends on the inferred type |tau|, with |tau `elem` many a|, the rules are declarative and also \emph{type-directed}.

A rule |sub r v| imposes two forms of constraints on node |v| in the derivation tree: constraints on the structure (as discussed above), and constraints on the values for the node's meta variables that decorate |v| (as discussed below). When constructing a proof, we choose rules and values that satisfy these constraints. We call the description of such choices \emph{aspects} of the rules. These aspects are declarative when the decisions are not functionally determined by |many i|. Declarative aspects complicate the construction of a proof, because there may be many choices that seem suitable to complement a partial proof, but turn out later to be inappropriate.
In Section~\ref{intro.sect.inference}, we discuss strategies to resolve such declarative aspects. We determine which meta variables are declaratively defined by attempting to transform the relations into functions: those meta variables that make the function non-deterministic are declaratively defined. For that, each parameter of a relation must have a specification that declares it (conditionally) as either an input or an output. In a judgment |j = R (many x)| of a rule in canonical form, a meta variable |x `elem` (many x)| is at an input position when |j| is a premise and |x| is passed as an output argument, or when |j| is a conclusion and |x| is passed as an input argument. Otherwise, a meta variable is at an output position. In case of explicit equalities |x1 ~~ x2|, |x1| and |x2| are both at an input position. In case of the reduction relation |m ~> x|, the occurrences of meta variables of |m| are at an input position and |x| is at an output position. In case of the construction/destruction relation |sub C P| related to the production |P| of the object language, in the judgment |sub C P x0 x1 ... (sub x n)|, the variables |x1, ..., (sub x n)| are at an input position if and only if |x0| is at an input position. A variable is \emph{declarative} if none of its occurrences are at an output position and the variable does not occur in an explicit equality with a non-declarative variable.

As an example, we add a type rule for an implicitly typed lambda abstraction, and discuss which meta variables are declarative. As preparation we show the canonical forms \TirName{lam-impl'} and \TirName{app'} in Figure~\ref{fig.intro.canonical}. When we assume that types are an output of the relation, the meta variable |ty1| of \TirName{lam-impl} is declarative. When we assume that types are an input of the relation, the meta variable |ty1| of \TirName{app} is declarative.

\begin{figure}[tb]
\begin{mathpar}
\inferrule*[right=lam-impl]
  {|gam, x : ty1 :- e : ty2|}
  {|gam :- \x . e : ty1 -> ty2|}

\inferrule*[right=app]
  {|gam :- f : ty1 -> ty2|\\ |gam :- a : ty1|}
  {|gam :- f a : ty2|}

\inferrule*[right=lam-impl']
  {|gam, x : ty1 :- e : ty2| \\\\ |sub C (->) m2 ty1 ty2| \\ |sub C Lam m1 x e| \\ }
  {|gam :- m1 : m2|}

\inferrule*[right=app']
  {|gam :- f : m2|\\ |gam :- a : ty1| \\\\ |sub C (->) m2 ty1 ty2|\\ |sub C App m1 f a| \\ }
  {|gam :- m1 : ty2|}
\end{mathpar}
\caption{Rules \TirName{lam-impl} and \TirName{app}, and their respective canonical forms.}
\label{fig.intro.canonical}
\end{figure}

The above sketch of an analysis assumes that the atomic relations are computable functions. We call type rules \emph{declarative} if they exhibit declarative aspects, and \emph{algorithmic} otherwise. In the latter case, when the rules are described in Ruler (Section~\ref{intro.sect.ruler}), a type inference algorithm can be generated.

\subsection{Type Inference}
\label{intro.sect.inference}

A type checking or type inference algorithm concerns itself with constructing derivation trees when given the main judgment. In the case of type checking, the type is part of the a priori known arguments. In the case of type inference it is not, and is actually inferred. This distinction between type checking and type inference is rather vague, because a type system may exhibit declarative aspects other than the definition of types, and thus even type checking implies some form of inference. Moreover, for a compiler it is typically the inferred arguments that matter; the derivation tree itself is not of direct interest.
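The difference in a priori known arguments is reflected in the signatures one would expect of the two algorithms. The following is a hypothetical sketch, with |Env|, |Expr|, and |Ty| representing environments, expressions, and types:
\begin{code}
check  ::  Env -> Expr -> Ty -> Bool   -- type checking: the type is an a priori known argument
infer  ::  Env -> Expr -> Maybe Ty     -- type inference: the type, if any exists, is computed
\end{code}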
\paragraph{Inference as a forest of derivation trees.}
Some authors consider type inference as a two-step process: a traversal of the AST to generate a set of constraints, and solving this set of constraints. We shall, however, consider type inference as the incremental construction of a forest of derivation trees, which has a closer connection to declarative specifications. Its intermediate state is a forest of partial derivation trees. It starts with a singleton forest containing an empty derivation tree with the main judgment as pending aspect. A derivation tree in the forest has an associated status: partial, complete, or unsatisfiable. A reduction step in this forest consists of cloning an existing partial derivation, resolving one pending aspect of the clone, and adding the cloned derivation to the forest, such that the number of unique derivation trees in the forest increases. An aspect is either a meta variable or a judgment. To resolve the former, the meta variable is bound to a value in its domain. To resolve the latter, it depends on whether the relation of the judgment is described by rules or is an atomic relation. If the relation is described by rules, either a rule can be applied, which leads to a larger or a complete derivation tree, or the derivation tree is unsatisfiable. If the relation is atomic, then the aspect is solved by running an algorithm that is associated with the relation, which either succeeds or fails. The existence of a complete derivation means that the source program is type correct. When all reductions have been applied exhaustively and there exists no complete derivation, the program is type incorrect. However, there may be infinitely many partial derivations. Also, there may be infinitely many partial derivations of a certain height, for example, when the domain of a meta variable is infinite. We come back to this issue later: in practice, an inference algorithm uses a less general approach.

\begin{figure}[tb]
%format subst = "\theta{}"
%format subst0
%format subst1
%format subst2
%format subst3
%format subst4
%format :~: = "\diamond{}"
%format :> = "\triangleright{}"
\begin{code}
prove :: j -> Inf Tree                  -- |Inf| offers unification and backtracking
prove (x ~~ y) = do                     -- case for explicit equality
  unify x y                             -- equality mapped to unification
  return (x ~ y)                        -- builds equality node
prove (R (many x))                      -- case for judgments
  | atomic R   = extern R (many x)      -- proven externally
  | otherwise  = do                     -- relation specified by rules
      r <- rules R                      -- picks a rule (backtrack point)
      (ps +:+ c) <- instantiate r       -- instantiate rule
      zipWithM unify (args c) (many x)  -- bind args
      ts <- mapM prove ps               -- recursion on premises
      return (ts (sub(:>)(name r)) c)   -- build derivation node
\end{code}
\caption{Sketch of a general inference algorithm.}
\label{fig.intro.prove}
\end{figure}

\paragraph{Naive algorithm.}
The algorithm in Figure~\ref{fig.intro.prove} represents a typical Prolog-style strategy for the construction of derivation trees. We explain some aspects of the example below. In this example, we refer to an inference monad |Inf| which takes care of backtracking and unification. Rules are tried in a predefined order. If a rule cannot be applied, \emph{backtracking} occurs to the next rule. Meta variables are initially represented as symbolic values, and unified with other meta variables or object terms during inference. Unification is derived from an equivalence relation, which can in turn be derived generically from algebraic data type declarations.
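The operations assumed by Figure~\ref{fig.intro.prove} might have the following signatures. This is a sketch that mirrors the figure, rendering the rule body |(many j) +:+ j| as a pair of premises and conclusion; |Rule|, |Judgment|, and |MetaExpr| are the hypothetical representation types sketched earlier, and |Tree| is a type of derivation trees serving as evidence:
\begin{code}
rules        ::  RelName -> Inf Rule                 -- backtrack point: yields candidate rules one by one
instantiate  ::  Rule -> Inf ([Judgment], Judgment)  -- premises and conclusion with fresh meta variables
unify        ::  MetaExpr -> MetaExpr -> Inf ()      -- may fail, causing backtracking
extern       ::  RelName -> [MetaExpr] -> Inf Tree   -- proves an atomic judgment externally
\end{code}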
The operation |rules| introduces a backtrack point. It tries the rules in a given order by feeding them one by one to the continuation; a later unification may fail and cause backtracking to that point. The operation |instantiate| substitutes the quantified meta variables with fresh meta variables. The instruction |extern| proves a judgment externally and returns evidence for it if it succeeds. With the combinators |:>| (internal nodes) and |~| (equality leaves) we construct evidence in the form of a derivation tree.

\paragraph{Undesirable properties of the algorithm.}
The above algorithm is incomplete and does not terminate for all but the simplest type systems. This is necessarily the case because there exist type systems for which inference is undecidable. Via backtracking, only finite and inductively defined object terms can be inferred, and unifications only produce compositions of object terms. The algorithm is also inefficient: the order in which rules are tried may cause poor performance or nontermination. Moreover, this order is not specified, which gives unpredictable results. Although the results are sound, they may not be optimal. In practice, typical algorithms refrain from backtracking or constructing many candidate derivations, because the search space is too large when dealing with ASTs with thousands of nodes. Therefore, actual inference algorithms resemble this overall approach, but select rules and instantiations of meta variables in a more sophisticated way.

\paragraph{Actual inference algorithms.}
Actual inference algorithms apply various strategies (depending on the form of the declarative aspect) to get around the above undesirable properties. Unification can be used to deal with equivalence judgments in type rules. Alternatively, an algorithm may collect several candidate constraints on a meta variable, then pick an object term that is the least solution to all these constraints. When constraints are monotonic, a value can gradually be approximated via fixpoint iteration starting from an initial bottom value. Type and effect systems often require such algorithms. When rule selection is declarative, we typically want the choice to be a function of the syntax of the language (\emph{syntax-directed} rules) or of other object terms (e.g. \emph{type-directed} rules). Alternatively, some rules can be restricted to only be applied in a proof after other rules have been applied, so that the choice between the remaining rules becomes functional in the above sense. This is not always possible: the applicability of a rule may depend on unresolved meta variables and may require the rule selection to be deferred, which is called residuation~\citep{DBLP:journals/jlp/Hanus94}.

\paragraph{Challenge: orchestration of strategies.}
The orchestration of such strategies is a complex undertaking. The order in which strategies are applied may influence the result, and it may not always be clear when to start or stop applying strategies. These are all challenges an implementation must deal with.

\paragraph{Challenge: annotations.}
The holy grail of type system research is to define expressive type systems (for a class of programs) that have sound and complete (decidable) inference algorithms. Given two type systems (that satisfy type-soundness with respect to the operational semantics of the language), one type system is more expressive than the other if it accepts a superset of the programs that the other accepts.
To bypass the strict undecidability boundaries, many languages allow programmers to assist the inference process by providing additional information (e.g. type annotations) in the object program that translates to concrete bindings for otherwise declarative meta variables. There are delicate balances between the expressiveness of the type system, the amount of annotation to be provided by the programmer, and the predictability of inference. For example, the Damas-Hindley-Milner type system and accompanying inference algorithm (Section~\ref{sect.intro.dhm}) do not require any annotations to help the inference process, but disallow functions with parameters of a polymorphic type. System F, the polymorphic lambda calculus, in comparison allows parameters with a polymorphic type, but requires an abundance of type annotations. As a middle way, HML~\citep{DBLP:conf/popl/Leijen09} expresses all of System F, but requires type annotations only for polymorphic lambda parameters. FPH~\citep{DBLP:conf/icfp/VytiniotisWJ08} positions itself in between DHM and HML. It requires a type annotation when applying a function to an argument with a polymorphic type. \citet{UUCS2006051} proposed a global flow analysis that propagates type annotations to locations where (a part of) the annotation is also applicable.

\paragraph{Challenge: type errors.}
As stated before, type inference concerns itself with the construction of derivation trees. If no such derivation exists, it is considered a type error. Type rules only specify under what conditions a type is acceptable. Thus, an implementation may require additional information to produce understandable error messages~\citep{DBLP:conf/icfp/HeerenHS03}.

\paragraph{Challenge: multiple derivation trees.}
A partial order on types specifies when one type is more general than another type. A type is \emph{principal} when it is the most general type of an expression. For reasons of predictability and modularity it is desirable that a principal type exists for an expression and that an algorithm infers a derivation for this type. This notion of principality can be extended to derivations~\citep{Jim:1996:PTT:237721.237728}. For the above type rules, there are infinitely many derivation trees for (|\x . x|), but none of the accompanying types (e.g. |tau -> tau| for any type |tau|) are comparable; thus this expression has no principal type (and thus also no principal derivation) for the given type system. There are several remedies: the type system can be changed such that more type annotations need to be given, or such that choices can be deferred by encoding pending choices in the type language as constraints. Polymorphism and qualified types are an example of the latter: type schemes are introduced to represent a delayed choice of types, as we show in the next section.

\subsection{Parametric Polymorphism}
\label{sect.intro.polymorphism}

For the identity function |\x . x|, there are no constraints on the type of |x|. During inference, no binding arises for the meta variable associated with |x|. We can thus bind a fresh type constant |alpha| to the meta variable, which leads to the type |alpha -> alpha| for |\x . x|. To specify that |alpha| can be any type, we universally quantify over |alpha|, and obtain the type |forall alpha . alpha -> alpha|. This process is called abstraction or \emph{generalization}. A polymorphic type represents many types, which can be obtained by \emph{instantiating} the quantified type constants (\emph{type variables}).
Moreover, we obtain a proof for the instantiated type by instantiating the proof for the generalized type in an analogous way. Polymorphic types enable \emph{parametric polymorphism}, which allows functions to be called with parameters of different but acceptable types. Parametric polymorphism is important for the use of Haskell's many convenient higher-order functions, such as |id|, |flip| and |$$|.

\paragraph{Quantified types.}
The \emph{Damas-Hindley-Milner} (DHM) type system serves as the classical example of a type system with polymorphic types. It forms the basis of the type systems of ML and Haskell, and underlies the type systems of many other languages. DHM's type language is an extension of that of the simply typed lambda calculus. It additionally contains type variables |alpha| and poly types (\emph{type schemes}) |sigma|:
\begin{code}
tau    ::=  alpha                  -- type variable
       |    tau1 -> tau2           -- function type
sigma  ::=  forall alpha . sigma   -- poly type: may have quantifiers
       |    tau                    -- mono type: has no quantifiers
alpha, beta                        -- type variables
\end{code}
A mono type |tau| does not contain universal quantifiers. A poly type can be interpreted as the infinite set of mono types obtained by substituting each quantified type variable by some mono type. The identity function |\x . x| has the type |ty -> ty| for any type |ty|. It can thus be given the poly type |forall alpha . alpha -> alpha|. The type variable |alpha| in such a type is an object term that is not to be confused with a meta variable.

The distinction between mono and poly types is important in the declarative type rules of DHM. Lambda abstractions take parameters of a mono type\footnote{
It has been shown that the inference of a polymorphic type for a parameter of a recursive function is undecidable in general~\citep{Wells1999111}. Many type systems thus impose restrictions on the types of function parameters to make inference feasible.
}, and have a mono type as result. Polymorphic types are introduced in the environment by (non-recursive) let-expressions:
\begin{code}
e ::= ... | let x = e1 in e2   -- generalized type of |e1| is visible as |x| in |e2|
\end{code}
Although a let-expression can be interpreted as |(\x . e2) e1|, the typing derivation may differ depending on the presence and application of a generalization rule.

\begin{figure}[tb]
\begin{mathpar}
\fbox{|gam :- e : sigma|}\\
\inferrule*[right=var]
  {|(x : sigma) `elem` gam|}
  {|gam :- x : sigma|}

\inferrule*[right=app]
  {|gam :- f : tau1 -> tau2|\\\\ |gam :- a : tau1|}
  {|gam :- f a : tau2|}

\inferrule*[right=lam]
  {|gam, x : tau1 :- e : tau2|}
  {|gam :- \x . e : tau1 -> tau2|}

\inferrule*[right=let]
  {|gam :- e : sigma| \\\\ |gam, x : sigma :- b : ty | }
  {|gam :- let x = e in b : ty|}

\inferrule*[right=gen]
  {|gam :- e : sigma| \\\\ |alpha `notelem` ftv gam| }
  {|gam :- e : forall alpha . sigma|}

\inferrule*[right=inst]
  { |gam :- e : forall alpha . sigma| }
  {|gam :- e : sigma [alpha := tau]|}
\end{mathpar}
\caption{The DHM type system.}
\label{fig.intro.dhm}
\end{figure}

The DHM type system consists of the rules in Figure~\ref{fig.intro.dhm}, of which we mention some details below. Via one or more applications of the \TirName{inst} rule, poly types may be instantiated to mono types by replacing a bound type variable with a mono type. Conversely, via rule \TirName{gen} a poly type can be constructed by quantifying over a type variable, provided that this type variable does not occur free in the environment.
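Algorithmically, exhaustive application of \TirName{gen} and a single application of \TirName{inst} correspond to the following sketch. This is an illustration only: the representation |Scheme| and the helpers |ftv| (free type variables), |substTy|, and |TVar| are hypothetical, and we ignore the threading of a substitution that an actual implementation needs (cf. Section~\ref{sect.intro.dhm}):
\begin{code}
type TyVar   =  String             -- names of type variables |alpha|, |beta|
data Scheme  =  Scheme [TyVar] Ty  -- represents |forall (many alpha) ^^ . ^^ tau|

gen :: Env -> Ty -> Scheme         -- rule gen, applied exhaustively
gen env ty = Scheme (ftv ty \\ ftv env) ty   -- quantify over variables not free in the environment

inst :: Scheme -> [TyVar] -> Ty    -- rule inst, given a supply of fresh variables
inst (Scheme alphas ty) fresh = substTy (zip alphas (map TVar fresh)) ty
\end{code}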
\paragraph{Qualified types.}
In general, meta variables may be constrained by judgments that by themselves are insufficient to bind a concrete type to a meta variable. As an example, the following type rules encode overloading of the addition operator on both integers and floats.
\begin{mathpar}
\inferrule*[right=add.int]
  {}
  {|gam :- _ + _ ^^ : I -> I -> I|}

\inferrule*[right=add.float]
  {}
  {|gam :- _ + _ ^^ : F -> F -> F|}
\end{mathpar}
In the expression |\x . x + x|, the type rules for addition constrain |x| to be of a numeric type, but do not dictate whether this type is an integer or a float. At other locations in the derivation tree there may be constraints imposed on the type that make it clear which of the rules applies. During type inference, the type may be insufficiently constrained to resolve the judgment. A typical strategy is to defer the judgment until the end of the inference of the scope in which the judgment arose, which is usually at generalization points. If the type is still not constrained sufficiently, a strategy is to \emph{default} to one of the applicable rules. Another strategy is to encode the judgment as part of the type (if potentially satisfiable) and delay the judgment to all locations where the generalized type is instantiated. Such an encoded judgment is called a qualifier. A qualified type of the expression |\x . x + x| given the above rules is:
\begin{code}
(\x . x + x) :: forall alpha beta . exists gamma . (gamma :- _ + _ ^^ : alpha -> alpha -> beta) => alpha -> beta
\end{code}
With additional information this type can be refined. An equivalence between |beta| and |alpha| is deducible from the rules \TirName{add.int} and \TirName{add.float}. Suppose furthermore that the qualifier |gamma :- _ + _ ^^ : alpha -> alpha -> alpha| can be simplified to a qualifier |Num alpha|; then the type can be refined to:
\begin{code}
(\x . x + x) :: forall a . Num a => a -> a
\end{code}
Haskell's overloading with type classes actually brings such reasoning under the control of the programmer.

Such extensions to a type system do not come for free. When the code generation depends on the proof of a deferred judgment, the generated code needs to be parametrized by information that is derived from the deferred proof. Moreover, a function may be given a type with qualifiers that can never be satisfied, which we may only find out when we try to use an identifier with such a type. Some type systems define a coherence relation on qualifiers to formalize the potential satisfaction of constraints.

\subsection{Damas-Hindley-Milner Inference}
\label{sect.intro.dhm}

In this section we consider type inference for the DHM type system. The naive backtracking approach as presented earlier may easily lead to nontermination, because the \TirName{inst} and \TirName{gen} rules can be alternated indefinitely as each other's inverses. However, the syntactical restrictions on types permit a more appropriate inference strategy. With some effort, we can deduce that generalization only has an effect for the top-level expression, or for the expression |e| in a let-expression. It is also sufficient to perform instantiation only after taking the type of an identifier from the environment in the rule \TirName{var}.
According to the interpretation of poly types, a type quantified over a variable that does not occur free in the type actually describes the same set of mono types as the type without the quantification; thus we only need to consider type variables that occur free in the type as generalization candidates. Moreover, the order of the type variables over which we quantify is irrelevant.

%format inferw = "\mathcal{W}"
Inference algorithm W exploits these properties. It is a sound and complete implementation of the rules, produces most general types, and needs to examine each node of the AST only once. Many actual inference algorithms are based on algorithm W. We describe algorithm W as a monadic function |inferw :: Expr -> Inf Ty| in Figure~\ref{fig.intro.algorithmw}.

\begin{figure}[tb]
In the description of algorithm W we assume monadic versions of the above functions to form an inference monad |Inf| for the implementation of the DHM type system. Essentially, these monadic functions are wrappers around the above functions:
%format ifresh = "\Varid{fresh}_\Varid{I}"
%format iunify = "\Varid{unify}_\Varid{I}"
%format igen = "\Varid{gen}_\Varid{I}"
%format iinst = "\Varid{inst}_\Varid{I}"
\begin{code}
type Inf = RWST Env Errs Subst Identity  -- reader: environment, writer: errors, state: substitution

ifresh  ::  Inf Ty              -- returns a fresh type
iunify  ::  Ty -> Ty -> Inf ()  -- unifies two types
igen    ::  Ty -> Inf Scheme    -- generalizes a type to a scheme
iinst   ::  Scheme -> Inf Ty    -- instantiates a scheme to a type
\end{code}
The threading of the substitution and the collection of error messages are hidden in the monad, as well as the top-down distribution of the environment, so that we define algorithm W as a function |inferw :: Expr -> Inf Ty|:
\begin{multicols}{2}
\begin{code}
inferw ^^ <% x %> =
  asks (lookup x) >>= iinst

inferw ^^ <% f a %> = do
  r <- ifresh
  t <- inferw f
  s <- inferw a
  iunify t <% s -> r %>
  return r
\end{code}
\columnbreak
\begin{code}
inferw ^^ <% \x . e %> = do
  t <- ifresh
  r <- local (insert x t) (inferw e)
  return <% t -> r %>

inferw ^^ <% let x = e in b %> = do
  t <- inferw e >>= igen
  local (insert x t) (inferw b)
\end{code}
\end{multicols}
\vspace{-0.6cm}
The notation |<% e %>| represents the abstract syntax of an object term |e|. We use this notation to conveniently pattern match and construct object terms in the host language. Note that |<% x %>| is an AST of the type |Expr| and |x| is an AST of the type |Identifier|, but that |<% f a %>|, |f| and |a| represent ASTs of the type |Expr|.
\caption{Algorithm W.}
\label{fig.intro.algorithmw}
\end{figure}

Unification plays an important role in algorithm W. In this thesis, we assume the following functions have an efficient implementation~\citep{Dijkstra08efficientfunctional}:
\begin{code}
fresh        ::  Subst -> (Ty, Subst)                -- returns a fresh type var
unify        ::  Ty -> Ty -> Subst -> (Errs, Subst)  -- unify two types (improves subst)
generalize   ::  Env -> Ty -> Subst -> Scheme        -- generalize a type in a certain env
instantiate  ::  Scheme -> Subst -> (Ty, Subst)      -- instantiate a scheme freshly
\end{code}
The type |Subst| represents a substitution (a mapping from meta variables to concrete types) combined with a fresh-variable supply. The types |Ty| and |Scheme| coincide with |tau| and |sigma| respectively. An environment of type |Env| contains bindings from identifiers to types, and |Errs| is the type of a collection of error messages. Similarly, we assume that there are types |Expr| and |Identifier| that coincide with the nonterminals |e| and |x| respectively.
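As an illustration, the monadic wrappers can be defined in terms of the four functions above. The following is a minimal sketch under the assumption that |Errs| is a monoid (for example, a list of error messages), using the standard |RWST| operations |get|, |put|, |tell|, |ask| and |state|:
\begin{code}
ifresh        =  state fresh                  -- draw a fresh type variable from the supply
iunify t1 t2  =  do  s <- get                 -- take the current substitution
                     let (errs, s') = unify t1 t2 s
                     put s'                   -- thread the improved substitution
                     tell errs                -- collect the error messages
igen t        =  do  env  <- ask              -- generalization is relative to the environment
                     s    <- get
                     return (generalize env t s)
iinst scheme  =  state (instantiate scheme)   -- instantiation consumes fresh variables
\end{code}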
Figure~\ref{fig.intro.algorithmw} shows algorithm W, which is a recursive traversal of the |Expr| AST. The combinator |local| applies changes to the environment that are only visible in the monadic computation it encapsulates. The function |insert| adds a binding to the environment, possibly shadowing an already existing binding. The |asks| operator exposes the hidden environment, so that we can use |lookup| to recover the type to which an identifier is bound.

\begin{figure}[tb]
\begin{code}
inferw ^^ <% let x = e in b %> = do
  t <- ifresh                                 -- start with fresh type for binding
  local (insert x t) (inferw e >>= iunify t)  -- infer type and match against binding
  s <- igen t                                 -- generalize over unbound variables
  local (insert x s) (inferw b)               -- infer with generalized scheme in env
\end{code}
\caption{Algorithm W with a recursive let-binding.}
\label{fig.intro.algorithmw.rec}
\end{figure}

The above code assumes a non-recursive let binding. Figure~\ref{fig.intro.algorithmw.rec} shows the algorithm for a recursive let binding. The unification at the end of inference for |e| binds |t| to its actual type. All occurrences of |x| in its own right-hand side must agree with |t|. To prevent having to deal with polymorphic recursion, |x| has a mono type inside its own binding. The generalized type is only available in the body of the let-expression. The positioning of |local| and |igen| is tricky. Since generalization is performed with respect to free type variables in the environment, the generalization needs to be positioned outside the local environments of the |e| and |b| subexpressions, because the type to generalize occurs in these environments.

The monadic formulation of algorithm W is concise because cumbersome flows of environments and substitutions are encapsulated by the monad. However, the abstraction offered by the monad is not always adequate when the object language is more complex. When the object language has pattern-bindings, new bindings arise when visiting the pattern, which means that the flow of the environment is more complex than simply top-down. If multiple derivations are possible, then substitutions may need to be duplicated and merged. The sequencing of operations on the encapsulated state is therefore important.

\subsection{Polymorphic Lambda Calculus}

When we eliminate the distinction between mono and poly types, and thus allow poly types everywhere, we obtain an implicitly typed version of the \emph{System F} type system. It allows expressions such as |\f . g (f (\x . x)) (f 3)|, where a lambda parameter is applied to values of different types, which is not expressible in the DHM type system. Unfortunately, inference for this system is undecidable~\citep{Wells1999111}, and it does not have most general types.

For System F itself, type checking is decidable. However, the syntax is verbose, as type abstraction and type instantiation are explicitly encoded, and types need to be given explicitly for lambda parameters. Figure~\ref{fig.intro.systemf} shows the inference rules of System F, and we discuss some aspects of the rules below. A type application |f sigma| requires |f| to have a universally quantified type, and provides the type |sigma| with which to instantiate it. A type abstraction |/\ alpha . e| quantifies the type of |e| over |alpha|. The syntax of the expression thus dictates when to apply rule \TirName{app.ty} and rule \TirName{abs.ty}.
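To illustrate the explicit style of System F, consider the identity function. The sketch below uses a hypothetical annotation syntax |\ (x : sigma) . e| for typed lambda parameters, and assumes that the literal |3| has the integer type |I| from the addition rules above:
\begin{code}
id     =  /\ alpha . \ (x : alpha) . x  -- by abs.ty and abs.e: |forall alpha . alpha -> alpha|
idInt  =  id I                          -- by app.ty: instantiates |alpha|, giving |I -> I|
three  =  idInt 3                       -- by app.e: |I|
\end{code}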
\begin{figure}[tb]
\begin{mathpar}
\fbox{|gam :- e : sigma|}\\
\inferrule*[right=var]
  {|(x : sigma) `elem` gam|}
  {|gam :- x : sigma|}

\inferrule*[right=app.e]
  {|gam :- f : sigma1 -> sigma2|\\\\ |gam :- a : sigma1|}
  {|gam :- f a : sigma2|}

\inferrule*[right=app.ty]
  {|gam :- f : forall alpha . sigma2|}
  {|gam :- f sigma1 : sigma2 [alpha := sigma1]|}

\inferrule*[right=abs.e]
  {|gam, x : sigma1 :- e : sigma2|}
  {|gam :- \x . e : sigma1 -> sigma2|}

\inferrule*[right=abs.ty]
  {|gam :- e : sigma| \\ |alpha `notelem` ftv gam|}
  {|gam :- /\alpha . e : forall alpha . sigma|}
\end{mathpar}
\caption{System F.}
\label{fig.intro.systemf}
\end{figure}

Variants of System F are typically used for the typed backends of compilers, because of its expressiveness. For example, evidence translation of Haskell type classes may need System F types~\citep{DBLP:journals/jfp/Faxen02}. As a frontend, type systems have been proposed that are more restrictive than System F, but more liberal than the DHM type system. It is desirable that such systems do not require type annotations for programs that are acceptable to DHM, and allow for predictable inference. An inference algorithm then infers a type for such programs, and maps these to System F terms.
%% These inference algorithms are typically complex, because of various strategies
%% involved to determine the structure of derivations trees, and to determine which meta
%% variables can be mapped to type variables.
%% I don't know myself anymore what I wrote here.

\subsection{Discussion}

The DHM inference algorithm in Section~\rref{sect.intro.dhm} shows that the mapping between a declarative type system specification and its accompanying inference algorithm is not straightforward. The gap between a complex type system and a sophisticated inference algorithm is even larger. As mentioned above, one of the reasons is that an inference algorithm requires information that is not present in the type rules. Also, type derivations are typically only treated as a model. Inference algorithms compute types from abstract syntax trees, with the underlying assumption that the combination of type and AST can be turned into a derivation tree.
%% This allows certain judgments to be handled in a different
%% context than the type rules seem to suggest.
Generalization and instantiation, for example, are often dealt with as part of the unification algorithm in order to support on-demand \emph{impredicative instantiation}, which is the instantiation of a type variable with a poly type. In the type rules these are encoded as separate expression rules.

A declarative specification typically covers a minimalistic lambda calculus to explain particular language features, whereas programming languages are much richer in syntax and language features. Algebraic data types are typically not present in declarative specifications, and neither are mutually recursive let bindings, because these are often regarded as syntactic sugar. Actual programming languages, however, require the presence of such features.

We can improve the resemblance between the inference algorithm and the declarative specification when the structure of the inference algorithm can be derived from the declarative specification. In this thesis, we propose the use of attribute grammars as the basis for such algorithms. Section~\ref{intro.sect.ags} gives a short introduction.
\section{Background on Attribute Grammars}
\label{intro.sect.ags}

In Section~\ref{intro.sect.typesystems}, we followed the common practice of specifying the semantics of programming languages via relations between properties, where the relations are defined by inference rules and the properties include a source term. \emph{Attribute Grammars} (AGs)~\citep{DBLP:journals/mst/Knuth68} are an alternative approach. An AG specifies the semantics of a language as attributes on nonterminal symbols in the grammar of the language.

An AG is a context-free grammar extended with \emph{attributes} and \emph{rules}. We give a definition in Section~\ref{intro.ag.syntax}, in which we describe that attributes are associated with nonterminals, and rules with productions. Given a value for each attribute associated with the nonterminal symbols of a production, the rules specify whether these values are correct. On the other hand, the rules can also be used to compute such attributions for a given AST, which we address in Section~\ref{sect.intro.ag.eval}.

In the other subsections we describe common extensions, features and uses of attribute grammars, both to show why AGs are an attractive language for describing the implementation of compilers, and to give some background information to which we refer from later chapters of this thesis. A shallow scan through these subsections may be beneficial to the understanding of the other chapters of this thesis. Section~\ref{intro.ag.recent} gives an overview of various systems that provide AGs in various flavors.

\subsection{Syntax of Context-Free Grammars and Attribute Grammars}
\label{intro.ag.syntax}

%format \/ = "\cup{}"

We introduce a notation for attribute grammars. In this section, we use the term host language to refer to the language in which the algorithm or compiler that we generate from the attribute grammar description is written. As we see later, functions of the host language may appear in grammar descriptions. We are slightly more formal in this section in comparison to the other sections, because in later chapters we introduce various notations for attribute grammars and extensions. Since concepts such as nonterminals, productions, attributes and abstract syntax trees are common to those notations, we introduce them here --- although we actually expect that the reader is already familiar with these definitions.

\paragraph{Context-free grammars.}
Formally, grammars specify languages, but we also use grammars to describe the structure of tree-like data structures. \citet{Chomsky56} described several classes of grammars with increasing expressiveness and implementation complexity. The class of context-free grammars is particularly convenient for the description of the structure of terms, but is usually not expressive enough to describe the desired correctness properties of programs such as well-typedness. With attribute grammars, context-free grammars are combined with a different formalism to permit the description of such properties.

\begin{defi}[Context-free grammar]
A \emph{context-free grammar} is a tuple |(V, N, S, P)| where |V| is a set of terminal symbols (the \emph{alphabet}), |N| is a set of nonterminal symbols, |S| is the start symbol with |S `elem` N|, and |P| is a set of productions (defined below). The sets |V| and |N| must be disjoint.
\end{defi}

\begin{defi}[Production]
A (context-free) \emph{production} |p = n -> (many m)| is a rewrite rule with nonterminal symbol |n `elem` N|, and a sequence of symbols |(many m) `elem` V \/ N|. The sequence |many m| may be empty. The application of |p| to a sequence of symbols |many s `elem` V \/ N| constitutes the rewriting of one occurrence of |n| in |many s| to |many m|. In production |p|, |n| forms the left-hand side of the production and |many m| the right-hand side.
\end{defi}

To summarize, a grammar is a rewriting system where productions specify a rewrite step from a sequence of symbols to a sequence of symbols. In a context-free grammar, a production specifies how to rewrite a single nonterminal symbol to a sequence of symbols. The rewriting terminates when only terminal symbols are left: when successive applications of productions to some singleton sequence |n| (with |n `elem` N|) result in a sequence of symbols, then this sequence is derived from |n|.

\begin{defi}[Sentence]
A \emph{string} is a sequence of symbols. A \emph{sentential form} is a string derivable from the start symbol of the grammar. A \emph{sentence} is a sentential form consisting only of terminal symbols.
\end{defi}

\begin{defi}[Derivation tree]
A derivation tree is a tree |t| that represents how a sentence |many s| is derived from a symbol |m|, and is inductively defined as follows:
\begin{itemize}
\item A leaf |t| represents either the derivation of the empty string from a nonterminal symbol |n| if there exists a production |p = n -> epsilon|, or the trivial derivation of the singleton string |many s = v| from a terminal symbol |v|. (Only) in the former case, we say that the leaf is associated with the nonterminal |n| and the production |p|. In the latter case, the leaf is only associated with the terminal |v|.
\item If trees |t1,...,(sub t k)| represent the respective derivations of sentences |many s1,...,many (sub s k)| from symbols |m1,...,(sub m k)|, then the tree |t|, formed by taking |t1,...,(sub t k)| as the respective children of the root, represents the derivation of the sentence |(many s1) ... many (sub s k)| from symbol |n| if there exists a production |p = n -> m1 ... (sub m k)|. We say that the root of |t| is associated with the nonterminal |n| and the production |p|.
\end{itemize}
\end{defi}

\begin{defi}[Syntax tree]
A \emph{syntax tree} (or \emph{parse tree}) is a derivation tree that is associated with a nonterminal |n|. This definition purposefully excludes singleton trees denoting a terminal symbol.
\end{defi}

\begin{defi}[Abstract syntax tree]
An abstract syntax tree is the result of applying a projection to some syntax tree. Usually, the resulting tree has fewer branches (e.g. due to omission of layout) or branches replaced with a more general representation (e.g. desugared).
\end{defi}

In this thesis, we do not concern ourselves with parsing, which is the process of constructing a syntax tree that represents the derivation of a given sentence. Instead, we assume the syntax tree as a given. In fact, we adopt the common convention to abstract over details in the syntax tree (e.g. whitespace) and work with abstract syntax trees instead.

\begin{defi}[Language]
The \emph{language} |sub L G| specified by a grammar |G| is the set of sentences that can be derived from the start symbol of |G|.
\end{defi}

If |sub L G| is a programming language, then the source code of a program written in |sub L G| is a sentence in |sub L G|.
Moreover, if |sub L G| is a language of algebraic data types, then nonterminals describe type constructors, terminals describe primitive types, and productions describe data constructors. An AST represents a data structure, and the bit sequence in memory can be regarded as the sentence.

\paragraph{Context-free grammar notation.}
In Figure~\ref{fig.intro.cfg} we introduce a language for the \emph{description} of context-free grammars: i.e. terms in this language can be interpreted as a grammar as defined above. We later extend the language to describe attribute grammars and some AG extensions. Figure~\ref{fig.intro.stringcfg} shows an example. We explain some aspects of the notation below.

\begin{defi}[Meta grammar]
When we talk about a grammar for a grammar, we call the former a \emph{meta grammar}.
\end{defi}

In the notation, a grammar is the composition of grammars for individual nonterminals. The sets of terminals |V| and nonterminals |N| are left implicit, and productions |P| are grouped per nonterminal. We reuse these letters for other purposes, such as |N| as an identifier for a nonterminal, and |P| as an identifier for a production.

\begin{figure}[tb]
%format grammar = "\mathbf{grammar}"
%format prod = "\mathbf{prod}"
%format term = "\mathbf{term}"
%format nonterm = "\mathbf{nonterm}"
\begin{code}
g  ::=  grammar N (many p)  -- the grammar for nonterminal |N|
p  ::=  prod P (many s)     -- production |P|, with as RHS the sequence of symbols |(many s)|
s  ::=  term x :: tau       -- a terminal symbol with name |x| and type |tau|
   |    nonterm x : N       -- a nonterminal symbol with name |x| and nonterminal |N|
N, P, x                     -- names of nonterminals, productions, and symbols
\end{code}
\caption{Language to describe context-free grammars.}
\label{fig.intro.cfg}
\end{figure}

The grammar is abstract: instead of terminal symbols, only the \emph{type} of a terminal symbol is given. To stress the difference between terminals and nonterminals, we use a double colon to specify the type of a terminal and a single colon to specify the name of a nonterminal.

\begin{defi}[Children]
Each symbol in the right-hand side of a production has an explicit name, which will be useful later. We call such named symbols the \emph{children} of the production, which stresses the correspondence to the children of AST nodes with which the production is associated.
\end{defi}

In the notation, the nonterminal symbol of the left-hand side of a production is implicit, since we only describe the right-hand sides of productions. We give the symbol on the left-hand side of a production the fixed name |lhs|.

\begin{figure}[tb]
\begin{code}
grammar String   -- a cons-list of characters
  prod Nil       -- empty list
  prod Cons
    term hd :: Char      -- head of the list (|hd| is a terminal)
    nonterm tl : String  -- remainder (|tl| is a nonterminal)
\end{code}
\caption{Example of a context-free grammar.}
\label{fig.intro.stringcfg}
\end{figure}

\paragraph{Attribute grammars.}
We now extend the above formalism to denote context-free grammars with notation for attributes and their associated functions.

\begin{defi}[Attribute grammar]
An attribute grammar is a tuple |(V, N, S, A, I, O, P, F)|, where the set of terminals |V|, the set of nonterminals |N|, and the start symbol |S| are defined as for a context-free grammar. The set |A| consists of attribute names. The map |I| associates a set of names |sub I n `subset` A| with each nonterminal |n| in |N|, which make up the \emph{inherited} attributes of |n|.
Similarly, the map |O| associates a set |sub O n `subset` A| with each nonterminal |n| in |N|, which makes up the \emph{synthesized} attributes of |n|. For each |n|, the sets |sub I n| and |sub O n| must be disjoint. The productions |p `elem` P| are redefined below. The set |F| consists of computable functions |f|, which we call \emph{semantic functions}.
%% The mappings |D| associates a set |subsup tau x n| with each |x| in |sub I n| or |sub O n| for
%% each |n `elem` N|, which denote the values an attribute can represent.
\end{defi}

\begin{defi}[Production]
An (attribute-grammar) production |p = u -> (many w) `cdot` (many r) `cdot` X| consists of an annotated nonterminal symbol |u| and annotated symbols |many w|, rules |many r|, and a set of symbol names |X|.
\end{defi}

\begin{defi}[Annotated symbol]
An \emph{annotated symbol} is either an annotated terminal or nonterminal symbol. An \emph{annotated nonterminal symbol} |u = x:n.(many a)| is a combination of a distinct symbol name |x `elem` X|, a nonterminal symbol |n `elem` N|, and a collection of attribute names |many a `elem` A| so that each |a| is either in |sub I n| or |sub O n|. We say that |n| is associated to |x|. An \emph{annotated terminal symbol} |x:v| is a combination of a distinct symbol name |x `elem` X| and a symbol |v `elem` V|.
\end{defi}

\begin{defi}[Attribute occurrence]
A \emph{reference to an attribute} |x.a| is a combination of a symbol name |x `elem` X| (associated to some nonterminal |n|) with an attribute name |a `elem` A| so that either |a `elem` sub I n| or |a `elem` sub O n|. A \emph{reference to a terminal} |x| is a symbol name |x `elem` X| which is associated to some terminal |v|. An \emph{attribute occurrence} |o| is either a reference to an attribute or a reference to a terminal.
\end{defi}

We also call an occurrence |x.a| an attribute |a| of |x|. Attribute occurrences can be found in rules, which are defined below.

\begin{defi}[Rule]
A rule |(many o1) = f (many o2)| of some production |u -> (many w) `cdot` (many r) `cdot` X| consists of a semantic function |f `elem` F| and attribute occurrences |many o1| and |many o2|. The occurrences |many o1| represent the attributes defined by the rule, which are synthesized attributes of |u| or inherited attributes of the children |many w|. The occurrences |many o2| represent the attributes used by the rule, which are the inherited attributes of |u| or synthesized attributes of the children |many w|. In addition, occurrences in |many o2| may also refer to terminals.
\end{defi}

Due to the restrictions on occurrences it is always clear whether an occurrence references an inherited or a synthesized attribute. In our notation for attribute grammars (further below) we allow the same name to be used for an inherited and a synthesized attribute.

\paragraph{Decorated trees.}
To give a semantics to rules, we consider syntax trees annotated with attributes, and define how attributes of the tree are related to attribute occurrences, which are mentioned in the rules of productions.

\begin{defi}[Attributes of syntax trees]
An \emph{annotated syntax tree} or \emph{semantic tree} is a syntax tree |T| where in addition subtrees are associated with the smallest set |Q| defined as follows. Let |t| be a subtree and |n| be the nonterminal that is associated to |t|. For each attribute |a| in |sub I n \/ sub O n|, let there be a distinct attribute symbol |sub q t `elem` Q|. The set |Q| represents the attributes of |T|.
\end{defi}

The symbols |q| can be seen as instances of the attributes, or as occurrences of the attributes in the tree. This definition states that many trees may be associated with the same nonterminal yet have different instances of the attributes. Moreover, if |t| is an annotated syntax tree, then the attributes of each annotated direct subtree of |t| are a distinct subset of the attributes of |t|.

\begin{defi}[Attribute association]
Given some annotated syntax tree |t| with associated production |p|, an \emph{attribute association} |alpha| is a mapping so that for each attribute occurrence |o| of |p|, either |alpha o = q| for some |q `elem` Q| when |o| is a reference to an attribute (either of |t| or of a direct subtree of |t|), or |alpha o = v| when |o| is a reference to a terminal child |v| of |t|.
\end{defi}

For each (node of an) annotated syntax tree, there exists such an attribute association. We leave open how this straightforward connection between attribute occurrences and attributes of the syntax tree is constructed.

\begin{defi}[Valuation]
A \emph{valuation} |M| is a mapping that associates with each |q `elem` Q| and each |v `elem` V| a value in the host language, which is denoted as |M q| or |M v|. Furthermore, |M v = semb0 v| for |v `elem` V| where |semb0 v| is some encoding of |v| as a value in the host language. These values are called \emph{decorations}.
\end{defi}

\begin{defi}[Attributed syntax tree]
An \emph{attributed} (or \emph{decorated}) syntax tree is an annotated syntax tree combined with a valuation |M|.
\end{defi}

A rule |(many o1) = f (many o2)| encodes the condition |many (M (alpha o1)) = f (many (M (alpha o2)))| for each node the rule is associated with. Alternatively, we may say that |f| functionally defines occurrences |o1| in terms of occurrences |o2|, and thus that inherited attributes of children are defined by the parent, whereas synthesized attributes of the children are defined by the children and may be used by the parent.

\begin{defi}[Correctly attributed syntax tree]
A syntax tree is \emph{correctly attributed} when the conditions imposed by the rules are satisfied.
\end{defi}

\paragraph{Notation for attribute grammars.}
The above definitions introduce concepts that underlie attribute grammar languages and implementations. Figure~\ref{fig.ag.syntax} gives a minimalistic language for the description of AGs, which is an extension of the language for context-free grammars in Figure~\ref{fig.intro.cfg}. We explain some of its aspects below.
\begin{figure}[tb]
\begin{code}
I  ::=  attr N (many i) (many s)      -- attribute declaration for nonterminal |N|
i  ::=  inh y :: tau                  -- declares inherited attribute |y| of type |tau|
s  ::=  syn y :: tau                  -- declares synthesized attribute |y| of type |tau|
S  ::=  sem N (many c)                -- semantics definition for |N|
c  ::=  prod P (many r)               -- semantics in the form of rules for prod |P|
r  ::=  g [many o1] = f [many o2]     -- a rule with pattern |g| and function |f|
o  ::=  t.x.y                         -- attribute occurrence of child |x| and attribute |y|, and kind |t|
t  ::=  ainh | asyn | aloc            -- explicit attribute kind (usually left implicit)
x, y ^^                               -- names of symbols and attributes
f, g ^^                               -- expressions
\end{code}
\caption{Minimalistic language for AGs.}
\label{fig.ag.syntax}
\end{figure}

The notation\footnote{
Note that we reused some letters here which we used before in a different context, such as the map |I| and the start symbol |S|.} in Figure~\ref{fig.ag.syntax} consists of a collection of nonterminal declarations |g|, attribute declarations |I|, and semantics blocks |S|. Attributes are declared separately for each nonterminal and have a type associated with them. The right-hand side of a rule is an \emph{expression} |f [many o]| in some formal language |H| with embedded references to attributes at identifier positions of |H| via attribute occurrences |many o|. The left-hand side of a rule is also an expression, but limited to patterns such as tuples. The use of expressions is slightly more flexible than the use of just a function symbol.

\begin{figure}[tb]
\begin{code}
attr String                                -- declares attributes of |String|
  inh down :: Char                         -- an inherited attribute |down|
  syn sh :: String                         -- a synthesized attribute |sh|

sem String                                 -- rules for each production
  prod Nil   lhs.sh = Nil                  -- rule 1: def. of syn attr of LHS
  prod Cons  tl.down = loc.hd              -- rule 2: def. of inh attr of |tl| using terminal |hd|
             lhs.sh = Cons lhs.down tl.sh  -- rule 3: def. of syn attr of LHS
\end{code}
\caption{Example of an AG that shifts a character in a string.}
\label{fig.ag.example}
\end{figure}

For attribute occurrences |t.x.y|, we take the following notational conventions. The attribute kind |t| distinguishes inherited and synthesized attributes. This kind is always clear from the context, so we usually leave it unspecified in examples, unless we want to stress the difference. To refer to an inherited attribute |y| of a symbol named |x|, we use |ainh.x.y| or simply |x.y|. To refer to a synthesized attribute |y| of a symbol named |x|, we use |asyn.x.y| or simply |x.y|. To refer to a terminal named |x| we use the attribute occurrence |aloc.x.self| or simply |x.self| (see also Section~\ref{intro.sect.incremental}).

Informally, an AG in this notation is well-formed when the description can be translated to an AG as defined above, and the types of the various identifiers and expressions are correct. Figure~\ref{fig.ag.example} shows an example in which we define a transformation on strings that shifts each character one position to the right. We use an inherited attribute |down| to represent the preceding character. Given a value |'d'| as initial |down| value, the result for |"Ag"| is |"dA"|. We show a more complex example in Section~\ref{intro.sect.ag.dhm}.

\paragraph{Local attributes.}
We also use the notation |aloc.aloc.x| to refer to a \emph{local attribute} with the name |x|, which is an attribute defined by a rule of a production and is only in scope of that production. The name must be distinct from the name of a terminal symbol. Local attributes are typically used to represent common subexpressions.
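For example, in the following hypothetical variation on the rules for |Cons| of Figure~\ref{fig.ag.example}, the local attribute |res| names the shifted result once, so that subsequent rules of the production can refer to it via |loc.res|:
\begin{code}
sem String
  prod Cons  loc.res  = Cons lhs.down tl.sh  -- common subexpression bound to local attribute |res|
             lhs.sh   = loc.res              -- referenced via |loc.res|
\end{code}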
\paragraph{Syntactic sugar.}
A terminal |v| can be encoded as a fresh nonterminal (Section~\ref{intro.sect.higherorder}) with a single synthesized attribute and a single |epsilon|-production that defines the attribute with the value |semb0 v|. A local attribute can be encoded as a fresh nonterminal with a single inherited attribute and a single synthesized attribute and a single |epsilon|-production that contains a rule that copies the value of the inherited attribute to the synthesized attribute. As a convenient simplification, we therefore assume in the remainder of this chapter that terminal leaves and local attributes are not present in AGs and ASTs, although we use terminals and local attributes in examples. In later chapters we take local attributes and terminals into account explicitly.

\paragraph{Interfacing with the AG.}
The evaluation of a tree described by the AG (which we discuss in Section~\ref{sect.intro.ag.eval}) takes as input a record of values for the inherited attributes and results in a record with values of synthesized attributes.
%if False
\begin{code}
ast   = sem_Cons 'A' $ sem_Cons 'g' $ sem_Nil  -- construct tree |ast|
inhs  = Inh_String { down_Inh_String = 'd' }   -- define inh attrs of the root
syns  = wrap_String ast inhs                   -- evaluate tree, obtain syn attrs
repl  = sh_Syn_String syns                     -- extract syn attr |sh|
\end{code}
The Haskell functions shown in the above example, which we call \emph{wrapper functions}, are generated from the AG description. The functions |sem_Cons| and |sem_Nil| are constructor functions for the respective productions. Furthermore, |Inh_String| and |Syn_String| are record types with a field for each attribute, such as |down_Inh_String| and |sh_Syn_String|, using a notational convention that guarantees that each field is uniquely labelled. The function |wrap_String| takes the tree and the record for the inherited attributes, and produces a record with synthesized attributes. The use of records is a convenience to interface with the values of attributes independent of their order of appearance.
%endif

%if False
\subsection{Semantics}

An AG description states that a decorated tree is correct if the tree is correctly structured according to the context-free grammar and the tree is correctly decorated. For the latter part, note that each node has an associated production, thus there is an association between a node and the production's rules, and an association between a node's decorations and attribute occurrences in a rule. The tree is correctly decorated if for any rule |(many o1) = f (many o2)| that is associated with a node in the decorated tree, |f (many o2)| equals |many o1|, where |many o2| and |many o1| are the associated decorations.

An AG description does not specify how the attributes are computed. This is a good property, because it provides the opportunity to define various evaluation algorithms with different properties, and to tune such algorithms, as we will be doing later in this thesis. The following operational semantics provides a simple yet inefficient approach to compute the attributes. Given an AST as initial tree, an evaluator for attribute grammars nondeterministically selects a node and a rule, and applies the rule if the attributes of the right-hand side have been computed. Application of the rule decorates the node with attributes of the left-hand side of the rule. The decorated tree is in normal form when each node is fully decorated.
An AG is well-defined if evaluation can reach a normal form for any tree. Section~\ref{sect.intro.ag.eval} shows two common AG evaluation algorithms that follow the above approach in an efficient way. Other evaluation algorithms are based on either one of these.
%endif

\subsection{Dependency Graphs}
\label{sect.intro.ag.depgraphs}

To describe how the values of attributes are actually computed, we consider the data dependencies induced by rules. The trees that we consider in this section are derivation trees generated by some AG |(V, N, S, A, I, O, P, F)|.

\paragraph{Graphs of a production.}
A rule |(many o1) = f (many o2)| represents a data dependency of occurrences |many o1| on occurrences |many o2|, or equivalently, a flow of data from |many o2| via |f| into |many o1|. These dependencies form a graph.

\begin{figure}[tb]
\begin{code}
d  ::=  o        -- attribute occurrence vertex
   |    rule x   -- rule vertex with some distinct identifier |x|
   |    child x  -- child vertex with |x| the name of the child
x                -- identifiers
\end{code}
\caption{Syntax of vertices in a PDG.}
\label{fig.intro.ag.designators}
\end{figure}

\begin{figure}[tb]
\begin{center}
\begin{tikzpicture}
  [ nd/.style={ellipse, draw, minimum size=4mm, node distance=12mm,font=\footnotesize}
  , dep/.style={->,thick}
  , dep2/.style={->,dashed, very thick}
  , lab/.style={font=\footnotesize}
  ]
  \node[nd](e){|aloc.aloc.hd|};
  \node[nd, right of=e, xshift=8mm](f){|rule r2|};
  \node[nd, below of=f](a){|ainh.lhs.down|};
  \node[nd, right of=f, xshift=8mm](b){|ainh.tl.down|};
  \node[nd, right of=b, xshift=20mm](c){|asyn.tl.sh|};
  \node[nd, right of=c, xshift=6mm](g){|rule r3|};
  \node[nd, right of=g, xshift=8mm](d){|asyn.lhs.sh|};
  \node[nd, above of=f](h){|child tl|};
  \draw[dep](f) to (e);
  \draw[dep](b) to (f);
  %% \draw[dep2](c) to (b);
  \draw[dep](g) to (c);
  \draw[dep](g) to [in=0,out=-145] (a);
  \draw[dep](c) to [in=0,out=145] (h);
  \draw[dep](d) to (g);
\end{tikzpicture}
\end{center}
\caption{Exemplary PDG of production |Cons|.}
\label{fig.intro.ag.pdg}
\end{figure}

\begin{defi}[Production dependency graph]
A \emph{Production Dependency Graph} (PDG) is a directed graph |(V,E)| associated with some production |p|. There is a one-to-one mapping between the vertices |d `elem` V| (Figure~\ref{fig.intro.ag.designators}) and the nonterminal children of |p|, the rules of |p|, and the attribute occurrences in rules of |p|. The edges |E| consist of:
\begin{itemize}
\item For each vertex |asyn.x.y| an edge to a vertex |child x|.
\item For each rule |[many o1] = f [many o2]| (represented as vertex |rule r|) an edge from |rule r| to vertex |o| for each |o `elem` many o2|, and an edge from vertex |o| to |rule r| for each |o `elem` many o1|.
\end{itemize}
Similarly, a \emph{production data-flow graph} is a production dependency graph with the edges reversed.
\end{defi}

Figure~\ref{fig.intro.ag.pdg} shows the PDG of the production |Cons| of Figure~\ref{fig.ag.example}. For simplicity, we modelled the terminal |hd| as a local attribute |aloc.aloc.hd|.

\paragraph{Graph of a tree.}
These graphs can be projected on each node of a tree and then combined to form a dependency graph for a tree, or a data-flow graph for a tree. The general idea is that we take the PDG of the root of the tree and then add the graph for each child of the root, which describes the dependencies of synthesized attributes of the child on inherited attributes of the child.
\begin{defi}[Tree dependency graph]
For some annotated syntax tree |t| with associated production |p|, attribute association |alpha|, and annotated subtrees |t1,...,sub t k| with corresponding nonterminal children |c1,...,sub c k| of |p|, the \emph{tree dependency graph} is inductively defined as the union of the PDGs of |t1,...,sub t k| and the instantiation of the PDG of |p| by transforming (with preservation of edges) rule and child vertices to fresh vertices and each occurrence vertex |o| to vertex |alpha o|. Similarly, a \emph{tree data-flow graph} is a tree dependency graph with the edges reversed.
\end{defi}

Given a tree, evaluation algorithms of AGs (Section~\ref{sect.intro.ag.eval}) are traversals over the tree dependency graph expressible as tree traversals\footnote{
The shape of a tree dependency graph ensures that a description with a tree traversal is possible. The dependencies are properly nested: on a path from a synthesized attribute of a child to some (indirect) dependency, an inherited attribute of the child occurs before any attribute of a sibling or parent.
} over the tree that are described by a nondeterministic tree-walking automaton (Section~\ref{sect.treewalkautomaton}). Traditionally, such traversals may be demand-driven (traversing the dependency graph based on which attributes are needed) or may be described by a deterministic tree-walking automaton. We come back to evaluation algorithms in Section~\ref{sect.intro.ag.eval}. We first consider static approximations of the dependency graphs, which can be used to prove that none of the attributes has a cyclic definition, and which are an essential ingredient for a description of the evaluation with a \emph{deterministic} tree-walking automaton.

\paragraph{Approximations.}
Below, we consider abstract interpretations of AGs that construct dependency graphs that are a static approximation of the tree dependency graphs of collections of trees in certain contexts:

\begin{defi}[Context]
A \emph{context} is a symbol |C| of some fixed set of symbols |Omega| given per application and grammar. A context represents an additional set of invariants imposed on a tree.
\end{defi}

Concretely, the invariants represented by a context may include that:
\begin{itemize}
\item The tree is associated with a certain nonterminal or production;
\item The tree occurs as a subtree at a certain position of a parent;
\item The tree has attributes that are used according to some protocol~\citep{Farrow:1984:SAG:502949.502881}.
\end{itemize}

A collection of trees in a context shares a common structure. By distinguishing contexts, we may consider projections of the tree dependency graphs of a collection of trees on the common attributes (of the root) that each tree in the collection has, so that the edges are a superset of the projected edges of each individual tree. The projection operation distributes over graph union, which ensures that we can work with approximations of projections of graphs of subtrees.

\begin{defi}[I/O graph]
An \emph{I/O graph} of some nonterminal |n `elem` N| is a directed graph where the vertices consist of the attributes |a `elem` sub I n \/ sub O n| and the edges represent either (indirect) dependencies or data flow between attributes of some trees that have |n| as root. An \emph{I/O dependency graph} is an \emph{I/O graph} where the edges represent data dependencies, and in an \emph{I/O flow graph} the edges represent data flow.
\end{defi}

Note that the above definition does not specify which edges are included in an I/O graph, but only specifies what the edges represent. We later give a consistency condition that specifies which edges must minimally be present.

\begin{defi}[Nonterminal dependency graph]
The \emph{Nonterminal Dependency Graph} (NDG) of nonterminal |n| is the I/O dependency graph of |n| that approximates the dependencies between attributes of any \emph{tree} associated with |n|.
\end{defi}

An I/O graph of some nonterminal |n| serves as an approximation of the dependencies between attributes of \emph{some} trees (context dependent) that are associated with |n|. In contrast, an NDG is a single, context-independent approximation of the dependencies between attributes of some nonterminal.

\begin{defi}[Augmented PDG]
The \emph{augmented PDG} of a PDG |G| of some production |p| and I/O graphs of the nonterminal children of the production, is |G| with edges added between attributes of children so that if there exists an edge between attributes of the given I/O graph of some child, then this edge also exists between the attributes of the child in the augmented PDG, and vice versa\footnote{We specify here that the dependencies between attributes of a child in an augmented PDG match exactly the dependencies between the attributes of the corresponding I/O graph that the augmented PDG is parameterized with, thus that edges of the PDG may impose constraints on the I/O graphs of the children.}.
%% In actual implementations,
%% we usually require only that the I/O graph is a subgraph such that an augmented PDG is the
%% instantiation of a PDG with I/O graphs.
\end{defi}

An augmented PDG of some production |p| serves as an approximation of the dependencies between attributes of \emph{some} trees that are associated with |p|. It is a PDG parameterized with the dependencies induced by the subtrees.

\paragraph{Collections of graphs.}
To specify which edges are part of the I/O graphs and augmented PDGs, we consider some properties of collections of these graphs.

\begin{defi}[Consistent approximations]
A collection of I/O graphs and augmented PDGs is \emph{consistent} when each I/O graph of some nonterminal |n| in some context $\mathcal{C}$ is a projection of the union of the augmented PDGs in context $\mathcal{C}$ of the productions associated with |n|.
\end{defi}

\begin{defi}[Complete approximations]
A collection of I/O graphs and augmented PDGs is \emph{complete} when for each production |p| the collection includes an augmented PDG and corresponding I/O graphs for each combination of contexts that the children of |p| can occur in.
\end{defi}

\begin{defi}[Smaller approximations]
Graph |A| is \emph{smaller} than graph |B| if |sup A (*)| is a subgraph of |sup B (*)|, where |sup A (*)| and |sup B (*)| are the respective transitively closed graphs of |A| and |B|.
\end{defi}

\begin{figure}[tb]
\begin{center}
\begin{tikzpicture}
  [ nd/.style={ellipse, draw, minimum size=4mm, node distance=12mm,font=\footnotesize}
  , dep/.style={->,thick}
  , dep2/.style={->,dashed, very thick}
  , lab/.style={font=\footnotesize}
  ]
  \node[nd](dn){|ainh.down|};
  \node[nd, right of=dn, xshift=12mm](sh){|asyn.sh|};
  \draw[dep2](sh) to [in=-30,out=-150] (dn);
\end{tikzpicture}
\end{center}
\caption{An I/O graph of nonterminal |String|.}
\label{fig.intro.iograph}
\end{figure}

Figure~\ref{fig.intro.iograph} gives an example of an I/O graph that is part of some collection that satisfies the above properties, based on the AG in Figure~\ref{fig.ag.example}.
It is an I/O graph of the nonterminal |String| in the context of being the child |tl| of production |Cons|. The dependency of |asyn.sh| on |ainh.down| is induced by the production |Cons|. If Figure~\ref{fig.intro.ag.pdg} included an edge from |asyn.tl.sh| to |ainh.tl.down|, it would be an augmented PDG parameterized with the I/O graph.

\paragraph{Cycle analysis.}
Cycle analysis of AGs is an abstract interpretation~\citep{Nielson:1999:PPA:555142} that approximates the tree dependency graph of any tree by constructing a complete and consistent collection of I/O graphs and augmented PDGs. The consistency and completeness requirements lead to a set of mutually recursive equations for which we want to compute a least solution. Such a solution is obtained with fixpoint iteration. In Chapter~\tref{chapt.warren} we consider a concrete cycle analysis; here we look only at the general structure of such analyses. Various approaches distinguish different contexts, which influence the accuracy and complexity of the approximations:
\begin{description}
\item[Uniform AG.] \citet{DBLP:journals/mst/Knuth68} distinguishes as contexts the nonterminals to which trees are associated, and the different positions in the right-hand sides of productions where nonterminals can occur. This approach\footnote{
The actual approach does not distinguish different positions but instead instantiates augmented PDGs with I/O graphs (thus copies them).
} leads to one I/O graph per nonterminal and one augmented PDG per production. The AG is \emph{uniform} or \emph{absolutely non-circular} if the augmented PDGs in this flavor are acyclic.
\item[Well-defined AG.] \citet{springerlink:10.1007/BF01702865} additionally distinguishes the production to which the tree is associated. This approach leads to one I/O graph per nonterminal and production, and an exponential number of augmented PDGs, one for each combination of children with productions. An AG is \emph{well-defined} if for any tree the tree dependency graph is cycle-free, which is the case when the augmented PDGs in this flavor are acyclic.
\item[Ordered AG.] \citet{DBLP:journals/acta/Kastens80} distinguishes as contexts only the nonterminals to which trees are associated, but does not differentiate the occurrences of nonterminals in the right-hand sides of productions.
\end{description}
In our experience with UUAG, and in agreement with observations by \citet{DBLP:journals/acta/RaihaS82}, an AG that is well-defined is in practice also ordered.

\subsection{Tree-Walking Automata}
\label{sect.treewalkautomaton}

Tree-walking automata arose from tree language theory and were introduced by \citet{Aho:1969:TCF:800169.805425}. A tree-walking automaton (TWA) is a device that walks over a tree in a contiguous manner and is accompanied by a state machine that describes how the nodes of the tree change their state upon each visit, and whether the device goes up to the parent or down to one of the children as the next step. Section~\ref{sect.intro.ag.eval} uses TWAs to describe evaluation algorithms of AGs. We keep the presentation here simple; there exist many extensions that increase the expressiveness of these automata, such as pebbles~\citep{Engelfriet99tree-walkingpebble}.
\begin{defi}[Tree-walking automaton]
A TWA for some AST is a tuple |(V,Q,I,F,delta)|, with an alphabet |V| of node labels, a finite set of states |Q|, an initial state |I `elem` Q|, a set of final states |F|, and a transition relation |delta `subset` (V `times` Q `times` Q `times` C `times` Q)|, where |C = { up, sub down 0, ..., sub down k }| is a set of commands and |k| is the maximum branching factor of nodes in the AST. There exists a one-to-one relation between productions and symbols in |V|, so that each node is labelled with a |v `elem` V| depending on the production associated with the node.
\end{defi}

A tuple |(v,q,q0,c,q') `elem` delta| represents a transition from a node labelled |v| in state |q| to a state |q'|, combined with a move to another node according to |c|. The automaton keeps track of a bit of history: |q0| is the state of the node that caused the transition to the current node.

\begin{defi}[Deterministic tree-walking automaton]
A \emph{deterministic TWA} is a TWA where the transition relation is a \emph{function} |delta :: V -> Q -> Q -> (C, Q)|. Otherwise the TWA is \emph{nondeterministic}.
\end{defi}

\paragraph{Acceptance.}
Initially, each node in the AST is associated with the initial state |I|. The automaton starts at the root of the tree and stops if no step can be taken anymore. The tree is accepted if the automaton ends with the root having an associated state in |F|. With each step, the automaton visits a node. If the automaton is at a node with label |v| and associated state |q|, and previously visited a node in state |q0|, then the automaton chooses a step |c| and new state |q'| so that |(v,q,q0,c,q') `elem` delta|, or the automaton stops if no such step exists. In the former case, the automaton updates the state of the node to |q'| and visits the parent if |c=up| or visits child |i| if |c = sub down i|.

\paragraph{Evaluation of rules.}
An actual AG evaluation algorithm does not only traverse the tree but also needs to apply rules to compute attributes. Thus, in an actual implementation, the automaton also applies a subset |gamma(v, q)| of the production's rules upon making a transition to state |q| at a node with label |v|.

\begin{defi}[Visit]
The TWA \emph{visits} a node |n| if it arrives at |n| and executes |gamma(v, q)|, where |v| is the label of |n| and |q| is the state of |n|.
\end{defi}

\paragraph{Implementation with the Zipper.}
Various forms of deterministic TWAs can be implemented in a purely functional programming language using \emph{zippers}~\citep{Huet:1997:ZIP:969867.969872}. Such an approach models the imperative updates that the automaton makes to the state.
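The ingredients of the definitions above can be rendered concretely. The following Haskell fragment is a minimal sketch of the types involved in a deterministic TWA; the names |Label|, |State|, |Cmd| and |Delta| are our own choices and not part of the formal definition:
\begin{code}
type Label  =  Int            -- one label per production (the alphabet |V|)
type State  =  Int            -- the finite set of states |Q|

data Cmd    =  Up | Down Int  -- the commands |C|: go to the parent, or down to child |i|

-- The transition function |delta :: V -> Q -> Q -> (C, Q)| of a deterministic TWA,
-- taking the node label, the node's state, and the state |q0| of the previous node:
type Delta  =  Label -> State -> State -> (Cmd, State)
\end{code}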
\begin{figure}[tb]
\begin{code}
p  ::=  plans P : N (many v)                -- execution plan for production |P| of nonterminal |N|
v  ::=  visit C (many i) (many s) (many b)  -- a visit in context |C| that may need |many i| and can produce |many s|
b  ::=  r                                   -- evaluation rule
   |    invoke x C (many i) (many s)        -- a |down x| in context |C|, which can provide |many i| to |x| and expects |many s|
x                                           -- child name
C                                           -- context identifier
i,s,r                                       -- as defined in Figure~\ref{fig.ag.syntax}
\end{code}
\caption{The execution plan language.}
\label{fig.ag.executionplan}
\end{figure}

\begin{figure}[tb]
\begin{code}
plans Nil : String
  visit ^^^ AnyCtx ^^^ inh down :: Char ^^^ syn sh :: String
    lhs.sh = Nil

plans Cons : String
  visit ^^^ AnyCtx ^^^ inh down :: Char ^^^ syn sh :: String
    tl.down = loc.hd
    invoke tl AnyCtx ^^^ inh down :: Char ^^^ syn sh :: String
    lhs.sh = Cons lhs.down tl.sh
\end{code}
\caption{Exemplary execution plans of nonterminal |String|.}
\label{fig.ag.execplanexample}
\end{figure}

\paragraph{Execution plans.}
Figure~\ref{fig.ag.executionplan} introduces a language of execution plans |many p| for the description of the transition relation of TWAs, of which we explain some aspects below. Figure~\ref{fig.ag.execplanexample} shows an example. The language is not expressive enough to describe all transition relations, but it suffices for a description of an AG's evaluation.

A collection of plans-blocks represents the transition \emph{relation} |delta|. Since a context is an agreement between parent and child, a context models the |q0| parameter of |delta|. A plans-block is associated with a unique production |P| and consists of a number of visit-blocks. A visit-block |v| describes visits to the node in context |C|, and thus represents a subset |subsup delta (sub P C) v| of |sub delta (sub P C)|. Let |many r| be the rules that can be evaluated as a consequence of the tree walk taking transitions from |subsup delta P v|. Then the inherited attributes |many i| of the current node may be needed in the evaluation of |many r|, and the synthesized attributes |many s| can be computed by |many r|. An invoke-rule represents possible transitions to some child |x| of |P| in some context |C|, so that values of inherited attributes |many i| of |x| can be provided, and values of synthesized attributes |many s| of |x| may be needed by the current node.

Given an acyclic PDG of |P|, the relation |sub delta P| and the accompanying subset of |gamma| can be generated. This procedure is left as an exercise to the reader; we note that the number of states is possibly exponential in the size of the productions and their children, and the order of appearance of rules is irrelevant for the translation.

A collection of plans-blocks may also represent a transition \emph{function} |delta|. In this case, each visit-block |v| represents a distinct |subsup delta (sub P C) v `subset` sub delta (sub P C)| with precisely |1+n| elements (of which element |i+1| can be thought of as the continuation after visiting child number |i|), where |n| is the number of invoke-rules in the visit-block. The rules must occur in define-before-use order. The example in Figure~\ref{fig.ag.execplanexample} satisfies these constraints.

\paragraph{Implementations of execution plans.}
In comparison to the Zipper, there are less `imperative' and more efficient encodings of TWAs in functional languages.
If we consider TWAs with transition relations that do not make use of the |q0| parameter, and thus do not distinguish contexts, \citet{DBLP:conf/ifip2-4/SwierstraA98} presented an approach that exploits laziness (for nondeterministic TWAs), which we sketch in Section~\ref{sect.intro.ag.eval}, and \citet{Saraiva99} presented visit functions, which are coroutines encoded as continuations (for deterministic TWAs with a total order imposed on visits), which we sketch in Section~\ref{sect.ag.eval.ordered}.

\paragraph{Implementations described in this thesis.}
In Chapter~\tref{chapt.side-effect} we build upon visit functions. In Chapter~\tref{chapt.warren} we show that we can represent transition relations that use contexts. In later chapters we look at extensions to AGs that have accompanying evaluation algorithms that cannot be described with a deterministic TWA. For example, in Chapter~\tref{chapt.iterative} we allow tree walks directed by values of attributes, and in Chapter~\tref{chapt.breadthfirst} we allow tree walks to jump back and forth between saved positions.

\subsection{Demand-driven Attribute Evaluation}
\label{sect.intro.ag.eval}

%if False
%% ON-DEMAND EVAL
On-demand evaluation. Trivial mapping to an execution plan for a nondeterministic
tree-walking automaton. Show that a straightforward mapping to a lazy functional
language is possible by encoding the graph as a cyclic functional program, and letting
the evaluator of the functional language act as tree walking automaton
%endif

We can express demand-driven attribute evaluation by mapping an AG onto an execution plan that describes a nondeterministic TWA (Section~\ref{sect.treewalkautomaton}). This translation is straightforward:
\begin{itemize}
\item From each production an execution plan is derived with a single visit that lists all attributes of the production's left-hand side.
\item The visit-block contains the rules of the production, and an invoke-rule per child which lists all the attributes of the child.
\end{itemize}
The attributes are then computed by running the described TWA.

An execution plan in the above form has a straightforward translation to algebras in Haskell. We sketch this translation. It is optional background material and not required for the understanding of later chapters.

\paragraph{Catamorphisms.}
Let |F| be an endofunctor so that the data constructors describing the AST |A| form the initial |F|-algebra. The execution plans can be mapped straightforwardly to an |F|-algebra |phi| so that |cata phi A| is a function |g| (which we call the \emph{semantic result} or \emph{semantic tree}) that takes an argument for each inherited attribute of the root of |A|, and provides a result for each synthesized attribute of the root. Function |g| encodes the tree dependency graph of |A| (Section~\ref{sect.intro.ag.depgraphs}), and lazy evaluation acts as a nondeterministic TWA, with the additional feature that each rule in the execution plan is executed at most once. Effectively, the functional program is a term-graph representation of the dependency graph, and evaluation (rewriting of this term graph~\citep{DBLP:conf/csl/EekelenSP96}) results in the values of synthesized attributes.

\paragraph{Haskell translation.}
\citet{DBLP:conf/ifip2-4/SwierstraA98} showed how to express the algebra in Haskell as a function (called a \emph{semantic function}\footnote{
Depending on the context, a semantic function may be the function in a production's rule or a function corresponding to some data structure in an algebra.
}).
We demonstrate this approach based on the example in Figure~\ref{fig.ag.example}. Figure~\ref{fig.ag.algebrasketch} gives a sketch of the translation to Haskell code. We explain some aspects of this example below.

\begin{figure}[tb]
\begin{code}
type T_String = Char -> String            -- type of the node's semantics

cata_String :: String -> T_String         -- maps AST to semantic AST
cata_String Nil           =  sem_Nil
cata_String (Cons hd tl)  =  sem_Cons hd (cata_String tl)

sem_Nil :: T_String                       -- production without children
sem_Nil lhs_down = (lhs_sh) where ...     -- note that the |T|-type is a function

sem_Cons :: Char -> T_String -> T_String  -- production with two children
sem_Cons loc_hd loc_tl lhs_down = (lhs_sh) where ...
\end{code}
\caption{Sketch of the algebra.}
\label{fig.ag.algebrasketch}
\end{figure}

For each nonterminal |N| a type |T_(brackb N)| for the semantic result of a tree associated with |N| is generated, which is |T_String| in the example. Nonterminal |String| has an inherited attribute of type |Char| and a synthesized attribute of type |String|, hence |T_String| is |Char -> String|. The |cata|-function associates a semantics with each constructor of the AST. It replaces a constructor |P| with its semantic variant |sem_(brackb P)|.

\begin{figure}[tb]
\begin{code}
sem_Cons loc_hd loc_tl = \ lhs_down ->
  let  tl_down  = loc_hd               -- transcription of the first rule
       (tl_sh)  = loc_tl tl_down       -- recursive call to child |tl|
       lhs_sh   = Cons lhs_down tl_sh  -- transcription of the second rule
  in   ( lhs_sh )
\end{code}
\caption{The body of |sem_Cons|.}
\label{fig.ag.bodyofcons}
\end{figure}

\paragraph{Semantic functions.}
Figure~\ref{fig.ag.bodyofcons} sketches the generated semantic function |sem_Cons|. The body of a semantic function encodes the production's rules and the invoke-rules as given in the execution plan. The encoding of the rules is straightforward. An invoke-rule with some child |tl| is translated to a recursive call to the parameter that represents the semantic tree of |tl|. The call is parameterized with values of the inherited attributes of |tl|, and a pattern match against the results extracts the values of the synthesized attributes of |tl|.

\paragraph{Remarks.}
Demand-driven evaluation of AGs is popular in current AG systems. An advantage of on-demand evaluation is that it does not require an abstract interpretation as part of its implementation, and it works in combination with extensions such as remote reference attributes~\citep{DBLP:journals/scp/MagnussonH07}. Another advantage is that attributes are not computed when their values are not needed at runtime. Demand-driven evaluation may produce values of attributes even in the presence of cyclic attribute dependencies. A disadvantage of demand-driven evaluation is the potentially high space requirements, also known as space leaks.

The approach based on lazy evaluation goes further than demand-driven attribute evaluation, because the set of required attributes depends on how the values of attributes are inspected. In fact, using lazy lists it is possible to associate countably infinitely many attributes with a nonterminal.

\subsection{Statically Ordered Attribute Evaluation}
\label{sect.ag.eval.ordered}

%if False
%% GREEDY EVAL
Greedy evaluation. Explain what it is: evaluation described by a deterministic
automaton which may evaluate more than strictly required.
Kastens: deterministic TWA without contexts.
Kennedy-warren: deterministic TWA with contexts.
%endif

We can express statically ordered evaluation of AGs by mapping the AG onto a deterministic TWA (Section~\ref{sect.treewalkautomaton}). Depending on which contexts are distinguished and on the strictness properties of the rules, values may be computed for attributes that are not needed for the result, or that are only needed later. Therefore, we also call this eager, greedy, or strict evaluation of AGs.

\begin{defi}[Multi-visit AG]
A \emph{multi-visit AG} is an AG for which a statically ordered evaluation strategy is possible.
\end{defi}

Multi-visit AGs play an important role in this thesis because they make the notion of phasing explicit, which is useful for reasoning about what parts of the tree have been investigated and constructed so far.

\paragraph{Attribute scheduling.}
To map an AG to an execution plan of a deterministic TWA, we need to determine for each production |P| how trees associated with |P| are visited. For each visit, we need to determine which rules to apply and how the children of |P| are visited. This process requires acyclic augmented PDGs.

\paragraph{Scheduling algorithms.}
\citet{DBLP:journals/acta/Kastens80} presented an approach that attempts to derive a single smallest sequence |sub Delta N| of visits per nonterminal |N|, so that the attributes of a child with some nonterminal |N| can be computed by visiting the child according to some prefix of |sub Delta N|. Effectively, this approach does not distinguish any contexts. The approach entails adding edges to the I/O graphs of |N| so that they are equal and form a total order on the attributes. This is possible for most AGs in practice, but the approach of \citeauthor{DBLP:journals/acta/Kastens80} sometimes needs help in the form of additional attribute dependencies to accomplish this.

\citet{DBLP:conf/popl/KennedyW76} presented an approach that works for any absolutely non-circular AG. Their approach distinguishes protocols as contexts, which are the possible orders in which the parent provides inherited attributes and demands synthesized attributes. In Chapter~\tref{chapt.warren} we investigate this approach in depth and present a translation to Haskell.

\paragraph{Coroutines.}
A deterministic TWA can be implemented with coroutines~\citep{Warren76}.

\begin{defi}[Coroutine]
A \emph{coroutine}~\citep{DBLP:books/sp/Marlin80} is a function that can pause during its execution and return results to the caller. It may be parameterized with additional arguments when resumed by the caller. A \emph{generator} is a coroutine that does not take additional arguments.
\end{defi}

When no contexts are distinguished, such coroutines can be encoded in a purely functional language as \emph{visit functions}~\citep{Saraiva99}. The body of such a function builds a continuation that is used for a subsequent call and returns it as part of the result. From the caller's perspective:
%format f_visit1
%format f_visit2
%format f_visit3
%format lhs_visit1
%format lhs_visit2
%format lhs_visit3
%format s_child1
%format s_child2
%format asyn1
%format asyn2
%format asyn3
%format asyn4
%format ainh1
%format ainh2
%format ainh3
%format ainh4
%format f_syn1
%format f_syn2
%format f_syn3
%format f_syn4
%format f_inh1
%format f_inh2
%format f_inh3
%format f_inh4
\begin{code}
...
f_visit1 = s_f  -- assuming |s_f| is the coroutine of a child |f|
(f_syn1, f_syn2, f_visit2) = f_visit1 f_inh1 f_inh2
(f_syn3, f_syn4, f_visit3) = f_visit2 f_inh3 f_inh4
...
\end{code}
The callee has the following structure, where the dots represent the usual encoding of the rules that are scheduled to a particular visit:
\begin{code}
sem_(brackb P)_1 s_child1 s_child2 = lhs_visit1
  where
    lhs_visit1 ainh1 ... ainh2 = (asyn1, ..., asyn2, lhs_visit2)
      where ...
    lhs_visit2 ainh3 ... ainh4 = (asyn3, ..., asyn4, lhs_visit3)
      where ...
\end{code}
We explain this encoding in great detail throughout this thesis. Many chapters of this thesis present variants of this encoding.

\paragraph{Remarks.}
Historically, statically ordered attribute evaluation results in faster code and less memory usage. Also, recent developments in multi-core computing may renew interest in visit sequences with respect to parallel evaluation~\citep{springerlink:10.1007/BF02948394}.

\subsection{Incremental Descriptions}
\label{intro.sect.incremental}

Our formalism allows us to write the various declarations that together form an AG in any order. This is a consequence of the purely functional relation between attributes. It is an important property of AGs, because it allows us to describe AGs incrementally and separately. In this thesis, we repeatedly make use of this feature to eliminate common patterns from examples. The separate descriptions are simply merged by string concatenation.

\paragraph{Incremental notation.}
An AG description is incremental: nonterminals, productions, children, attributes and rules may be declared separately. At the same time, we may declare productions and attributes for multiple nonterminals, and children and rules for multiple productions.

\begin{figure}[tb]
\begin{code}
grammar Expr                     -- productions for |Expr|
  prod App      term impred :: Bool  -- with terminal for |App|
  prod Var      term nm :: Name      -- with terminal for |Var|
  prod App Var  term uid :: Int      -- extra terminal for |App| and |Var|

grammar Expr Type                -- productions for multiple nonterminals
  prod App  nonterm f,a : self       -- multiple children of the same type

attr Expr       inh x,y :: Int       -- multiple attributes of the same type
attr Expr Type  syn output :: self   -- attributes for multiple nonterminals

sem Expr Type                    -- rules for multiple nonterminals
  prod App  lhs.output = f.output `mkApp` a.output
sem Expr
  prod App Var  lhs.x = 1        -- rule for multiple productions
\end{code}
\caption{Examples of incremental notation.}
\label{fig.ag.incrementalexample}
\end{figure}

Figure~\ref{fig.ag.incrementalexample} gives a number of examples. A declaration of a nonterminal or production may appear multiple times, each occurrence providing additional declarations. For example, the contents of production |App| of nonterminal |Expr| are determined by several separate declarations. The type |self| is special and represents the type of the actual nonterminal of the description in which the |self| appears. The function |mkApp| must thus be an overloaded function that works both on |Expr|s and on |Type|s. The merging process is straightforward\footnote{
The interested reader may take a look at @Transform.ag@ in the @uuagc@ project for a merge algorithm. This algorithm also allows some declarations to overwrite previous declarations.
}. A duplicate declaration of an attribute of the same nonterminal is not allowed and is considered a static error. Similarly, after merging, attributes may not be defined by more than one rule, and each child of a production must be defined exactly once.

\paragraph{Nonterminal sets.}
To aid the definition of attributes on many nonterminals, we may use nonterminal sets. We use nonterminal sets often in actual code, but only sporadically in this thesis.
\begin{defi}[Nonterminal set]
A \emph{nonterminal set} is a nonterminal name that represents one or more other nonterminals or sets.
\end{defi}

For example, we define a nonterminal name |AllExpr|, which actually stands for |Expr| and |Decl|. When we declare attributes on |AllExpr|, these are actually declared for |Expr| and |Decl|:
\begin{code}
set AllExpr : Expr Decl  -- |AllExpr| includes |Expr `union` Decl|.
\end{code}
Nonterminal sets are extensible: a set declaration of some set |N| may appear multiple times in an AG description. Additionally, notation for set union and set difference may be used to define sets. Determining the sets is a straightforward fixpoint computation.

\paragraph{Nonterminal inheritance.}
A nonterminal may masquerade as a set. If, due to a set declaration, such a set includes other sets and nonterminals, the nonterminal inherits their attributes, productions, and rules.

\subsection{Higher-Order Children and Attributes}
\label{intro.sect.higherorder}

\begin{defi}[Semantics of a child]
The \emph{semantics of a child} with a nonterminal |N| is a semantic tree associated with nonterminal |N|. The decorations still have to be given.
\end{defi}

\begin{defi}[Higher-order child]
A \emph{higher-order child} is a child whose semantics is determined by the value of an attribute.
\end{defi}

A conventional child is determined by syntax, whereas a higher-order child is determined by an attribute. Higher-order children are also known as higher-order attributes or nonterminal attributes. The notion `higher order' originates from being able to pass the semantics of children around as first-class values.

\emph{Higher-Order} AGs (HOAGs)~\citep{DBLP:conf/pldi/VogtSK89} support higher-order children. As part of this thesis, we implemented this feature in UUAG~\citep{uuagc}\footnote{
The syntax that we use here deviates slightly from the actual syntax of higher-order children in UUAG. A type signature |inst.x : N| declares a child |x|, and a conventional rule must define the attribute |inst.x| with the semantics for |x|. In addition, when |x| already exists, its definition is a function that transforms the original semantics of |x|.
}. Higher-order children play an essential role in this thesis: with such children we can dynamically grow the tree (e.g. a proof tree) instead of being limited to a fixed tree (e.g. the parse tree).

\paragraph{Children defined by rules.}
In a conventional AG, the semantics of a child of a production is determined prior to attribute evaluation. In a HOAG, additional higher-order children may be declared for a production. Their semantics is the value of an attribute, or alternatively, the outcome of evaluating a rule:
\begin{code}
child x : N = f [many a]  -- rule that introduces a child |x|
\end{code}
The expression |f [many a]| evaluates to the semantics for |x|, as the following example demonstrates:
\begin{code}
child x : String = sem_Nil  -- declares a child |x| that is defined by |sem_Nil|
x.down = 'z'                -- inherited attr of |x|
\end{code}

\paragraph{Implementation.}
We see in later chapters how child-rules can be implemented. In the translation to Haskell as sketched in Section~\ref{sect.intro.ag.eval}, the semantics of a child is a function from inherited to synthesized attributes, and each child is translated to a function call.
In this translation, a child-rule becomes a conventional evaluation rule that defines some local attribute, and the local attribute is used as the function to call:
\begin{code}
loc_x   = sem_Nil       -- local attr determines the semantics of |x|
x_down  = 'z'           -- defines inherited attr of |x|
x_sh    = loc_x x_down  -- call to child |x|
\end{code}

\paragraph{Desugaring.}
HOAGs can be used to desugar an AST. As an example of a HOAG, suppose that |String| has a special production |Single| for single-character strings. Instead of defining the semantics directly for |Single|, we add a child |repl| that represents the string in terms of |Cons| and |Nil|:
\begin{code}
grammar String
  prod Single ^^^ term x :: Char  -- the |Single| production

sem String prod Single            -- and its semantics
  child repl : String = sem_Cons loc.x sem_Nil  -- higher-order child
  repl.down  = lhs.down           -- inh attr of child |repl|
  lhs.sh     = repl.sh            -- syn attr of child |repl|
\end{code}

\paragraph{Multi-visit AGs as HOAGs.}
%format P1
%format N1
In a multi-visit AG, a child may be visited multiple times to compute some of the attributes. Such an AG can be encoded as a HOAG, which we show below. As we see in later chapters, we worked on a core language that can represent such AGs, and the question arose whether to use HOAGs as a target language. We did not do this because of other requirements, but we present the translation anyway, since it gives some insight into how we organized the visit functions earlier.

We may encode multiple visits to some child |c| as a single visit to child |c| that only requires the inherited attributes of the first visit and only provides the synthesized attributes of the first visit. Additionally, it produces an attribute |c.cont| that represents the semantics of |c| after the visit. For the second visit, we use a higher-order child |c2| with |c.cont| as semantics. We then visit |c2| to provide and obtain the attributes of the second visit. For the next visit we use |c2.cont|, etc.

This approach requires the introduction of a potentially large number of new nonterminals. For a nonterminal |N| with |m| visits, we introduce the nonterminals |N1, ..., (sub N m)|, such that each nonterminal |(sub N i)| has the inherited and synthesized attributes associated with visit |i| of nonterminal |N|. In addition, |(sub N i)| (for |i < m|) has an extra synthesized \emph{continuation} attribute |cont| that contains the semantics of |(sub N (i+1))|.

For a production |P| we introduce the productions |P1, ..., (sub P m)|. The terminals of production |sub P i| represent the decorations available prior to visit |i|. Thus, |P1| consists of the original terminals and nonterminals of |P|, and |(sub P i)| (for |i > 1|) consists of the terminals of |sub P (i-1)|, extended with terminals that encode the attributes computed in visit |i-1|. As an optimization, the terminals that are not needed in later visits can be omitted from |sub P i|. The semantics for |(sub P i)| consists of the rules in |P|'s plan for visit |i| (modulo renaming of attribute references). Additionally, a rule is added which computes the semantics of |sub P (i+1)| and stores it in attribute |lhs.cont|. In this translation, visit |i| to a child (for |i > 1|) is thus represented as a higher-order child that is instantiated with the continuation attribute produced by the previous visit.
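To give a flavour of this encoding in the Haskell translation, consider a hypothetical nonterminal with two visits: the first visit takes an environment and yields a type together with the continuation, and the second visit takes a substitution and yields error messages. The following minimal sketch uses invented attribute types and names; it is illustrative only:

\begin{code}
type Env    = [(String, Ty)]        -- assumed attribute types
type Subst  = [(Int, Ty)]
data Ty     = TyVar Int | TyArr Ty Ty
type Errs   = [String]

type T_N2  = Subst -> Errs          -- semantics of the second visit
type T_N1  = Env -> (Ty, T_N2)      -- first visit also yields the |cont|

-- A parent performs both visits, threading the continuation:
twoVisits :: T_N1 -> Env -> Subst -> (Ty, Errs)
twoVisits child env subst =
  let  (ty, cont)  = child env      -- visit 1: provide |env|, obtain |ty|
       errs        = cont subst     -- visit 2: via the continuation
  in   (ty, errs)
\end{code}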
\subsection{Circular Reference Attributes}
\label{intro.sect.referenceattrs}

Reference Attributed Grammars (RAGs) are an extension of attribute grammars with (remote) reference attributes~\citep{DBLP:journals/scp/MagnussonH07}. This is a common extension of AGs that utilize a demand-driven evaluation algorithm. The extension allows subtrees to be passed around in attributes. Normally, the attributes of such a subtree |T| can be used and defined only by the direct parent of |T|. However, with the extension, the attributes of |T| may be used and defined by any rule holding a reference to |T|. Such an attribute is said to be accessed \emph{remotely}.

\paragraph{Graph structure.}
With this extension, the nodes are organized in a graph structure instead of a tree. Calculations over graphs often require some form of repetition, which can be encoded with cyclic attributes. Values of cyclic attributes can be computed if the demand-driven evaluation is extended with fixpoint iteration, and the attributes are given an initial value. The computation terminates if the rules are monotonic and each ascending chain of attribute values stabilizes.

\paragraph{Advantages and disadvantages.}
Reference attributes provide a convenient way of transporting information from one location in a tree to another location. Also, the extension allows more analyses to be modelled with AGs, such as abstract interpretations, which are typically fixpoint iterations over graphs. This expressive power comes at a price: the well-definedness of an AG cannot be statically verified in general (Section~\ref{sect.intro.ag.depgraphs}). Moreover, if values of \emph{inherited} attributes can be defined remotely, well-formedness of the AG cannot be checked statically, which has as a consequence that a straightforward mapping (such as in Section~\ref{sect.intro.ag.eval}) to a purely functional language is not possible\footnote{
We implemented synthesized reference attributes in UUAG for lazily evaluated grammars.
}.

\subsection{Correspondences between AGs, HOAGs, Monads, and Arrows}
\label{intro.sect.correspondences}

%if False
In a functional language, we may represent AGs as functions from inherited attributes to synthesized attributes in the target language. The attributed AST is implicitly stored on the call stack and in function closures. Alternatively, we obtain more explicit representations by defunctionalizing~\citep{DBLP:journals/lisp/Reynolds98a} the functions.
%endif

In later chapters, we translate AGs to a monadic target language~\citep{meijerjeuring95} and also consider AGs translated to Arrows~\citep{DBLP:conf/afp/Hughes04}. Being able to structure a computation as a monad or arrow allows reflection on the structure of the computation. We use such introspection in Chapter~\rref{chapt.breadthfirst} to implement a step-wise evaluation strategy. For example, consider the code of production |Cons| of the example in Section~\ref{intro.ag.syntax}.
In its present formulation, it can be evaluated in a strict fashion:
\begin{code}
sem_Cons loc_hd loc_tl lhs_down = (lhs_sh)
  where
    tl_down  = loc_hd
    (tl_sh)  = loc_tl tl_down
    lhs_sh   = Cons lhs_down tl_sh
\end{code}
We rewrite this code using arrow notation~\citep{DBLP:conf/icfp/Paterson01} and call the result an execution plan:
\begin{code}
sem_Cons loc_hd loc_tl = proc lhs_down -> do
  tl_down  <- (sub f copy)  -< loc_hd             -- transcription of the first rule
  tl_sh    <- loc_tl        -< tl_down            -- invoke arrow of child |tl|
  lhs_sh   <- (sub f cons)  -< (lhs_down, tl_sh)  -- transcription of the second rule
  returnA -< lhs_sh                               -- output
  where
    (sub f copy)  = id
    (sub f cons)  = uncurry Cons
\end{code}
The translation into arrow notation essentially desugars the above code into point-free style, that is, into a linear composition of the rule functions, interspersed with combinators that rearrange the intermediate attribute values.

Can we represent any AG in this way? When we limit the expressiveness of the language of the rules to tree constructions only, conventional AGs can express primitive recursive functions, whereas HOAGs can express all computable functions. This difference can be observed when translating to arrows. In this setting, the semantics of a tree is an arrow that takes a tuple of inherited attributes as input and produces a tuple of synthesized attributes as output. A conventional AG is expressible as a plain arrow, but a HOAG requires the generalization to a monad, and AGs that are not statically ordered may require a feedback loop.

Introspection of the arrow is possible using defunctionalization~\citep{DBLP:journals/lisp/Reynolds98a}. Such an arrow actually encodes the tree dependency graph (Section~\ref{sect.intro.ag.depgraphs}), thus introspection on this structure allows a runtime optimization of this graph (e.g. elimination of identity functions), which may be useful when traversing parts of the graph several times (Chapter~\rref{chapt.iterative}).

%if False
Suppose that we iterate the evaluation of such attributes for a node to compute a fixpoint (Chapter~\rref{chapt.iterative}). We want to dynamically fuse rules to improve the performance of each iteration. For example, the identity function is a unit of composition and can be eliminated. An AG is likely to have many rules with identity functions (Section~\ref{intro.sect.copyrules}). We can accomplish this by allowing introspection of the node's computation. To allow reflection on the computation, we represent the arrow as a first-class value with typed abstract syntax and GADTs.
The arrow combinators are expressible in terms of the constructors |Arr|, |Comp|, |First|, and the fused variant |First2|:
\begin{code}
data Sem :: * -> * -> * ^^ where  -- child's semantics as arrow
  Arr     :: Rule i s -> Sem i s                 -- embed rule
  Comp    :: Sem i z -> Sem z s -> Sem i s       -- compose two arrows
  First   :: Sem i s -> Sem (i,z) (s,z)          -- bypass some intermediate attrs |z|
  First2  :: Sem i s -> Sem ((i,c),d) ((s,c),d)  -- fused variant of nested |First|s

data Rule :: * -> * -> * ^^ where  -- reflection on rules
  Fcopy  :: Rule a a                   -- identity rule |sub f copy|
  Fswap  :: Rule (a,b) (b,a)           -- swap rule
  Fdup   :: Rule a (a,a)               -- clone rule
  Fcons  :: Rule (Char,String) String  -- the |sub f cons| rule
\end{code}
The following evaluation function defines the expected meaning of the reflected rules:
\begin{code}
evalRule :: Rule a b -> a -> b
evalRule Fcopy  = id                     -- |sub f copy|
evalRule Fswap  = \ (a,b) -> (b,a)       -- needed for combinator |second|
evalRule Fdup   = \ a -> (a,a)           -- needed for combinator |&&&|
evalRule Fcons  = \ (c,cs) -> Cons c cs  -- |sub f cons|
\end{code}
These constructors are essentially indices into the actual code of the rules. We can use knowledge about compositions of rules to replace such compositions with more efficient implementations. As a simple example, we eliminate left-skewed copy chains from such an arrow, and replace nested |First|s with the more efficient variant |First2|:
%format First2
\begin{code}
optimize :: Sem i s -> Sem i s
optimize (Comp (Arr Fcopy) g)  = optimize g
optimize (Comp f g)            = Comp (optimize f) (optimize g)
optimize (First (First t))     = First2 (optimize t)
optimize (First t)             = First (optimize t)
optimize a                     = a
\end{code}
The interpretation of the arrow combinators is the standard arrow instance for functions:
\begin{code}
evalTree :: Sem i s -> i -> s
evalTree (Arr r)     = evalRule r
evalTree (Comp f g)  = evalTree g . evalTree f
evalTree (First t)   = \ (a,c) -> (evalTree t a, c)
evalTree (First2 t)  = \ ((a,c),d) -> ((evalTree t a, c), d)
\end{code}
Note that the arrow approach has overhead related to the management of the intermediate state that connects the rules. More advanced rule fusion can reduce this overhead. For example, we implemented some flow analyses on ActionScript bytecode\footnote{
Source code of the AG-based ActionScript bytecode manipulation suite: \url{https://svn.science.uu.nl/repos/staff.ariem.fittest-uu/tools/asic/}}. For one analysis, we essentially added a delta to a chained integer attribute. The bytecode typically consists of long sequences of instructions with only little branching. Such sequences can be compressed by adding up the deltas.
%endif

\subsection{Specification of Typing Relations}
\label{sect.intro.agtypespec}

In the preceding sections, we discussed several AG features. We now make the connection with type systems. A reason why attribute grammars are a convenient formalism for describing type systems is that we can see a specification with type rules as an attribute grammar: the typing relations are attributed nonterminals, the type rules form the productions, and the judgments form the symbols, with equations between the attributes. The AG thus describes the structure of derivation trees.

To map an AG to type rules, we need to translate nonterminals, productions and rules. A nonterminal |N| with inherited attributes |sub I N| and synthesized attributes |sub O N| translates to a typing relation of the form |(sub I N) (annotTurnstyle N) (sub O N)|.
Note that the distinction between inherited and synthesized gets lost in the translation, although we use the notational convention that supposed inputs are to the left of the turnstile and supposed outputs are to the right. Without going into details, a production |p = u -> (many w) `cdot` (many r) `cdot` X| translates to a type rule with a judgment for each symbol in |many w|, equality judgments for each rule in |many r|, and a conclusion for |u|. The actual translation is quite similar to the translation to a lazy functional program, which we already showed in Section~\ref{sect.intro.ag.eval}.

If we take the inverse of the above translation, we arrive at a translation from type rules to AGs. This process, however, is non-trivial, which is not surprising because there exists no general inference algorithm. To perform it, we need to identify which attributes are inherited and which are synthesized, and with which computable functions to represent equality judgments. However, as we come back to in Section~\ref{intro.sect.globalpicture}, we can use such an improper AG as a starting point for an inference algorithm, and add information to make it a proper AG. Such an AG then has the desirable property that its basis directly corresponds to the specification. In the next sections, we give an AG implementation of a DHM type inferencer, and show some attribute grammar features that are useful when using AGs to describe derivation trees.

\subsection{Damas-Hindley-Milner Inference}
\label{intro.sect.ag.dhm}

As an example of a type inference algorithm written with attribute grammars, we give an AG implementation of the DHM algorithm (Section~\ref{sect.intro.dhm}), using the same approach as presented by \citet{DBLP:conf/afp/DijkstraS04}.

\begin{defi}[Chained attribute]
A \emph{threaded attribute} or \emph{chained attribute} stands for both an inherited and a synthesized attribute with the same name.
\end{defi}

The environment is modeled as an inherited attribute, errors as a synthesized attribute, and the substitution as a chained attribute.

\begin{figure}[tbp]
%format aenv = "\Varid{env}"
%format aty = "\Varid{ty}"
%format asubst = "\Varid{subst}"
%format ainh = "\Varid{inh}"
%format asyn = "\Varid{syn}"
\begin{code}
grammar Expr  -- abstract grammar for expressions
  prod Var  term x :: Name       -- the identifier |x|
  prod App  nonterm f : Expr     -- left-hand side of application
            nonterm a : Expr     -- right-hand side of application
  prod Lam  term x :: Name       -- the identifier |x| bound by the lambda
            nonterm b : Expr     -- the body of the lambda
  prod Let  term x :: Name       -- the name of the binding
            nonterm e, b : Expr  -- the binding and body

attr Expr ^^^  inh aenv :: Env       -- inherited environment
               chn asubst :: Subst   -- chained attr: both inh and syn
               syn aty :: Ty         -- synthesized type
               syn errs :: Errs      -- collection of type errors
\end{code}
\caption{DHM grammar and attributes.}
\label{fig.intro.dhmgramattr}
\end{figure}

For the type we have two options. Either the parent passes an expected type to the child, which is further constrained by the child, or the child passes up an inferred type, which is further constrained by the parent. When type annotations are part of the language, the former approach detects type errors earlier: the information from these type annotations can then be given to a child, instead of verifying the resulting type after the fact. However, for this example it does not matter, so we take the latter approach, which we also used for the monadic implementation of DHM in Section~\rref{sect.intro.dhm}.
Figure~\ref{fig.intro.dhmgramattr} shows the grammar and attributes of the example. The chained attribute |asubst| is shorthand for an inherited and a synthesized attribute, both named |asubst|. To make it clear which of the two attributes we intend, we explicitly prepend |ainh| or |asyn| to the name in the rules.

\begin{figure}[tbp]
\begin{code}
sem Expr
  prod Var  loc.scheme = lookup loc.x lhs.aenv
            (lhs.aty, asyn.lhs.asubst) = instantiate loc.scheme ainh.lhs.asubst
            lhs.errs = emptyset
  prod App  a.aenv = lhs.aenv
            f.aenv = lhs.aenv
            (loc.res, ainh.a.asubst) = fresh ainh.lhs.asubst
            ainh.f.asubst = asyn.a.asubst
            (loc.errs, asyn.lhs.asubst) = unify f.aty (a.aty -> loc.res) asyn.f.asubst
            lhs.errs = f.errs ++ a.errs ++ loc.errs
  prod Lam  b.aenv = insert loc.x loc.argty lhs.aenv
            (loc.argty, ainh.b.asubst) = fresh ainh.lhs.asubst
            lhs.aty = loc.argty -> b.aty
            asyn.lhs.asubst = asyn.b.asubst
            lhs.errs = b.errs
  prod Let  e.aenv = lhs.aenv
            b.aenv = insert loc.x loc.scheme lhs.aenv
            loc.scheme = generalize lhs.aenv e.aty asyn.e.asubst
            ainh.e.asubst = ainh.lhs.asubst
            ainh.b.asubst = asyn.e.asubst
            asyn.lhs.asubst = asyn.b.asubst
            lhs.errs = e.errs ++ b.errs
\end{code}
\caption{DHM with AG rules.}
\label{fig.ag.dhmimpl}
\end{figure}

The rules in Figure~\ref{fig.ag.dhmimpl} describe how the environment is passed top-down through the tree, how the substitution is threaded in-order through the tree, and how errors are collected bottom-up. Since functions such as |instantiate| and |unify| produce an updated substitution, it is the threading of the substitution that determines in what order the effects of these operations become visible in the substitution. The substitution needs to be threaded carefully in order not to lose any constraints on type variables.

In contrast to the monadic approach in Section~\ref{sect.intro.dhm}, the rules are compositional. The distribution of the environment, the threading of the substitution, and the collection of error messages can be described separately and relatively independently. On the other hand, the rules are also more verbose, because the environment, substitution and error messages are not hidden. Furthermore, rules that employ |generalize| are crosscutting, as they deal with types, substitutions and environments. This has a negative effect on the degree of separation in the descriptions of these individual attributes. We address both issues in Section~\ref{intro.sect.copyrules}.

\subsection{Copy Rules and Collection Attributes}
\label{intro.sect.copyrules}

We often pass values in standard top-down, bottom-up, and in-order patterns between attributes of the tree. The rules that encode these patterns are trivial: essentially identity functions between attributes. For example, to pass an environment top-down to the children of an application we use the following rules:
\begin{code}
sem Expr prod App
  ainh.left.env   = id ainh.lhs.env  -- copy down left
  ainh.right.env  = id ainh.lhs.env  -- copy down right
\end{code}
To thread\footnote{
When we write that we thread an attribute |x| through the tree, we actually mean that we thread a value through the tree via a sequence of attributes that are all named |x|, and that are chained together by (mostly) copy rules.
In a similar way, we talk about passing attributes top-down and bottom-up.} a counter |uid| (unique identifier) through the tree, we use the following rules:
\begin{code}
sem Expr prod App
  ainh.left.uid   = id ainh.lhs.uid    -- copy down to left
  ainh.right.uid  = id asyn.left.uid   -- from left to right
  asyn.lhs.uid    = id asyn.right.uid  -- copy up from right
\end{code}
Similarly, if a production has one child, we may pass the value of an attribute of that child bottom-up as the value for the same attribute of the parent:
\begin{code}
sem Expr prod Lam ^^^ asyn.lhs.errs = asyn.b.errs  -- copy error messages up
\end{code}
Such \emph{copy rules}~\citep{10.1109/SCAM.2007.13} are so common\footnote{
The AGs of UHC have more than twice as many copy rules as explicitly written rules. The inference of copy rules thus saves a lot of manual labor.
} that we allow these rules to be omitted. To make such an AG well-formed, the following algorithm augments an AG with copy rules. If a rule is missing for an inherited attribute |a| of a child |c|, we insert a rule that takes an attribute with the same name |a| from the local attributes, the synthesized attributes of the children to the left of |c|, or the inherited attributes of the parent. The last occurrence in the following order is taken: inherited attributes of the parent, synthesized attributes of the children (from left to right), local attributes. In a similar way, we treat omitted copy rules for synthesized attributes. These copy rules can be considered to provide generic behavior for AGs that is not unlike the abstraction offered by reader and state monads.

Furthermore, we often collect attribute values in a bottom-up fashion:
\begin{code}
sem Expr prod App    asyn.lhs.errs = asyn.left.errs ++ asyn.right.errs
sem Expr prod Const  asyn.lhs.errs = []
\end{code}
Such \emph{collection rules}~\citep{10.1109/SCAM.2007.13} can also be inferred automatically when we specify a combination operator and an initial value:
\begin{code}
attr Expr ^^^ syn errs use (++) []
\end{code}
This approach captures the abstraction provided by writer monads.

\paragraph{Intermediate nodes and copy rules.}
It is often convenient to have intermediate nodes in the AST structure. An important benefit of copy rules is that they allow us to transparently add intermediate nodes to the AST. As an example, consider function application in the lambda calculus. It is usually expressed as a binary expression in the abstract syntax:
\begin{code}
grammar Expr ^^^ prod App ^^^ nonterm f,a : Expr
\end{code}
A function call with multiple arguments (e.g. |f a1 a2|) is thus encoded as a sequence of applications (|(f a1) a2|) using |App|. Suppose that we want to add an inherited attribute that describes at which argument position an expression occurs:
\begin{code}
attr Expr ^^^ inh index :: Int
sem Expr ^^^ prod App ^^^ ainh.f.index = ainh.lhs.index ^^^ ainh.a.index = 1 + ainh.lhs.index
\end{code}
This definition is incorrect for nested function calls (e.g. |f a1 (g a2)|), because the first argument |a2| of the nested call receives |2| as the value of its |index| attribute. We get the expected behavior by using an inherited attribute to distinguish whether an expression occurs to the left or to the right of an application. This, however, requires us to specify this attribute for each occurrence of an expression nonterminal.
We obtain a more concise solution if we assume that there is always a special top node above a sequence of applications:
\begin{code}
grammar Expr ^^^ prod AppTop ^^^ nonterm e : Expr
sem Expr ^^^ prod AppTop ^^^ ainh.e.index = 0
\end{code}
The copy rules transparently connect the remaining attributes of |e| with attributes of |lhs|. For these attributes, the existence of the intermediate node is not visible. A typical place to add these intermediate nodes is in the parser or with a tree transformation. If we furthermore ensure that a special root node occurs above expression trees, then we also easily define an initial value for the |index| attribute:
\begin{code}
grammar ExprTop ^^^ prod Top ^^^ nonterm e : Expr
sem ExprTop ^^^ prod Top ^^^ ainh.e.index = 0
\end{code}
With nonterminal sets (Section~\ref{intro.sect.incremental}) we can define attributes that are common to |Expr| and |ExprTop| without code duplication.

\paragraph{Higher-order children and copy rules.}
\label{sect.intro.hochildrencopyrules}
To further improve the effectiveness of copy rules, and in general to improve the separation of concerns, we can encode crosscutting rules as higher-order children (Section~\ref{intro.sect.higherorder}). By encoding a rule |r| as a child, we abstract |r| from how it is combined with other rules. This idea embodies the spirit of this thesis. A higher-order child can be used to declaratively specify a task (for instance, to ensure that two terms are equal), and the underlying implementation (the unification function) reflects the effects of performing the task in terms of attribute values. The orchestration of these tasks is determined by how the rules weave the attributes together.

\begin{figure}[tb]
%format aty1
%format aty2
\begin{code}
grammar Unify prod Unify  -- nonterminal that represents unification

attr Unify ^^^  inh aty1,aty2 :: Ty  -- the two input types
                inh aenv :: Env      -- the input environment
                chn asubst :: Subst  -- the input/output substitution
                syn errs :: Errs     -- the output error messages

sem Unify prod Unify      -- essentially a wrapper around |unify|
  (lhs.errs, asyn.lhs.asubst) = unify lhs.aenv lhs.aty1 lhs.aty2 ainh.lhs.asubst
\end{code}
\caption{Encoding of |unify| as a higher-order child.}
\label{fig.ag.higherorderunify}
\end{figure}

In general, we can represent any function as a higher-order child. In Figure~\ref{fig.ag.higherorderunify} we introduce a nonterminal for unification that has an inherited attribute for each input of unification and a synthesized attribute for each output of unification. The nonterminal has only one production, which contains only one rule: the rule we abstract over.

\begin{figure}[tb]
\begin{code}
sem Expr prod App
  child u : Unify = sem_Unify            -- higher-order child
  u.aty1 = f.aty                         -- input type
  u.aty2 = (a.aty -> loc.res)            -- input type
  ainh.u.asubst = asyn.a.asubst          -- copy rule
  asyn.lhs.asubst = asyn.u.asubst        -- copy rule
  u.aenv = lhs.aenv                      -- copy rule
  lhs.errs = f.errs ++ a.errs ++ u.errs  -- copy rule
\end{code}
\caption{The DHM rules for |App| with copy rules.}
\label{fig.ag.dhmcopyrules}
\end{figure}

In a similar way, we can encode the |fresh| and |instantiate| functions as higher-order children (Section~\tref{visit.sect.typingexpressions}). In Figure~\ref{fig.ag.dhmcopyrules} we show how this is done for production |App|: we add the unify-nonterminal as a higher-order child |u| to the production, and add rules for the attributes of |u|. The rules marked as copy rules are implied by the copy-rule mechanism and can be omitted.
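As an aside, the copy-rule search described in Section~\ref{intro.sect.copyrules} is easily captured in code. The following Haskell sketch is illustrative only: the names are invented, and it assumes the preference order described there (local attributes first, then the nearest child to the left, then the parent's inherited attributes), which is not necessarily UUAG's actual implementation.

\begin{code}
import Data.List (find)

-- Possible sources for an omitted copy rule that defines an
-- inherited attribute of a child.
data Source  =  Local String          -- a local attribute
             |  SynOf String String   -- a synthesized attr of a left sibling
             |  InhOfLhs String       -- an inherited attr of the parent
  deriving Show

copySource ::  String                -- attribute name to resolve
           ->  [String]              -- local attributes of the production
           ->  [(String, [String])]  -- children to the left of the child,
                                     -- leftmost first, with their syn attrs
           ->  [String]              -- inherited attributes of the parent
           ->  Maybe Source
copySource a locals lefts lhsInh
  | a `elem` locals                                     = Just (Local a)
  | Just (c, _) <- find (elem a . snd) (reverse lefts)  = Just (SynOf c a)
  | a `elem` lhsInh                                     = Just (InhOfLhs a)
  | otherwise                                           = Nothing
\end{code}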
\paragraph{Remarks.}
The automatic completion of an AG with copy rules is a double-edged sword. The mechanism saves a lot of boilerplate code, but the automatic behavior may not always be intended. If we accidentally forget to define an attribute explicitly, and that attribute can be given a default definition via a copy rule, then we are not warned that a rule is missing. In Chapter~\rref{chapt.iterative} we provide a way to specify copy rules per production, which gives more control over where and which copy rules are applied.

The copy-rule mechanism uses the order of appearance of children for threading and bottom-up collections. This is sometimes not the appropriate order for a given application. For example, higher-order children follow conventional children in the order of appearance, and are thus always at the end of copy-rule chains. In such situations we can override the copy-rule behavior by giving explicit rules for attributes. However, this reduces the convenience of copy rules, and undermines the underlying idea that we specify patterns rather than individual rules. As a solution, in Chapter~\rref{chapt.side-effect} we allow the visit order of children to be explicitly specified for eagerly evaluated AGs; copy rules can then use the visit order instead of the order of appearance. Moreover, in Chapter~\rref{chapt.warren} we define commutable (copy) rules, which are rules that can be ordered independently of their value dependencies.

\subsection{Advantages and Disadvantages}

The greatest advantage offered by AGs is that specifications are composable. Productions, attributes and rules can all be specified separately and automatically combined into a monolithic specification (Section~\ref{intro.sect.incremental}). This easily allows new attributes and behavior to be added and combined with already existing rules. In particular, extra attributes can be used to specify additional administration for type inference strategies, and to specify what a compiler does with the inferred types.

In general, AGs offer modularity~\citep{143210} and extensibility~\citep{DBLP:conf/icfp/VieraSS09}, which can be realized via generic programming and meta-programming approaches. In this thesis, we make use of such facilities, although these facilities themselves are not the focus of this thesis. Instead, we focus on combining AGs with algorithms that implement the functionality specified by declarative aspects of type rules.

\section{Background on Ruler}
\label{intro.sect.ruler}

This thesis complements previous work by \citet{DBLP:conf/flops/DijkstraS06} on the Ruler language and tool suite. Ruler gives a semantics to type rule descriptions in the form of an implementation with conventional attribute grammars and Haskell. Consequently, Ruler in its current state provides only an implementation for syntax-directed type rules. The purpose of this thesis is to provide a core language, \emph{RulerCore}, for Ruler, which allows more complex inference strategies to be described. RulerCore extends attribute grammars in various ways. Therefore, we focus more on attribute grammars than on type rule descriptions, although type systems play a prominent role in our work. The work on Ruler provides a notation for type rules and shows how this notation translates to attribute grammars. From this work, a notation that maps to RulerCore can be designed; hence we describe Ruler in this section.
\subsection{Ruler Features}

Ruler aims to generate both a type system specification and a type inference implementation from a single description of the type system with type rules. The generated specification consists of type rules formatted as \LaTeX{} figures, and can be used as documentation and for formal reasoning. The generated implementation consists of attribute grammars that can be included verbatim in the source code of a compiler. This approach guarantees consistency between the specification and the implementation.

Ruler provides notation for the incremental description of type rules. It features the addition of parameters to relations, type rules to relations, and judgments to type rules, in a similar way as we can add attributes, productions, and rules to an AG description. When type rules are declarative, only well-formedness checks and generation of the specification are possible. However, Ruler also provides notation to describe algorithmic rules. Using the facilities for incremental descriptions, a direction can be given to the parameters of typing relations, and type rules can be associated with productions of an accompanying attribute grammar, which turns the typing relation into a deterministic function for which AG code can be generated.

\subsection{Ruler Concepts}

We take the explicitly typed lambda calculus as described in Section~\ref{intro.sect.typerules} as a basis to show the main concepts and syntax\footnote{
We have taken the liberty of deviating slightly from Ruler's actual syntax in order to have a closer correspondence with the notation that we use in this thesis.
} of Ruler. In Ruler, a type system description is a composition of views on relations and their rules.

\begin{defi}[View]
In Ruler, a \emph{view} is a named subset of the declared relations, holes, judgments, rules, etc. Each view describes a type system.
\end{defi}

We start with a declarative view, which we give the name |D|. Later, we also provide an algorithmic view with the name |A|.

\begin{figure}[tb]
\begin{code}
relation expr view D ^^    -- declares the typing relation |expr| for |D|
  holes g :: Gam ^^^ e :: Expr ^^^ t :: Ty ^^  -- specifies parameters of |expr|
  judgespec g :- e : t     -- notation for judgments of |expr|

relation member view D ^^  -- declares the relation |member|
  holes g :: Gam ^^^ x :: String ^^^ t :: Ty ^^  -- specifies parameters of |member|
  judgespec (x,t) `elem` g ^^  -- notation for judgments of |member|
\end{code}
\caption{Examples of declaring relations in Ruler.}
\label{fig.ruler.defrelations}
\end{figure}

\begin{defi}[Hole]
In Ruler, the parameters of a relation are called \emph{holes}. The parameters are explicitly named and explicitly typed.
\end{defi}

Figure~\ref{fig.ruler.defrelations} shows how to declare the 3-place relations |expr| and |member|. The line with \emph{judgespec} specifies a custom notation for a relation in terms of a meta-grammar production. Each hole must occur exactly once as a meta-nonterminal in this meta-production.
\begin{figure}[tb]
\begin{code}
ruleset theRules relation expr  -- a set of rules for |expr|
  rule e.var view D             -- rule for the var-case
    judge L : member ^^^ (x,t) `elem` g   -- premise with the name |L|
    ~~~
    judge R : expr ^^^ g :- x : t         -- conclusion with the name |R|

  rule e.app view D             -- rule for the app-case
    judge F : expr ^^^ g :- f : t1 -> t2  -- premise with the name |F|
    judge A : expr ^^^ g :- a : t1        -- premise with the name |A|
    ~~~
    judge R : expr   -- alternative syntax for judgment
      | g = g        -- binds expression |g| to hole |g|
      | e = f a      -- binds |f a| to hole |e|
      | t = t2       -- binds |t2| to hole |t|
\end{code}
\caption{Example of Ruler rules.}
\label{fig.ruler.examplerules}
\end{figure}

\begin{defi}[Ruleset]
A \emph{ruleset} is a group of rules in combination with a collection of meta-information, such as a name for the group.
\end{defi}

A relation is either a foreign relation, or it is specified by the rules of some ruleset. Figure~\ref{fig.ruler.examplerules} shows the ruleset |theRules|. A rule is given an explicit name, zero or more judgments as premises above the line, and one conclusion below the line. A judgment has an explicit name and corresponds to a relation. The arguments are bound to the corresponding holes either via the custom syntax in a nameless way, or via a generic syntax where each binding is given explicitly.

The language of Ruler expressions consists of meta-variables (such as |f| and |a|), and external constants and operators (such as the function arrow). The interpretation of such terms depends on the target language. Denotations can be given for individual symbols and constants, as well as for (saturated) applications:
\begin{code}
rewrite ag ^^ ((t1 :: Ty) -> (t2 :: Ty)) = (TyArr t1 t2) :: Ty  -- denotation of application
external TyArr  -- identity denotation
rewrite tex ^^ ((t1 :: Ty) -> (t2 :: Ty)) = (t1 -> t2) :: Ty  -- fully saturated appl only
format tex ^^ -> ^^ = "\rightarrow"  -- denotation of a symbol
\end{code}
Rewrite rules are applied in a bottom-up fashion. The LHS of a rewrite rule specifies a typed pattern to match against a Ruler expression. The RHS must again be a Ruler expression, which is then assumed to have the given explicit type. These types allow the notation to be overloaded. A rewrite rule applies if both the syntax and the types of the LHS match the actual Ruler expression.

The distinction between conventional relations and foreign relations is that the latter are not defined by rules. Instead, an explicit denotation in the host language must be given for a foreign relation:
\begin{code}
relation member view D  -- |member| is not specified by rules
  judgeuse tex ^^ (x,t) `elem` g        -- denotation for \LaTeX{}
  judgeuse ag ^^ (Just t) = lookup x g  -- denotation for AGs with Haskell
  format tex ^^ `elem` ^^^ = "\in"      -- denotation of a symbol
  external Just lookup                  -- identity denotation
\end{code}
The above declarative specification can be typeset to \LaTeX{}. To describe an algorithmic version, additional information needs to be added to the relations and rules. We can describe these additions separately by defining a view |A| that extends |D|:
%format viewhierarchy = "\mathbf{viewhierarchy}"
\begin{code}
viewhierarchy D < A  -- partial order on views
\end{code}
In particular, we need to define how the relations are mapped onto nonterminals of an AG, and turn the relations into deterministic functions such that the parameters can be mapped to attributes. Ruler code does not stand on its own.
The generated code for type inferencing is supposed to be used in conjunction with other AG code. An association is specified between relations (schemes) in Ruler and nonterminals in the AG. For example, the relation |expr| is associated with the nonterminal |Expr|. Its type rules are associated with productions of |Expr|. This correspondence is made explicit by annotating the nonterminal declaration with the name of the relation, and the productions with the names of the corresponding rules\footnote{
Ruler allows productions to be defined by combining rules. Within the square brackets there may be not only the name of a rule, but also an expression that denotes a composition of rules. An example is the left-biased union of two rules.
}:
\begin{code}
grammar Expr [expr] view A  -- relation |expr| mapped to nonterminal |Expr|
  prod Var [e.var]          -- rule |e.var| mapped to production |Var|
    term nm :: String
  prod App [e.app]          -- rule |e.app| mapped to production |App|
    nonterm f : Expr
    nonterm a : Expr
\end{code}
Moreover, the holes are mapped to attributes. Judgments represent the semantics of a production. Judgments of a foreign relation are translated to AG rules. The other judgments correspond to children of the production, and are mapped to rules that define the inherited attributes of the children, or that pattern match against the synthesized attributes.

In view |A|, we refine the declaration of |expr| to include additional information. In this example, we only provide additional information about the attributes. In general, additional attributes and syntax can also be added. We map the holes of the |expr| relation to inherited or synthesized attributes. The hole that corresponds to the AST gets the special \emph{node} designation:
\begin{code}
relation expr view A     -- algorithmic view on |expr|
  holes  node e :: Expr  -- AST node
         inh g :: Gam    -- input param is inherited attribute
         syn t :: Ty     -- output param is synthesized attribute
\end{code}
A Ruler expression is at a defining position if it is bound to an inherited hole of a premise, or bound to a synthesized hole of a conclusion judgment. Otherwise, it is at a using position. The generated algorithm constructs a value at defining positions, and matches against values at using positions.

\begin{figure}[tb]
\begin{code}
ruleset theRules relation expr  -- extensions to rules for |expr|
  rule e.var view A             -- rule for the var-case
    ~~~
    judge R : expr ^^^ g :- (node nm = x) : t  -- association between term |nm| and |x|

  rule e.app view A             -- rule for the app-case
    ~~~
    judge R : expr                     -- convenient for extensions
      | e = (node f = f) (node a = a)  -- assoc child |f| to |f| and child |a| to |a|
\end{code}
\caption{Rules demonstrating node-holes.}
\label{fig.ruler.nodehole}
\end{figure}

A node-hole takes the AST as its value. In the judgments of rules, node holes are treated differently from normal holes, in order to define the correspondence between judgments of the type rule and children of the production. The node hole of a premise judgment may only be an identifier and corresponds uniquely to a child of the associated production. These identifiers must occur in the expression bound to the node hole of the conclusion, and we annotate them there with the names of the corresponding children of the production. Figure~\ref{fig.ruler.nodehole} shows an example. The syntax for explicitly named bindings of holes is convenient for extensions, as only the bindings that are redefined need to be mentioned. When a Ruler description is well-formed, an AG can be generated from it.
This AG contains attributes and semantics for type inference, and can be combined with other attributes and Haskell infrastructure into a compiler.

\subsection{Damas-Hindley-Milner Inference}

Analogously to Section~\ref{sect.intro.dhm} and Section~\ref{intro.sect.ag.dhm}, we show in this section a DHM inference algorithm in Ruler. In the DHM view, we add lambda abstractions without an explicit type annotation, and a let-expression. However, we only show the code for the app-rule, since we already explained Ruler's concepts in the previous section.

\begin{figure}[tb]
\begin{code}
viewhierarchy D < A < H  -- DHM view |H|

relation expr view H     -- DHM view on |expr|
  holes chn s :: Subst   -- substitution as threaded attribute

relation tyFresh view H  -- wrapper for |fresh|
  holes  chn s :: Subst  -- threaded subst
         syn t :: Ty     -- fresh type
  judgespec ^^ t `fresh`                     -- syntax for the judgment
  judgeuse ag ^^ (t, asyn.s) = fresh ainh.s  -- denotation for AGs with Haskell

relation tyUnify view H  -- wrapper for |unify|
  holes  chn s :: Subst  -- threaded subst
         inh t1 :: Ty    -- left type
         inh t2 :: Ty    -- right type
  judgespec ^^ t1 == t2  -- syntax for the judgment
  judgeuse ag ^^ (retain errs, asyn.s) = unify t1 t2 ainh.s
\end{code}
\caption{DHM relations in Ruler.}
\label{fig.dhm.ruler.relations}
\end{figure}

In Figure~\ref{fig.dhm.ruler.relations} we add a DHM view on top of the algorithmic view, and define external relations to obtain fresh types and to unify two types. The retain-keyword declares that the left component of the output tuple of |unify| is mapped to a local AG attribute |loc.errs|, which can then be collected by conventional AG rules. This mechanism allows values to be exposed as attributes to the encapsulating AG.

\begin{figure}[tb]
\begin{code}
ruleset theRules relation expr
  rule e.app view H
    judge T : tyFresh ^^ r `fresh` | ainh.s = s1 ^^ | asyn.s = s2
    judge F : expr | t = t1 ^^ | ainh.s = s2 ^^ | asyn.s = s3
    judge A : expr | t = t2 ^^ | ainh.s = s3 ^^ | asyn.s = s4
    judge U : tyUnify ^^ t1 == (t2 -> r) ^^ | ainh.s = s4 ^^ | asyn.s = s5
    ~~~
    judge R : expr ^^ | ainh.s = s1 ^^ | asyn.s = s5
\end{code}
\caption{Ruler rules for |e.app|.}
\label{fig.ruler.dhm.rules}
\end{figure}

Figure~\ref{fig.ruler.dhm.rules} shows the rules for |e.app| in view |H|. The hole-bindings for judgments essentially specify the threading of the substitution. The notation for hole-binding can be used to supply bindings for holes that are not present in the judgment's special syntax. To connect two judgments, we introduce intermediate meta-variables |s1|, |s2|, etc. This threading has to be done manually, as Ruler does not have a concept of copy rules.

\subsection{Discussion}
\label{intro.sect.ruler.discussion}

Ruler provides notation and composition mechanisms for the description of type rules. Rules may inherit from other rules, and a rule in a view inherits from the same rule in preceding views. These mechanisms enhance modularity and reuse. Effectively, a Ruler fragment is a partial specification. The meaning of a Ruler specification is only defined for a complete composition in combination with the associated attribute grammar. It would aid formal reasoning if a meaning could be attached to individual fragments.

The Ruler compiler generates an inference algorithm only for algorithmic specifications. As a consequence, the code generation is limited to syntax-directed type rules.
Syntax-directedness does not hold for many declarative type systems, which have relations with overlapping rules, or rules that dispatch on more than one argument of the conclusion judgment. Also, the premises must be functional. The transformation of a relation to a function is by itself non-trivial, and common techniques such as fixpoint iteration or search strategies are not directly supported by Ruler. We address these complications in this thesis (Section~\ref{intro.sect.overview}).

Moreover, Ruler's features are actually not specific to the domain of type systems. Ruler provides a rudimentary composition mechanism, syntactic sugar, and typesetting support for a formalism that is not unlike AGs. These features would equally well benefit AGs in general\footnote{
An example of a feature that benefits AGs is Ruler's automatic unique-numbering mechanism. We generalized and implemented a similar mechanism for AGs.
}.

\section{Thesis Overview}
\label{intro.sect.overview}

Our ultimate goal is to semi-automatically generate a type inference algorithm from a declarative type system specification. In particular, we focus on type systems described as a collection of type rules, and on an implementation based on attribute grammars. In this thesis, we present RulerCore, a language that extends attribute grammars. It facilitates the composable description of inference algorithms that are typically used to implement declarative aspects of type rules. RulerCore can thus be used to give an executable semantics to a set of type rules. In this section, we give an overview of RulerCore's extensions to attribute grammars. In Section~\rref{intro.sect.globalpicture} we position this thesis with respect to the larger goal.

\subsection{Inference Algorithms as an Attribute Grammar}

We give an abstract description of how we structure inference algorithms as an AG; Chapter~\rref{chapt.iterative} shows a concrete example.

We write AGs over typing derivations, instead of AGs over the abstract syntax of a language. In the following chapters of this thesis, we use the term \emph{abstract syntax tree} to refer either to typing derivations or to the result of parsing\footnote{
The typing derivation is typically an extension of the AST, thus the difference is usually irrelevant and can be determined from the context. Similarly, we refer to the AST and the AST decorated with attributes interchangeably.
}. Nonterminals thus correspond loosely to typing relations, and productions to type rules (Section~\ref{sect.intro.agtypespec}). The grammar for typing derivations may have more structure than is present in the type rules. For example, we may add nodes that each represent a choice between alternatives, so that we effectively describe a forest of typing derivations.

We map each declarative aspect of a type rule to a higher-order child (Section~\rref{intro.sect.higherorder}), as shown in Section~\rref{intro.sect.copyrules}. Extra attributes, such as a substitution, provide contextual information for these children. For example, an equality premise between two types in the type rules corresponds to a unification-child in the AG. The structure of the unification child serves as a proof that the two types can be made equal, and the resulting substitution attribute reflects the effect of unifying the two types. We treat meta-variables in type rules as conventional attribute values (Section~\ref{intro.sect.ag.dhm}). Type inference amounts to determining the structure of these children.
Choosing a particular tree for a child may require an exploration of candidate trees. Instead of constructing a single derivation tree, we actually construct and choose from a forest of derivation trees. Through value dependencies between attributes, we effectively define in which order the structure of these children is determined, and in which order the effects of determining this structure become visible in the substitution attribute. This approach allows us to encode Algorithm W (Section~\ref{sect.intro.dhm}).

For more complex inference algorithms, such as constraint-based algorithms and algorithms that require fixpoint iteration, the shape of the derivation tree and the values of attributes are mutually dependent. Inference algorithms therefore analyze, explore, and extend (intermediate) partial derivation trees. This process does not have a straightforward mapping to an AG, since an AG specifies when the resulting derivation tree is correct, but not how the intermediate trees are obtained. We introduce extensions to AGs to make such intermediate trees visible in the AG description.

\subsection{Attribute Grammar Extensions}

Chapter~\rref{chapt.rulercore} gives a detailed outline of each extension. The following chapters work out each extension individually.

\paragraph{Visits.}
RulerCore is a language for the description of higher-order, ordered attribute grammars. We extend this basis with explicit specifications of visits; in comparison, visits are implicit in ordered attribute grammars. Chapter~\rref{chapt.side-effect} introduces the language and the notation.

For ordered attribute grammars, there exists an evaluation algorithm that starts with an initially undecorated tree and ends in a correctly decorated tree. The \emph{state} of a tree is the collection of decorations present in the tree. During the evaluation, the state of the tree thus changes. A \emph{configuration} is a set of attribute names. A configuration describes the state of a tree when the set of decorations of the root contains exactly the attributes mentioned in the configuration. In an ordered attribute grammar, a linear order exists between configurations. A \emph{visit}, a unit of evaluation for a node, transitions the state of a tree to a state described by the next configuration.

We treat a node whose state is described by a certain configuration as a \emph{first-class value}, which we can store in attributes, obtain from attributes, inspect, and to which we can programmatically apply state transitions. This approach offers us sufficient control over the AG evaluation to combine AGs with \emph{monadic operations}, such as the unification monad shown in that chapter.

We provide notation to declare a totally ordered sequence of visits per nonterminal. Each attribute declaration must be associated with one visit declaration, which has consequences for the scheduling of rules. During attribute evaluation, attributes that are associated with an earlier visit are defined before attributes of a later visit are computed; in this context, \emph{defined} means that a reference to the attribute value is available. Similarly, rules of productions of a nonterminal may be associated with a visit of that nonterminal, which restricts the scheduling of such rules to either that visit or a later visit. Furthermore, we may restrict a rule to a particular visit, which ensures that the rule is scheduled before any rules of subsequent visits.
With this approach, rules may make assumptions about the configuration of the tree prior to the rule's evaluation. Moreover, the notation allows us to define customizations of evaluation strategies for rules of a particular visit or for visits to particular subtrees.

A disadvantage of our approach with respect to attribute grammars is that we need to specify a visit for each attribute. This requires additional effort and makes attribute declarations less composable. We typically declare an attribute for a set of nonterminals instead of a single nonterminal. In our approach, this is only possible if all the nonterminals in the set have a common visit to which the attribute can be scheduled. We show in Chapter~\rref{chapt.warren} how to solve this issue.

\paragraph{Fixpoint iteration, clauses and constraints.}
In Chapter~\rref{chapt.iterative} we exploit the notion of visits. We show how to conditionally repeat the evaluation of a visit to a child, which allows the encoding of \emph{fixpoint iteration}. Using visit-local attributes, state can be kept between iterations. Moreover, we allow one or more clauses to be defined for a visit. Each clause provides an alternative set of rules for the visit. With special match-rules we specify constraints on clauses. With clauses in combination with higher-order children, we can define the structure of the derivation tree in terms of attributes, and thus deal with type-directed inference algorithms.

We treat intermediate derivation trees as first-class values. With the specification of visits, we can reason about the configuration a tree is in. A tree that is in a certain configuration can be detached, transferred via attributes to another location, and attached there. With this mechanism we can represent \emph{constraints} or \emph{deferred judgments} as trees with access to their context via attributes.

\paragraph{Exploration of alternatives.}
The above extensions describe algorithms that conservatively approximate the derivation tree. For some type systems it is necessary to explore a forest of candidate derivation trees. Such a forest can be represented with a decision tree, which contains choice nodes that branch to various \emph{alternatives}. In Chapter~\rref{chapt.breadthfirst} we show how to describe explorations of such alternatives with AGs. We present a technique that allows a spectrum of depth-first and breadth-first search strategies to be described. In a statically scheduled AG, we can evaluate the AG in a step-by-step fashion. By intertwining the evaluation of alternatives, we obtain a breadth-first search. After each step, some intermediate values may be available, which can be used to direct the search process.

\paragraph{Phases and commuting rules.}
In Chapter~\rref{chapt.warren} we generalize visits to \emph{phases}. A phase may consist of one or more implicitly defined visits, which are determined by the static scheduling of the AG. As a consequence, an attribute does not need to be explicitly assigned to a visit, and its scheduling may optionally be constrained by a phase. With this approach, our extensions conservatively extend AGs.

For some chained attributes, the order induced by the value dependencies of their rules may be too strict when these rules encode commuting operations. We present \emph{commuting rules}: rules that are connected via a chained attribute, but that do not depend on previous rules in the chain. Such rules give us more freedom in scheduling, as the sketch below illustrates.
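As an illustration, consider drawing fresh labels from a unique number supply. The following Haskell sketch (all names hypothetical, using the standard value-supply technique) shows how splitting the supply removes the value dependency between consecutive rules in the chain:

\begin{code}
-- An infinite binary tree of unique labels.
data Supply = Supply Int Supply Supply

mkSupply :: Supply
mkSupply = go 1
  where go n = Supply n (go (2 * n)) (go (2 * n + 1))

split :: Supply -> (Supply, Supply)
split (Supply _ l r) = (l, r)

fresh :: Supply -> Int
fresh (Supply n _ _) = n

-- Because each rule draws labels from its own half of the supply,
-- the rules no longer depend on each other's results: their order
-- of evaluation is irrelevant up to the renaming of labels, which
-- is the (liberal) commutativity we require.
example :: (Int, Int)
example =  let (s1, s2) = split mkSupply
           in  (fresh s1, fresh s2)
\end{code}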
Typical examples are the threading of a unique number supply, as sketched above, and the threading of substitutions. To preserve referential transparency, the commuting rules must satisfy a liberal commutativity law. With such rules we can functionally encode the behavior of a rule with side effects that is scheduled across different implicit visits.

\paragraph{Dependent AGs.}
In Chapter~\rref{chapt.dependent-ags} we apply dependent types to AGs. In a dependently typed AG, the type of an attribute may refer to values of attributes. The type of an attribute is an invariant, and the value of an attribute is a proof of that invariant. Thus, with dependent AGs we can prove properties of our compiler. Additionally, this chapter serves as a showcase for visits and clauses.

\subsection{Contextual Chapters}

In the extended edition of this thesis\footnote{
Extended edition: \url{https://svn.science.uu.nl/repos/project.ruler.papers/archive/thesis-extended.pdf}
}, we place the above extensions in a wider context.

\paragraph{Graph traversals.}
Many computations in a compiler take control-flow or data-flow graphs into account. We show that the mechanism to attach and detach children can be used to interface AGs with graph traversals~\citep{ariem-ext-graph}.

\paragraph{GADTs.}
In the extended edition~\citep{ariem-ext-gadt}, we show an example of a type system for Generalized Algebraic Data Types (GADTs). An inference algorithm for this system requires an exploration of alternatives (Chapter~\tref{chapt.breadthfirst}). We formulate our specification so that it is orthogonal to specifications of ADTs:
\begin{itemize}
\item We present our specification as System F augmented with first-class equality proofs.
\item We exploit the Church encoding of data types to describe GADT matches in terms of conventional lambda abstractions.
\end{itemize}
Such orthogonal designs are important in order to compose type systems, and thus ultimately also to compose type inference algorithms.

\section{The Context of this Thesis}
\label{intro.sect.globalpicture}

Types play an increasingly important role in the design of programming languages. Type systems specify a relation between programs and types, which facilitates (formal) reasoning about typed programs. Moreover, type systems form a partial specification for type checking and type inference algorithms. As we discussed in Section~\ref{intro.sect.motivation}, our ultimate goal is to semi-automatically derive type inference algorithms from declarative type system specifications. We mentioned in Section~\ref{intro.sect.typesystems} that a set of type rules alone is not a complete description; hence we develop Ruler (Section~\ref{intro.sect.ruler}), a domain-specific programming language in which we write an inference algorithm as an extension of the declarative type rules.

\subsection{Challenges}

Several challenges need to be overcome to reach this goal. We identify two main challenges; this thesis is situated within the second one.

The first challenge is related to type system composition. Language features and their declarative type systems are typically defined as extensions of a bare lambda calculus. Some language features are mutually conflicting (e.g., their combination invalidates type soundness). However, many language features compose in standard ways. For example, features described for a lambda calculus that supports fix-expressions can be translated to a description for a language with recursive let bindings.
To meet this challenge, we wish to describe language features in isolation, and to describe a composition of these features for the actual source language. As an illustration of what is missing, the type rule formalism lacks the expressiveness that higher-order functions offer to functional programs, such as the ability to abstract over common patterns and to instantiate these abstractions with minimal syntax. For the small type system descriptions that appear in type system theory, such expressiveness is not needed. However, type system descriptions of actual languages are large and hard to maintain.

The second challenge is related to declarative type rules. Declarative type rules abstract from evaluation strategies. However, a general inference algorithm does not exist, and a naive algorithm such as the one mentioned in Section~\ref{intro.sect.inference} is either incomplete or inefficient. Ultimately, inference boils down to resolving \emph{declarative aspects}: determining the structure of the derivation tree, and computing with values that may not yet be fully determined. In practice, inference algorithms are intricate compositions of common algorithms that treat such declarative aspects in a predictable and deterministic way.

These algorithms are hard to combine. If one declarative aspect requires a constraint-based algorithm for its resolution, and another requires some form of search, then the order in which the aspects are resolved is likely to be relevant. Also, it is hard to describe the flow of information between different solving techniques. To allow these techniques to cooperate, we need a language for the description of a composition of such techniques. Hence, this thesis.

\subsection{Additional Challenges}

Aside from the general motivation of our research, another source of motivation is that we intend our results to benefit the implementation of our Haskell compiler UHC~\citep{uhc}. Therefore, we impose additional demands on solutions to the above challenges.

Firstly, since UHC's implementation is based on Haskell itself, we require that solutions integrate seamlessly with Haskell. This restriction effectively rules out the direct use of (functional) logic languages, due to differences in the evaluation model and the representation of data structures\footnote{
\citet{Hanus10transformingfunctional} show that an embedding of functional logic programs is possible in Haskell, but it affects all data representations and forces all computations into monadic style. However, we use techniques and ideas from logic programming that do integrate seamlessly, such as backtracking in a monad~\citep{Hinze:2000:DBM:351240.351258,DBLP:conf/icfp/KiselyovSFS05}.
}. In addition, we desire that our research can also be exploited in compiler suites that are implemented in languages without lazy evaluation or strict typing disciplines.

Secondly, along similar lines, we refrain from the use of dependently typed languages, since an extraction to Haskell is a one-way process that also affects data-type representations. Our goal is to be able to generate an implementation. Advances in dependently typed languages seem promising, but a formally certified implementation of a compiler such as UHC is currently infeasible.

Finally, the resulting implementation should be reasonably efficient in order to process the ASTs of large programs. In our experience with UHC, we noticed that memory usage is an issue when using demand-driven evaluation of AGs.
We experimentally verified that the time spent on traversing abstract syntax trees in UHC is negligible in comparison to the computations that are performed on each node of the AST. Thus, while traversal overhead is rarely a problem, memory usage is a concern, which we address by using statically ordered evaluation of AGs.

\subsection{Solutions}
\label{sect.intro.solutions}

A partial solution to the first challenge is given by \citet{dijkstra05phd}, as demonstrated by the UHC project, and by the initial development of Ruler (Section~\ref{intro.sect.ruler}) in particular. The composition mechanisms and syntax extensions provided by Ruler would also be beneficial to AGs. For example, many dense translation schemes in this thesis are manually derived from actual AG descriptions. These AG descriptions focus on attributes in isolation and are easier to understand, but they are too verbose for inclusion in this thesis. Solutions for AGs would also work for type rule descriptions, and vice versa.
%% For example, many translation schemes in this thesis are written with
Indeed, composition facilities for AGs receive ongoing attention~\citep{DBLP:conf/icfp/VieraSS09,DBLP:conf/gpce/Saraiva02}.

%% Claim that this first challenge is not something we want to work on.
There is still a long way to go with respect to the first challenge. Since declarative rules abstract from an evaluation algorithm, the data structures and administration that are involved in the algorithm are chosen for convenience and notational conciseness. In an actual implementation, we may be able to represent certain administration more efficiently using specialized data structures. However, we do not address these issues in this thesis, and only mention some of them in passing in \citet{ariem-ext-gadt}. We assume that the declarative rules are specially crafted to make them more suitable for an actual implementation.

%% I want to express here the reasons why we choose for AGs, and not
%% for other techniques such as constraints, monadic traversals, etc.
There is thus some open work for the first challenge. However, we focus on the second challenge, which is more pressing: we need basic building blocks before we can compose them in clever ways.

We propose to tackle the second challenge with attribute grammars. Inference algorithms that are specified by declarative type rules are sensitive to context---non-inductive properties of the AST---and attribute grammars excel in providing such contextual information with attributes, as is proven by UHC's implementation. Also, from a practical perspective, since UHC's implementation is based on AGs, an inference algorithm based on AGs interfaces conveniently with attributes of other components in the compiler, as shown by previous work on Ruler.

However, attribute grammars in their current form are not well suited for the description of inference algorithms for complex type systems: inference algorithms make explicit assumptions about the intermediate states of the derivation trees during their construction. In a conventional AG, we cannot express such assumptions, because AG descriptions are defined in terms of the final state of the derivation tree. In order to make assumptions about the intermediate state, we extend AG evaluation, and hence arrive at the extensions of Section~\rref{intro.sect.overview}.

Our approach applies to the description of algorithms that are recursive functions over tree-like data structures.
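As a minimal illustration (hypothetical and much simplified), an attribute computation over a tree can be written as a fold, with an inherited attribute modeled by folding to a function:

\begin{code}
data Tree = Leaf | Node Tree Tree

-- The fold (catamorphism) over 'Tree'.
cata :: r -> (r -> r -> r) -> Tree -> r
cata leaf node = go
  where  go Leaf        = leaf
         go (Node l r)  = node (go l) (go r)

-- A synthesized attribute: the number of nodes.
size :: Tree -> Int
size = cata 1 (\l r -> 1 + l + r)

-- An inherited attribute (the depth passed down from the root) is
-- modeled by folding to a function that consumes the inherited value.
depths :: Tree -> Int -> [Int]
depths = cata (\d -> [d]) (\l r d -> d : l (d + 1) ++ r (d + 1))
\end{code}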
In particular, our approach applies to catamorphisms, which is not surprising, because attribute grammars can be considered a domain-specific language for the description of catamorphisms. On the other hand, algorithms based on graph rewriting, for example, are not straightforwardly expressed in our approach. An inherent difference is that we traverse a structure, whereas the rewrite rules used by graph rewriting access the structure in irregular ways. Also, algorithms that involve matrix operations to efficiently solve linear constraints cannot be described straightforwardly. However, \citet{ariem-ext-graph} shows how to mix attribute evaluation with external solvers.

\section{Related Work}
\label{sect.intro.relatedwork}

%include ../related-work/related-work.lhs

%if False
\section{Common Notation}

We often refer to keywords which are typeset in bold in code blocks. To prevent typesetting them in bold in the running text, we use the notational convention that some keyword |visit| is called the visit-keyword. We trade type setting for additional hyphens.
%endif

\section{Conclusion}
\label{sect.intro.conclusion}

In the following chapters, we present several extensions to attribute grammars that facilitate the description of complex type inference algorithms. The central concept in these chapters is that we exploit the explicit notion of visits to control and manipulate chunks of AG evaluation. It allows us to transform the tree during attribute evaluation---precisely what we need to express type inference algorithms. Many classic AG approaches use a notion of visits in their intermediate languages (Section~\ref{sect.intro.ag.eval}). In this thesis, we instead propose to use visits as a \emph{programming model}. With this programming model, we express resolution strategies for declarative aspects of type rules.

Our extensions offer flexible ways to \emph{tune} conventional attribute grammar evaluation, and are conservative extensions of (ordered) attribute grammars. We offer a delicate balance between, on the one hand, the implicit evaluation strategy of attribute grammars and, on the other hand, the need to make evaluation explicit for more complex strategies. Underlying our extensions are well-defined concepts from higher-order AGs (first-class children) and ordered AGs (visits). Underlying these, in turn, are well-established concepts from functional programming languages (first-class functions, coroutines, and referential transparency). These concepts form a solid theoretical basis to build upon.

The extensions that we present are not limited to type inference. In fact, type inference is a use case that poses challenges whose solutions improve the abstraction facilities that are available to structure compilers.

%% Maybe mention that this thesis sheds some light on:
%%   AGs and monads
%%   AGs and imperative languages with side effects
%%   AGs and dependently typed programming

\paragraph{Thesis organization.}
Chapter~\tref{chapt.rulercore} gives a detailed summary of the thesis. Subsequent chapters focus on individual extensions, and reintroduce relevant terminology. For background information related to type systems and attribute grammars, Section~\ref{intro.sect.typesystems} and Section~\ref{intro.sect.ags} can be used as a reference.
\paragraph{Publications.}
The chapters of this thesis are based on the following publications:
\begin{itemize}
\item We presented an earlier version of Chapter~\rref{chapt.side-effect} at the Workshop on Generative Technologies (WGT '10) at ETAPS in 2010~\citep{middelkoop09wgt10}. An extended version appeared in the journal Higher-Order and Symbolic Computation~\citep{Middelkoop:2011:VAG:2007516.2008132}.
\item Some of the work of Chapter~\rref{chapt.warren} is to appear at PADL '12.
\item Chapter~\rref{chapt.iterative} is an extended version of the paper that we presented at the conference on Generative Programming and Component Engineering (GPCE '10) in 2010~\citep{Middelkoop10gpce}.
\item An earlier version of the chapter about GADTs (in the extended edition of this thesis) appeared in the post-proceedings of Trends in Functional Programming (TFP '08)~\citep{mypaper}, and a later version appeared in the journal Higher-Order and Symbolic Computation~\citep{springerlink:10.1007/s10990-011-9065-0}.
\item Chapter~\rref{chapt.dependent-ags} is to appear in the post-proceedings of the symposium on Implementation and Application of Functional Languages.
\item We presented Chapter~\rref{chapt.breadthfirst} at the workshop on Language Descriptions, Tools and Applications (LDTA '11) in 2011.
\end{itemize}
As a formal detail, the Association for Computing Machinery (ACM) holds the copyright on the paper versions of Chapter~\rref{chapt.iterative} and Chapter~\rref{chapt.breadthfirst}.
%% Springer Verlag has copyright on the paper version of Chapter~\rref{chapt.case-study}.
Elsevier holds the copyright on the paper version of Chapter~\rref{chapt.side-effect}.