\documentclass[preprint,authoryear]{sigplanconf}
%include lhs2TeX.fmt
%include polycode.fmt
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{stmaryrd}
\usepackage{float}
\usepackage{xcolor}
\usepackage{txfonts}
\usepackage{natbib}
\usepackage[pdftex]{graphicx}
\usepackage{tikz}
\usetikzlibrary{trees,arrows,fit,decorations}
\floatstyle{boxed}
\restylefloat{figure}
%format p1
%format p2
%format a1
%format a2
%format a3
%format a4
%format a5
%format a6
\begin{document}
\conferenceinfo{ICFP '10}{Sept 27, Baltimore.}
\copyrightyear{2010}
\copyrightdata{[to be supplied]}
\titlebanner{banner above paper title}        % These are ignored unless
\preprintfooter{short description of paper}   % 'preprint' option specified.
\title{Functional Visitors with Callback for Reusable Traversals}
\authorinfo{Arie Middelkoop \and Atze Dijkstra \and S. Doaitse Swierstra}
           {Universiteit Utrecht}
           {\{ariem,atze,doaitse\}@@cs.uu.nl}
\maketitle
\begin{abstract}
A compiler consists of a variety of algorithms that \emph{traverse} an Abstract Syntax Tree (AST) and \emph{decorate} its nodes with computed properties.
These algorithms are usually defined monolithically in terms of the AST, which impairs modularity and reuse.
Our goal is to factor these algorithms into simpler, more abstract, and reusable components.
We write an otherwise monolithic algorithm on an AST M as a three-step algorithm that (1) generates a more abstract AST V, (2) applies a reusable traversal that decorates V, and finally (3) translates the results of V back to M.
During V's construction in step (1), we add extra nodes that either extend the algorithm of step (2), or serve as placeholders for the data exchange between M and V that facilitates step (3).
We present a functional domain-specific language based on visitors in which we express such algorithms intuitively.
The absence of side effects allows us to enforce a well-defined evaluation order.
\end{abstract}
\category{CR-number}{subcategory}{third-level}
\terms term1, term2
\keywords keyword1, keyword2
\section{Introduction}
Compilation can be modeled as a pipeline of tree transformations~\cite{DBLP:conf/haskell/DijkstraFS09}.
Each transformation in the pipeline is preceded by an analysis that traverses the AST and decorates it with analysis results.
Many of these algorithms resemble each other.
For example, name analysis of data type declarations resembles name analysis of expressions, and type checking of expressions resembles kind checking of types.
This commonality is exploited in compiler implementations: common infrastructure, such as libraries for environments and substitutions, is factored out into separate, reusable components.
However, factoring out the actual traversals for reuse is a difficult task: the ASTs for data type declarations and expressions differ, as do the ASTs for expressions and types.
Generic traversals are not powerful enough to bridge these differences, because the differences lie not only in the structure of the AST, but also in the behavior for structure that the ASTs have in common.
For example, we typically allow more polymorphism in type analysis than in kind analysis.
This lack of modularity of traversals is an engineering deficiency that limits separation of concerns, impairs reuse, and complicates maintenance due to code duplication.
Our experiences with the Utrecht Haskell Compiler (UHC) confirm these problems.
The UHC is implemented incrementally as a stack of language extensions.
Its implementation starts with a compiler for a simple lambda calculus, and ends with Haskell plus some experimental extensions. Typically, each language extension gives rise to additional syntax. The behavior for this syntax typically has aspects in common with behavior for syntax already supported by the implementation. For example, a lot of syntax deals with name binding and scoping, the split of environments over subexpressions, or the combination of types of subexpressions. We wish to abstract from these patterns, as these now lead to a duplication of code. %% The implementation of UHC can be used for rapid prototyping and experimentation. %% We can take any substack of language features that suits our needs best as basis for %% further programming. %% However, to be able to reuse the code, the AST we are interested in %% must be a structural extension of the UHC AST. %% We ignore some other forms of code duplication: %% namely the need to override existing code. %% Too some extend the problem cannot be avoided. Crosscutting language features. %% However, we would like this cross-cutting to be described individually. \begin{figure}[htp] \begin{center} \begin{tikzpicture} [ nd/.style={circle, minimum size=10mm, very thick, draw=blue!50!black!50, top color=white, bottom color=blue!50!black!20,font=\footnotesize} , attr/.style={rectangle, minimum size=2mm, node distance=3mm, very thick, draw=black!50, top color=white, bottom color=black!20,font=\footnotesize} , annot/.style={font=\footnotesize} , inh/.style={attr} , syn/.style={attr} , leftfirst/.style={xshift=-4mm} , rightfirst/.style={xshift=4mm} , arr/.style={->,dashed} , inst/.style={draw=black!50, dotted, thick} , commline/.style={draw=red!50!black!50, very thick} ] \node[nd](sc){scope} child { [sibling distance=20mm] node[nd](spl){split} child { node[nd](use){use} } child { node[nd](def){def} } }; \node[nd, left of=sc, xshift=-27mm](let){let} child { node[nd](lam){lam} child { node[nd](pat){pat} } child { node[nd](var){var} } } child { node{$\ldots$} }; \node[syn,rightfirst](letsyn)[right of=let]{}; \node[syn,rightfirst](lamsyn)[right of=lam]{}; \node[syn,rightfirst](varsyn)[right of=var]{}; \node[syn,rightfirst](patsyn)[right of=pat]{}; \node[syn,rightfirst](scsyn)[right of=sc]{}; \node[inh,leftfirst](scinh)[left of=sc]{}; \node[syn,rightfirst](splsyn)[right of=spl]{}; \node[inh,leftfirst](splinh)[left of=spl]{}; \node[syn,rightfirst](defsyn)[right of=def]{}; \node[inh,leftfirst](definh)[left of=def]{}; \node[syn,rightfirst](usesyn)[right of=use]{}; \node[inh,leftfirst](useinh)[left of=use]{}; \draw[arr] (letsyn) to (lamsyn); \draw[arr] (lamsyn) to (varsyn); \draw[arr] (lamsyn) to (patsyn); \draw[arr] (splinh) to (scinh); \draw[arr] (useinh) to (splinh); \draw[arr] (definh) to (splinh); \draw[arr] (scsyn) to (splsyn); \draw[arr] (splsyn) to (defsyn); \draw[arr] (splsyn) to (usesyn); \draw[arr] (scinh) to [out=90,in=90] (scsyn); \node[inst,fit=(spl) (sc) (use) (def) (useinh) (defsyn)](instbox){}; \draw[inst] (letsyn) to (instbox); \draw[commline] (sc) to [out=120,in=60] node[annot,auto]{|<- [Ty]|} (let); \draw[commline] (use) to [out=-120,in=-60] node[annot,auto]{|<- Ty,Errs|} node[annot,auto,swap]{|Nm ->|} (var); \draw[commline] (def) to [out=-110,in=-70] node[annot,auto]{|<- Errs|} node[annot,auto,swap]{|Nm,Ty ->|} (pat); \end{tikzpicture} \end{center} \caption{Instantiation of the name-analysis component.} \label{fig.example.instantiation} \end{figure} We propose to encode traversal algorithm components as follows. 
Each component defines its own suitable AST structure and decorations, as well as, for each type of node, an algorithm that computes the decorations in one or more visits.
For example, consider the following name analysis component.
Its AST consists of use-, def-, scope-, and split-nodes.
Decorations include environments that transport name information.
A def-node introduces a value bound to a given name; at a use-node we look up this value.
Additionally, we decorate a def-node with possible duplication-errors, and a use-node with undefined-errors if no value is bound to its name.
The algorithms are invoked twice per node: in the first visit, an environment that holds all bindings defined in the subtree is computed from the environments of the children.
At the root, this environment contains all bindings.
In the second visit, we copy this environment down, so that it can be inspected to determine errors and to look up bound values.
The algorithm formulated as such is abstract (it works for any value) and is defined on a lean AST.

To instantiate such a component, consider a type analysis of some (lambda) expression AST (Expr AST).
Part of type analysis is a name analysis of the Expr AST.
We define an algorithm that delegates this work to the name analysis component.
Figure~\ref{fig.example.instantiation} depicts the situation.
We take the following steps:
\begin{enumerate}
\item We define a traversal over the Expr AST to decorate the nodes with a Name AST.
These decorations, \emph{attributes}, are visualized as the small boxes next to the nodes.
Dashed arrows between one node and others indicate that the former is a function of the values of the latter.
\item For the use-, def-, and scope-nodes, we actually generate a special version of these nodes.
We call these \emph{callback nodes}.
A callback node contains a reference back to the originating Expr-node (the red line in the picture), and an extension of the algorithm of the name-node.
The extended algorithm uses this reference to receive, for example, a name, and to supply back the type introduced elsewhere for this name, or errors.
\item The computed Name AST is made a child of the let-node and visited according to the algorithm for the name analysis.
The nodes are decorated with an environment (the attribute to the right in the picture) containing all bindings of the subtree.
It is then used to define the environment for the lookups (the attributes to the left), which is subsequently passed down.
\item Visits to the callback nodes demand data from, or supply data to, the related node in the Expr AST, and thus cause additional visits in the Expr AST.
For example, for the def-node in Figure~\ref{fig.example.instantiation}, the lam-node must provide a name and a type.
Later, when the def-node is visited again, it supplies potential errors back to the lam-node.
\end{enumerate}

\begin{figure}[htp]
\begin{center}
\begin{tikzpicture}
  [ comp/.style={rectangle, minimum size=13mm, very thick, draw=blue!50!black!50, top color=white, bottom color=blue!50!black!20,font=\footnotesize}
  , gen/.style={->>,thick}
  , ext/.style={->,thick}
  , annot/.style={font=\footnotesize}
  , node distance=25mm
  ]
  \node[comp] (tpsig) {type + sig};
  \node[comp, left of=tpsig] (main) {main};
  \node[comp, above of=tpsig] (name) {name};
  \node[comp, right of=tpsig] (kind) {kind};
  \node[comp, below of=tpsig] (type) {type};
  \node[comp, right of=type] (tpkind) {type+kind};
  \draw[gen] (main) to node[annot,auto]{gen} (name);
  \draw[gen] (main) to node[annot,auto]{gen} (tpsig);
  \draw[gen] (tpsig) to node[annot,auto]{gen} (kind);
  \draw[gen] (kind) to node[annot,auto]{gen} (name);
  \draw[gen] (kind) to node[annot,auto]{gen} (tpkind);
  \draw[ext] (tpsig) to node[annot,auto]{extends} (type);
  \draw[ext] (tpkind) to node[annot,auto]{extends} (type);
\end{tikzpicture}
\end{center}
\caption{Type inferencer components.}
\label{fig.example.components}
\end{figure}

In Section~\ref{nextsection} we show how to encode the above example in Haskell, using pure functions to compute attributes and side effects to access the attributes of one node from another.
We consider both the name analysis component mentioned in this section and a Damas-Milner type analysis component.
Figure~\ref{fig.example.components} illustrates these components.
The type analysis component uses an AST that closely follows the Damas-Milner type rules, most notably with a non-recursive let and explicit nodes for instantiation and generalization.
The type analysis algorithm no longer has to deal with name analysis, which makes it a bit simpler.
Furthermore, we define two extensions of this type inference: kind inference, and support for type signatures (which uses kind inference).
Finally, we combine all these components to form an algorithm for the main AST, which has nodes for recursive let and type signatures.

This approach is clearly a trade-off.
On the one hand, the algorithms are simpler because we define them and their extensions in isolation.
They are therefore also easier to maintain, explain, and reuse.
On the other hand, the approach poses the following challenges:
\begin{enumerate}
\item The generation of another AST is additional work and may make the algorithm more verbose.
We argue that the ASTs of most components are simple views on the original AST, and therefore easy to produce.
Aside from that, transforming ASTs is the core business of a compiler and therefore well understood.
Along the same lines, the extra traversals incur runtime overhead.
In our experience with traversal optimizations in the UHC, the time spent traversing the AST is negligible compared to the actual work a compiler performs (unification, constraint solving, etc.).
Furthermore, advances in tree-fusion techniques may remove the overhead completely, and traversals over the generated tree are also a good candidate for parallelism.
\item To ensure that our approach is intuitive, it must be clear what data is exchanged (i.e., the labels on the red lines in the picture), and we must ensure that nodes are visited in the proper order, such that, for example, the visit that provides the error messages takes place before the visit that inspects them.
In other words: our programs must be side-effect free.
\item Furthermore, we typically need to \emph{extend} the algorithm of a component.
To reuse the type inference algorithm, for example, we may have to add support for type signatures, higher-ranked types, etc.
To obtain true modularity, we want to be able to treat components as black boxes and add additional behavior where needed, without having to touch the code of the component itself.
%% Todo: give a good reason for this
%% Modularity, compositionality, etc..
\end{enumerate}

To meet the latter two challenges, we introduce a functional language, Functional Callback Visitors (FCV), that resembles attribute grammars, but with an explicit specification of traversals and visits (a predecessor is described in~\cite{middelkoop09wgt10}).
As with attribute grammars, our specifications describe the value of an attribute as a function of the values of other attributes.
We require the static guarantee that these computations are non-cyclic.
The absence of both side effects and cyclic definitions makes an FCV program predictable.
Programming with attributes has the additional advantage that trivial attribute computations can be omitted and inserted automatically, and that we can automatically determine the relative order of the attribute computations of a visit, which allows us to write these computations in arbitrary order as separate code fragments (cite the AG manual - dont we have a paper or techreport about this?).

\begin{figure}[htp]
\begin{center}
\begin{tikzpicture}
  [ visit/.style={circle, minimum size=14mm, very thick, draw=blue!50!black!50, top color=white, bottom color=blue!50!black!20,font=\footnotesize}
  , svisit/.style={circle, minimum size=13mm, thick, draw=blue!40!black!30, top color=white, bottom color=blue!40!black!15,font=\scriptsize}
  , vbox/.style={rectangle, very thick, draw=black!100}
  , attr/.style={rectangle, very thick, draw=black!50, top color=white, bottom color=black!20,font=\footnotesize}
  , itf/.style={font=\large}
  , arr/.style={->,thick}
  , annot/.style={font=\footnotesize}
  , node distance=12mm
  ]
  \node[visit] (vGath) {vGath};
  \node[visit, right of=vGath, xshift=20mm] (vSpread) {vSpread};
  \node[attr, below of=vGath] (gathEnv) {|gathEnv :: Env|};
  \node[attr, above of=vSpread] (finEnv) {|finEnv :: Env|};
  \node[itf,left of=vGath, xshift=-3mm](name){|Name:|};
  \draw[arr] (vSpread) to (vGath);
  \node[itf,below of=name,yshift=-22mm](main){|Main:|};
  \node[visit,right of=main,xshift=3mm] (vName) {vName};
  \node[svisit,right of=vName,xshift=6mm,yshift=4mm] (vSupply) {vSupply};
  \node[svisit,right of=vSupply,xshift=5mm] (vOutcome) {vOutcome};
  \node[vbox, fit=(vSupply) (vOutcome)](nmbox) {};
  \node[visit, right of=vOutcome,xshift=6mm,yshift=-4mm] (vType) {vType};
  \node[attr, below of=vName] (nmTree) {nmTree :: Tree Name};
  \node[annot,above of=nmbox,yshift=-2mm]{nmTree.NameUse};
  \node[attr, below of=vType]{|ty :: Ty|};
  \draw[arr] (vSupply) to (vName);
  \draw[arr] (vType) to (vOutcome);
  \draw[arr] (vType) to [out=-140,in=-30] (vName);
  %% \draw[arr] (vOutcome) to (vSupply);
\end{tikzpicture}
\end{center}
\caption{Interface for Name AST and partial interface for Main AST.}
\label{fig.example.interface}
\end{figure}

An important ingredient of FCV is the concept of an \emph{interface}.
An interface specifies how a tree is to be visited: which visits there are, which attributes have to be computed prior to each visit, which attributes are computed by the visit, and what the dependencies between visits are.
These dependencies allow us to statically enforce that a visit takes place only after the visits it depends on, and that only the attributes of previous visits are in scope.
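Before turning to concrete interfaces, a minimal sketch in plain Haskell may help to convey the idea; this is our own encoding with placeholder types (|Env|, |Errs|, |NameSem|, |VSpread|), not FCV notation.
A tree that must be visited in two ordered visits can be modelled as a value whose first visit yields its synthesized attributes together with the only handle through which the second visit can be invoked.
\begin{code}
type Env   = [(String, Int)]   -- placeholder: names bound to some value
type Errs  = [String]          -- placeholder: error messages

-- Visiting a Name tree: the first visit yields the gathered
-- environment and a handle for the second visit; the second visit
-- takes the final environment and yields error messages.
newtype NameSem  = NameSem  { vGath    :: (Env, VSpread) }
newtype VSpread  = VSpread  { vSpread  :: Env -> Errs }
\end{code}
The nesting makes the ordering constraint structural: the second visit cannot be reached before the first has been performed, and only the attributes of earlier visits are available when later visits are defined.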
For example, Figure~\ref{fig.example.interface} shows two interface specifications: one for the Name AST and a partial one for the Main AST.
The |Name| interface specifies two visits, vGath and vSpread.
The arrow between vGath and vSpread expresses that the invoker of visits on the Name AST must invoke vGath before vSpread.
The boxes above and below the visits declare attributes.
Attributes below a visit are \emph{synthesized} attributes and are computed by the visit.
Attributes above a visit are \emph{inherited} and must be computed by the invoker prior to the visit.
Hence, invoking vGath results in |gathEnv| being computed, and this value may then be used to, for example, compute the |finEnv| attribute prior to calling vSpread.

\begin{figure}[htp]
\begin{center}
\begin{tikzpicture}
  [ nd/.style={circle, minimum size=10mm, very thick, draw=blue!50!black!50, top color=white, bottom color=blue!50!black!20,font=\footnotesize}
  , attr/.style={rectangle, minimum size=2mm, node distance=3mm, very thick, draw=black!50, top color=white, bottom color=black!20,font=\footnotesize}
  , annot/.style={font=\footnotesize}
  , inh/.style={attr}
  , syn/.style={attr}
  , leftfirst/.style={xshift=-6mm}
  , rightfirst/.style={xshift=6mm}
  , rightfirst2/.style={xshift=4mm}
  , arr/.style={->,dashed}
  , inst/.style={draw=black!50, dotted, thick}
  , commline/.style={draw=red!50!black!50, very thick}
  , annot/.style={font=\Large}
  , node distance = 15mm
  ]
  \node[nd](top){override} child { node[nd](bottom){original} };
  \node[annot,right of=top,xshift=12mm]{extend,filter};
  \node[attr, leftfirst, left of=top](inh1){};
  \node[attr, leftfirst, left of=bottom](inh2){};
  \node[attr, rightfirst, right of=top](syn1){};
  \node[attr, rightfirst, right of=bottom](syn2){};
  \draw[arr] (inh2) to (inh1);
  \draw[arr] (syn1) to (syn2);
  \node[nd, below of=bottom, xshift=20mm](root){ast};
  \node[attr,rightfirst,right of=root](rsyn1){};
  \node[attr,right of=rsyn1](rsyn2){};
  \node[nd, below of=root, xshift=-17mm, yshift=-15mm](p1){|p1|} child { node[nd, yshift=-15mm](p2){|p2|} };
  \node[attr, leftfirst, left of=p1](p1i1){};
  \node[attr, leftfirst, left of=p2](p2i2){};
  \node[attr, rightfirst, right of=p1](p1s1){};
  \node[attr, rightfirst, right of=p2](p2s2){};
  \draw[arr] (p2i2) to (p1i1);
  \draw[arr] (p1s1) to (p2s2);
  \node[inst, fit=(p1i1)(p2s2)(p1)(p2)](box1){};
  \node[annot, left of=root, xshift=-10mm]{context};
  \node[nd, below of=root, xshift=27mm](a1){|a1|} child { node[nd](a2){|a2|} child { node[nd](a3){|a3|} child { node[nd](a4){|a4|} } child { node[nd](a5){|a5|} } } } child { node[nd](a6){|a6|} };
  \node[attr,rightfirst2,right of=a1](a1s){};
  \node[attr,rightfirst2,right of=a2](a2s){};
  \node[attr,rightfirst2,right of=a3](a3s){};
  \node[attr,rightfirst2,right of=a4](a4s){};
  \node[attr,rightfirst2,right of=a5](a5s){};
  \node[attr,rightfirst2,right of=a6](a6s){};
  \node[inst,fit=(a1)(a1s)(a6s)(a4)](box2){};
  \draw[arr] (a1s) to (a2s);
  \draw[arr] (a1s) to (a6s);
  \draw[arr] (a2s) to (a3s);
  \draw[arr] (a3s) to (a4s);
  \draw[arr] (a3s) to (a5s);
  \draw[inst] (box1) to (rsyn1);
  \draw[inst] (box2) to (rsyn2);
  \draw[commline] (p1) to [out=-20,in=-160] node[annot,auto]{|<-|} node[annot,auto,swap]{|->|} (a2);
  \draw[commline] (p2) to [out=-20,in=-160] node[annot,auto]{|<-|} node[annot,auto,swap]{|->|} (a4);
\end{tikzpicture}
\end{center}
\caption{Overriding and context via a parallel tree.}
\label{fig.example.overriding}
\end{figure}

The interface |Main| declares a visit vName with a synthesized attribute |nmTree|, which is a generated Name tree.
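Returning to the |Name| interface for a moment, the following continues our plain-Haskell sketch from above.
The node functions |semDef|, |semUse|, |semSplit|, and |semScope| are hypothetical names we introduce for illustration only; FCV generates such plumbing from the attribute definitions.
The point is the ordering: the gathered environment of the first visit defines the inherited |finEnv|, and only then can the second visit run.
\begin{code}
-- def-node: contribute a binding in the first visit; report a
-- (hypothetical) duplicate-definition error in the second visit
semDef :: String -> Int -> NameSem
semDef nm val = NameSem ([(nm, val)], VSpread spread)
  where spread fin =  [ "duplicate: " ++ nm
                      | length (filter ((== nm) . fst) fin) > 1 ]

-- use-node: nothing to gather; in the second visit, look the name
-- up in the final environment and report it if it is unbound
semUse :: String -> NameSem
semUse nm = NameSem ([], VSpread spread)
  where spread fin = [ "undefined: " ++ nm | nm `notElem` map fst fin ]

-- split-node: concatenate the gathered environments of the children
-- and forward the final environment to both of them
semSplit :: NameSem -> NameSem -> NameSem
semSplit l r = NameSem (gl ++ gr, VSpread spread)
  where  (gl, kl)    = vGath l
         (gr, kr)    = vGath r
         spread fin  = vSpread kl fin ++ vSpread kr fin

-- scope-node (the invoker): vGath runs first; its result defines the
-- final environment that is passed down before vSpread can run
semScope :: NameSem -> (Env, Errs)
semScope t = (gathEnv, vSpread k finEnv)
  where  (gathEnv, k)  = vGath t
         finEnv        = gathEnv   -- at the root: all bindings
\end{code}
The scope-node plays the role of the invoker here: it cannot call vSpread without first having run vGath, which is exactly the ordering that the interface prescribes.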
The generated |nmTree| contains callback nodes (corresponding to the second red line in Figure~\ref{fig.example.instantiation}), described by the interface |NameUse| (not shown).
The visits to these nodes need not be performed by the invoker directly, but may be delegated.
However, in addition to the restrictions imposed by the |NameUse| interface, the invoker must also ensure that the vSupply and vOutcome visits to the nodes of |nmTree| take place before it invokes vType.

Another important aspect of FCV is the ability to extend trees with additional nodes, either in the context of another node (the callback nodes) or in isolation, and in both cases to adapt the algorithm.
Figure~\ref{fig.example.overriding} shows two typical scenarios.
In the first scenario, we generate an additional node that passes down the value of an inherited attribute and passes up the value of a synthesized attribute, possibly changing these values in the process.
The second scenario shows how to deal with extensions that require additional attributes.
For example, we generated the |a|-tree, which has a visit that computes some synthesized attribute (the box to the right of each node).
We inserted additional nodes |a2| and |a4|.
Suppose that the computations for |a2| and |a4| require an additional inherited attribute (for example, some environment) and produce an additional synthesized attribute (for example, error messages).
We wish to keep the code for the other nodes of the |a|-tree the same (a black box), so instead we generate another tree, the |p|-tree, on which we define the required attributes, and use callback nodes to exchange the attribute values between |a|-nodes and |p|-nodes.

In the remainder of the paper...
%if False
Solution.
initial situation:
  define a number of smart constructors that build a semantic tree (algorithm)
  produce a tree using the smart constructors
  visit the tree to run the algorithm and decorate the nodes
  to access decorations of the tree, inspect decorations at the root
new situation:
  produce a decoration representing a view that is build from smart constructors
  attach the decoration to the tree as additional child to a node, such that it is visited too
  define some smart constructors in the context of a semantic node
  the algorithm of this smart constructor has access to both the decorations of its originating node
  as well as the decorations of its own parent and children in the semantic-tree
In this construction the view part of the AST, there is not a clear distinction between the two.
We therefore distinguish between external nodes and the originating node that defined it.
To accomplish the following tasks:
* extend algorithm with new cases: add extra smart constructors, produce the AST with these extra nodes
* override original behavior: factor out the overrideable behavior to a child node and write smart constructors for the intended behavior
* decorations of the view insufficient: create external nodes, ensure that the originating node has this context (possibly through yet another view)
example: checking type signatures in expressions. Requires kind environment. Suppose that the type-checking view does not have this environment. Create the type-checking node as external node and ensure that the originating node has the kind environment.
We will now make these ideas more concrete.
Assume the following situation. We consider traversal algorithms from a visitor/attribute grammar perspective. Each node runs an algorithm that decorates the node with attributes. Each node has inherited and synthesized attributes.
Inherited attributes are defined by the parent node. Synthesized attributes are defined by the algorithm running at the node. The attributes form an interface to pass data between a node and its parent. The algorithms are executing during on or more topdown traversals of the AST. Componentize as follows. Interfaces. Describes visits to the trees and the decoration of them. Semantic trees. Components: a smart constructor producing a semantic tree. Parameterized with the semantic trees that represents children. The algorithm for a smart constructor can either be given in isolation, or in the context of a node. In the latter case, the node constructed by the smart constructor is an external node, which has access to its context in the other tree. Visit the AST, decorate it. Some decorations may be semantic trees themselves, which we call views, produced through the smart constructors. These semantic trees can be attached to the three as children, and visited as part of the tree. Visits to external nodes of the tree cause data to be exchanged with the context that created the external node. We present a running example in Section xxx. Suppose we implement a simple ML-style type inferencer. Two forms of inferencing: name analysis, and type and kind inferencing. From the relative complex AST of the full program, we generate two name-analysis view: one for type constructors, and one for data constructors and functions. These name analysis-views are a lot less complex, as they contain only definition, usage and scope nodes. At definition nodes in the AST, we produce a (fresh) type or kind variable for each definition of a function, variable, type constructor. The name analysis then provides us at the use nodes in the AST either error messages or the the defined type. Then we consider type-inferencing itself. We define a standard ML-style polymorphic type inference algorithm using the typical con/var/app/lam/fix/gen/inst nodes. The outcome of type inferencing are errors and a substitution that defines a type for the variables introduced earlier. This type inferencing view does not have to concern itself with a type environment: name analysis already provided these and we transfer them to the type inferencing view through external nodes. The original AST is more complex; it has for example recursive lets, etc. From this AST we derive this easier view on it. The original AST also has type signatures, which we define as extra constructors. To check a type signature, we also have to check if it has a valid kind. For that, we use kind inference. For kind inference, we also reuse the standard ML-style inference algorithm. For this, we derive this view from data type declarations and type expressions, including the type expressions of type signatures. We thus reuse name analysis, and the ML-style type inferencer. We also extended this standard inferencer for the type inferencer, and kind analysis. We factored out name analysis from type inference. We thus managed to split the algorithm up in separate parts that we can describe separately and reuse. The other end of the bill is that we have to put some effort in generating the ASTs. Also, since decorations become available during visits, visits require other visits to have been taken place. We show in Section xxx that we can encode this intuitively as order-constraints on the interfaces and analyze our specification to verify that the order is obeyed. 
The idea is that we define for each node which attributes are computed in which visit and impose a partial order on visits. We annotate the type of attributes with The visits on external nodes are part of this order. These external v Essential observation: this pays off when the simplicity of the view (or reuse of an algorithm for a view) weights up against the additional effort to generate the view and exchange its data. Experiences with the implementation of the compilers for the EH family of languages shows is that it pays off to implement an analysis on a simpler version of the AST first and then move on to more complex versions that take more language features into account. In this approach, we are hampered by the inability to abstract from ASTs, which the approach presented in this paper solves. Another observation is the connection with constraint-based analysis. Many algorithms are based upon traversing the tree and generating a set of constraints to be solved later. The pattern we present here can be compared to generating constraints that are not flat but have a tree structure, and provide a mechanism to easily transfer results from one analysis to another. Helium compiler as example. The approach presented in this paper presents a nicer solution to problems that others have solved in ad-hoc ways. Limitations. Sometimes original attributes need to be replaced with more complicated data types. We do not cater for that: shuffle tool in UHC, type classes, parametrization. We only consider traversals. There is some solution to this however. Parameterize traversals over primitive operations. Like with any component system: don't overengineer. Contributions. To be written later. %endif %if False \section{Introduction} \begin{figure}[htb] \begin{tikzpicture} [data/.style={circle,draw=blue!50,fill=blue!20,thick,inner sep=0pt,minimum size=12mm}] \node at (0,0) [data] (ast) {$\mathsf{ast}$}; \node at (0,-2cm) [data] (ana-asti) {$\mathsf{ast}_i$}; \node at (2cm,-3cm) [data] (absi1) {$\mathsf{abs}_{i,1}$}; \node at (4.8cm,-4cm) [data] (abs) {$\mathsf{abs}_{\overline{i},j}$}; \node at (6.9cm,-5cm) [data] (res) {$\mathsf{res}_{\overline{i},j}$}; \node at (4.8cm,-6cm) [data] (ana-abs) {$\mathsf{abs}_{\overline{i},j+1}$}; \node at (2cm,-7cm) [data] (ana-absim) {$\mathsf{abs}_{i,m}$}; \node at (0,-8cm) [data] (ana-asti1) {$\mathsf{ast}_{i+1}$}; \node at (0,-10cm) [data] (ana-astn) {$\mathsf{ast}_n$}; \draw[->, dashed] (ast) to node[auto]{analyze*} (ana-asti); \draw[->] (ana-asti) to node[auto, yshift=1mm, xshift=-2mm]{abstract} (absi1); \draw[->, dashed] (absi1) to node[auto, yshift=1mm, xshift=-2mm]{(analyze,abstract)*} (abs); \draw[->] (abs) to node[auto]{algorithm} (res); \draw[->] (res) to node[auto]{interpret} (ana-abs); \draw[->, dashed] (ana-abs) to node[auto, yshift=-1mm, xshift=-2mm]{(merge,analyze)*} (ana-absim); \draw[->] (ana-absim) to node[auto, yshift=-1mm, xshift=-2mm]{merge} (ana-asti1); \draw[->, dashed] (ana-asti1) to node[auto]{analyze*} (ana-astn); \draw[->, dotted] (ana-asti) to (ana-asti1); \draw[->, dotted] (absi1) to (ana-absim); \draw[->, dotted] (abs) to (ana-abs); \end{tikzpicture} \caption{Data-flow diagram of an orchestration of analyses.} \label{fig.analysis.dataflow} \end{figure} Compilation can be modeled as a pipeline of tree transformations~\cite{DBLP:conf/haskell/DijkstraFS09}. Each transformation in the pipeline is preceded by an analysis that traverses the AST and decorates it with analysis results. 
Typically, the analysis is not performed on the actual AST itself, but on a generated \emph{abstraction} of it, in the form of a desugared tree, a control-flow graph, or a collection of constraints. As an example, consider a constraint-based analysis. It consists of a traversal that generates constraints, and an algorithm that takes the constraints, and solves them. A solver for a particular constraint language may be reused to solve constraints that can be mapped to constraints in this language. This approach has two problems: tweaking of algorithm and exchange of data. Our goal is to take this approach one step further. We compose an analysis out of a library of reusable tree traversals. Our goal is to compose an analysis out of a library of reusable tree traversals. To achieve this goal, two challenges have to be overcome: Each traversal either applies a known algorithm on the tree to decorate it with analysis results, or produces an \emph{abstraction} of the tree and analyzes that (recursively) instead. Figure~\ref{fig.analysis.dataflow} illustrates this situation. For example, name analysis can be reduced to an analysis over a much simpler tree denoting only scoping, binding sites and use sites. Damas-Milner type inference can be reduced to solving a set of equality, instantiation, and generalization constraints~\cite{citeulike:352561}. The analysis on the abstracted tree is easier or can be reused for other analyses that reduce to the same abstraction. Constraint-based analysis typically exhibit this approach. such as a desugared tree, a control-flow graph, or a collection of constraints. %endif %if False \section{Repmin with Imperative Visitors} %format . = "." %format abstract = "\mathbf{abstract}" %format class = "\mathbf{class}" %format void = "\mathbf{void}" %format extends = "\mathbf{extends}" %format implements = "\mathbf{implements}" %format interface = "\mathbf{interface}" %format for = "\mathbf{for}" %format return = "\mathbf{return}" %format (Generic t1 t2) = t1 "\langle" t2 "\rangle" %format := = ":=" %format var = "\mathbf{var}" \begin{figure}[htb] \begin{code} abstract class Tree abstract void accept(TreeVisitor v) class Node extends Tree int value (Generic List Tree) subtrees void accept(TreeVisitor v) v.visit(this) interface TreeVisitor void visit(Node n) abstract class (Generic MyList T) abstract void accept(ListVisitor v) class Cons extends (Generic MyList T) T ref (Generic MyList T) next void accept((Generic ListVisitor T) v) v.visit(this) class Nil extends (Generic MyList T) void accept((Generic ListVisitor T) v) v.visit(this) interface (Generic ListVisitor T) void visit((Generic Cons T) c) void visit((Generic Nil T) n) \end{code} \caption{Visitors for a tree and list} \label{fig.example.visitors.imperative} \end{figure} \begin{figure}[htb] \begin{code} class GenListVisitor implements TreeVisitor (Generic MyList Node) list GenListVisitor() list = new Nil() void visit(Node n) var c := new Cons() c.ref := this c.next := list list := c for (Node m : n.subtrees) m.accept(this) class SumVisitor implements (Generic ListVisitor Node) int gathSum SumVisitor() gathSum := 0 visit(Nil n) return visit(Cons c) gathSum := ref.value + gathSum c.next.accept(this) class UpdateVisitor implements (Generic ListVisitor Node) int sum UpdateVisitor(int finalSum) sum = finalSum visit(Nil n) return visit(Cons c) ref.value := sum c.next.accept(this) void main(String[] args) var ast := parse (args[0]) var gen := new GenListVisitor() ast.accept(gen) var addr := new SumVisitor() 
gen.list.accept(aggregator) var upd := new UpdateVisitor(addr.gathSum) gen.list.accept(upd) \end{code} \caption{Imperative Repmin.} \label{fig.example.repmin.imperative} \end{figure} %endif \acks Acknowledgments, if needed. \bibliographystyle{abbrvnat} \bibliography{references} %if False \appendix \section{Imperative Visitor Code} \begin{code} {-# LANGUAGE TypeFamilies #-} module Example where import Data.IORef deref = readIORef assign = writeIORef type family Syn a type family Inh a data InOut a = InOut { ast :: a, inh :: Inh a, syn :: Syn a} data List = List { ioList :: InOut List, altList :: ListAlt } data ListAlt = Cons ConsLoc List | Nil NilLoc data ListInh = ListInh { finSumRef :: IORef Int } data ListSyn = ListSyn { gathSumRef :: IORef Int } data ConsLoc = ConsLoc { leafRef :: IORef LeafLoc } data NilLoc = NilLoc {} type instance Inh List = ListInh type instance Syn List = ListSyn type ListVisitor = ( InOut List -> NilLoc -> IO () , InOut List -> ConsLoc -> InOut List -> IO () ) acceptList :: List -> ListVisitor -> IO () acceptList (List io (Nil loc)) (nil, _) = nil io loc acceptList (List io (Cons loc (List io' _))) (_, con) = con io loc io' single :: LeafLoc -> IO List single loc = do [r1,r2,r3,r4,r5] <- sequence $ replicate 5 (newIORef undefined) assign r3 loc return $ mk r1 r2 $ Cons (ConsLoc r3) $ mk r4 r5 $ Nil NilLoc where mk r1 r2 tl = let l = List (io l r1 r2) tl in l io l r1 r2 = InOut l (ListInh r1) (ListSyn r2) conc :: List -> List -> List conc l1 l2 = foldr (\(io,loc) r -> List io (Cons loc r)) (nil l1) (toList l1 ++ toList l2) where toList (List _ (Nil _)) = [] toList (List io (Cons loc r)) = (io,loc) : toList r nil (List _ (Cons _ r)) = nil r nil l = l shwList :: List -> IO String shwList (List _ (Nil _)) = return "nil" shwList (List _ (Cons loc r)) = do leaf <- deref (leafRef loc) val <- deref (valRef leaf) s <- shwList r return ((show val) ++ " : " ++ s) data Tree = Tree { ioTree :: InOut Tree, altTree :: TreeAlt } data TreeAlt = Leaf LeafLoc | Bin BinLoc Tree Tree data TreeInh = TreeInh {} data TreeSyn = TreeSyn { gathListRef :: IORef List } data BinLoc = BinLoc {} data LeafLoc = LeafLoc { valRef :: IORef Int } type instance Inh Tree = TreeInh type instance Syn Tree = TreeSyn type TreeVisitor = ( InOut Tree -> LeafLoc -> IO () , InOut Tree -> BinLoc -> InOut Tree -> InOut Tree -> IO () ) acceptTree :: Tree -> TreeVisitor -> IO () acceptTree (Tree io (Leaf loc)) (leaf, _) = leaf io loc acceptTree (Tree io (Bin loc (Tree iol _) (Tree ior _))) (_, bin) = bin io loc iol ior leafM :: Int -> IO Tree leafM n = do r1 <- newIORef n r2 <- newIORef undefined let t = Tree (InOut t TreeInh (TreeSyn r2)) (Leaf $ LeafLoc r1) return t nodeM :: IO Tree -> IO Tree -> IO Tree nodeM ioL ioR = do left <- ioL right <- ioR ref <- newIORef undefined let t = Tree (InOut t TreeInh (TreeSyn ref)) (Bin BinLoc left right) return t shwTree :: Tree -> IO String shwTree (Tree _ (Leaf loc)) = do v <- deref (valRef loc) return $ "(leaf " ++ show v ++ ")" shwTree (Tree _ (Bin _ left right)) = do l <- shwTree left r <- shwTree right return $ "(bin " ++ l ++ " " ++ r ++ ")" synListVisitor :: TreeVisitor synListVisitor = ( \lhs loc -> single loc >>= assign (gathListRef $ syn lhs) , \lhs loc left right -> do acceptTree (ast left) synListVisitor l <- deref (gathListRef $ syn left) acceptTree (ast right) synListVisitor r <- deref (gathListRef $ syn right) assign (gathListRef $ syn $ lhs) (conc l r) ) sumVisitor :: ListVisitor sumVisitor = ( \lhs loc -> assign (gathSumRef $ syn lhs) 0 , \lhs loc tl -> do 
acceptList (ast tl) sumVisitor tlSum <- deref (gathSumRef $ syn tl) leaf <- deref (leafRef loc) val <- deref (valRef leaf) assign (gathSumRef $ syn lhs) (tlSum + val) ) backVisitor :: ListVisitor backVisitor = ( \lhs loc -> return () , \lhs loc tl -> do fin <- deref (finSumRef $ inh lhs) leaf <- deref (leafRef loc) assign (valRef leaf) fin assign (finSumRef $ inh tl) fin acceptList (ast tl) backVisitor ) main :: IO () main = do t <- nodeM (nodeM (leafM 1) (leafM 2)) (leafM 3) acceptTree t synListVisitor l <- deref (gathListRef $ syn $ ioTree t) shwList l >>= putStrLn acceptList l sumVisitor sum <- deref (gathSumRef $ syn $ ioList l) assign (finSumRef $ inh $ ioList l) sum acceptList l backVisitor shwTree t >>= putStrLn \end{code} %endif \end{document}