\documentclass[preprint,authoryear]{sigplanconf}
%include lhs2TeX.fmt
%include polycode.fmt
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{stmaryrd}
\usepackage{float}
\usepackage{xcolor}
\usepackage{txfonts}
\usepackage{natbib}
\usepackage[pdftex]{graphicx}
\usepackage{tikz}
\usetikzlibrary{trees,arrows,fit,decorations}
\floatstyle{boxed}
\restylefloat{figure}
%format p1
%format p2
%format a1
%format a2
%format a3
%format a4
%format a5
%format a6
\begin{document}
\conferenceinfo{ICFP '10}{Sept 27, Baltimore.}
\copyrightyear{2010}
\copyrightdata{[to be supplied]}
\titlebanner{banner above paper title}        % These are ignored unless
\preprintfooter{short description of paper}   % 'preprint' option specified.
\title{Functional Visitors with Callback for Reusable Traversals}
\authorinfo{Arie Middelkoop \and Atze Dijkstra \and S. Doaitse Swierstra}
           {Universiteit Utrecht}
           {\{ariem,atze,doaitse\}@@cs.uu.nl}
\maketitle
\begin{abstract}
A compiler consists of a variety of algorithms that \emph{traverse} an Abstract Syntax Tree (AST) and \emph{decorate} its nodes with computed properties.
These algorithms are usually defined monolithically in terms of the AST, which impairs modularity and reuse.
Our goal is to factor these algorithms into simpler, more abstract, and reusable components.
We write an otherwise monolithic algorithm on an AST M as a three-step algorithm that (1) generates a more abstract AST V, (2) applies a reusable traversal that decorates V, and finally (3) translates the results of V back to M.
During V's construction in step (1), we add extra nodes that either extend the algorithm of step (2), or serve as placeholders for the data exchange between M and V that facilitates step (3).
We present a functional domain-specific language based on visitors in which we express such algorithms intuitively.
The absence of side effects allows us to enforce a well-defined evaluation order.
\end{abstract}
\category{CR-number}{subcategory}{third-level}
\terms term1, term2
\keywords keyword1, keyword2
\section{Introduction}
Compilation can be modeled as a pipeline of tree transformations~\cite{DBLP:conf/haskell/DijkstraFS09}.
Each transformation in the pipeline is preceded by an analysis that traverses the AST and decorates it with analysis results.
Many of these algorithms resemble each other.
For example, name analysis of data type declarations resembles name analysis of expressions, and type checking of expressions resembles kind checking of types.
This commonality is exploited in compiler implementations: common infrastructure, such as libraries for environments and substitutions, is factored out into separate, reusable components.
However, factoring out the actual traversals for reuse is a difficult task: the ASTs for data type declarations and expressions differ, as do the ASTs for expressions and types.
Generic traversals are not powerful enough to bridge these differences, because the differences lie not only in the structure of the AST, but also in the behavior for structure that the ASTs have in common.
For example, we typically allow more polymorphism in type analysis than in kind analysis.
This lack of modularity of traversals is an engineering deficiency that limits separation of concerns, impairs reuse, and complicates maintenance due to code duplication.
Our experiences with the Utrecht Haskell Compiler (UHC) confirm these problems.
The UHC is implemented incrementally as a stack of language extensions.
Its implementation starts with a compiler for a simple lambda calculus, and ends with Haskell plus some experimental extensions. Typically, each language extension gives rise to additional syntax. The behavior for this syntax typically has aspects in common with behavior for syntax already supported by the implementation. For example, a lot of syntax deals with name binding and scoping, the split of environments over subexpressions, or the combination of types of subexpressions. We wish to abstract from these patterns, as these now lead to a duplication of code. %% The implementation of UHC can be used for rapid prototyping and experimentation. %% We can take any substack of language features that suits our needs best as basis for %% further programming. %% However, to be able to reuse the code, the AST we are interested in %% must be a structural extension of the UHC AST. %% We ignore some other forms of code duplication: %% namely the need to override existing code. %% Too some extend the problem cannot be avoided. Crosscutting language features. %% However, we would like this cross-cutting to be described individually. \begin{figure}[htp] \begin{center} \begin{tikzpicture} [ nd/.style={circle, minimum size=10mm, very thick, draw=blue!50!black!50, top color=white, bottom color=blue!50!black!20,font=\footnotesize} , attr/.style={rectangle, minimum size=2mm, node distance=3mm, very thick, draw=black!50, top color=white, bottom color=black!20,font=\footnotesize} , annot/.style={font=\footnotesize} , inh/.style={attr} , syn/.style={attr} , leftfirst/.style={xshift=-4mm} , rightfirst/.style={xshift=4mm} , arr/.style={->,dashed} , inst/.style={draw=black!50, dotted, thick} , commline/.style={draw=red!50!black!50, very thick} ] \node[nd](sc){scope} child { [sibling distance=20mm] node[nd](spl){split} child { node[nd](use){use} } child { node[nd](def){def} } }; \node[nd, left of=sc, xshift=-27mm](let){let} child { node[nd](lam){lam} child { node[nd](pat){pat} } child { node[nd](var){var} } } child { node{$\ldots$} }; \node[syn,rightfirst](letsyn)[right of=let]{}; \node[syn,rightfirst](lamsyn)[right of=lam]{}; \node[syn,rightfirst](varsyn)[right of=var]{}; \node[syn,rightfirst](patsyn)[right of=pat]{}; \node[syn,rightfirst](scsyn)[right of=sc]{}; \node[inh,leftfirst](scinh)[left of=sc]{}; \node[syn,rightfirst](splsyn)[right of=spl]{}; \node[inh,leftfirst](splinh)[left of=spl]{}; \node[syn,rightfirst](defsyn)[right of=def]{}; \node[inh,leftfirst](definh)[left of=def]{}; \node[syn,rightfirst](usesyn)[right of=use]{}; \node[inh,leftfirst](useinh)[left of=use]{}; \draw[arr] (letsyn) to (lamsyn); \draw[arr] (lamsyn) to (varsyn); \draw[arr] (lamsyn) to (patsyn); \draw[arr] (splinh) to (scinh); \draw[arr] (useinh) to (splinh); \draw[arr] (definh) to (splinh); \draw[arr] (scsyn) to (splsyn); \draw[arr] (splsyn) to (defsyn); \draw[arr] (splsyn) to (usesyn); \draw[arr] (scinh) to [out=90,in=90] (scsyn); \node[inst,fit=(spl) (sc) (use) (def) (useinh) (defsyn)](instbox){}; \draw[inst] (letsyn) to (instbox); \draw[commline] (sc) to [out=120,in=60] node[annot,auto]{|<- [Ty]|} (let); \draw[commline] (use) to [out=-120,in=-60] node[annot,auto]{|<- Ty,Errs|} node[annot,auto,swap]{|Nm ->|} (var); \draw[commline] (def) to [out=-110,in=-70] node[annot,auto]{|<- Errs|} node[annot,auto,swap]{|Nm,Ty ->|} (pat); \end{tikzpicture} \end{center} \caption{Instantiation of the name-analysis component.} \label{fig.example.instantiation} \end{figure} We propose to encode traversal algorithm components as follows. 
Each component defines its own suitable AST structure and decorations, as well as, for each type of node, an algorithm that computes the decorations in one or more visits.
For example, consider the following name analysis component.
Its AST consists of use-, def-, scope-, and split-nodes.
Decorations include environments that transport name information.
A def-node introduces a value bound to a given name; at a use-node we look up this value.
Additionally, we decorate a def-node with possible duplication-errors, and a use-node with undefined-errors if no value is bound to its name.
The algorithms are invoked twice per node: in the first visit, an environment that holds all bindings defined in the subtree is computed from the environments of the children.
At the root, this environment contains all bindings.
In the second visit, we copy this environment down, so that it can be inspected to determine errors and to look up bound values.
The algorithm formulated as such is abstract (it works for any value) and is defined on a lean AST.

To instantiate such a component, consider a type analysis of some (lambda) expression AST (Expr AST).
Part of type analysis is a name analysis of the Expr AST.
We define an algorithm that delegates this work to the name analysis component.
Figure~\ref{fig.example.instantiation} depicts the situation.
We take the following steps:
\begin{enumerate}
\item We define a traversal over the Expr AST to decorate the nodes with a Name AST.
These decorations, \emph{attributes}, are visualized as the small boxes next to the nodes.
Dashed arrows between one node and others indicate that the former is a function of the values of the latter.
\item For the use-, def-, and scope-nodes, we actually generate a special version of these nodes.
We call these \emph{callback nodes}.
A callback node contains a reference back to the originating Expr-node (the red line in the picture), and an extension of the algorithm of the name-node.
The extended algorithm uses this reference to receive, for example, a name, and to supply back the type introduced elsewhere for this name, or errors.
\item The computed Name AST is made a child of the let-node and visited according to the algorithm for the name analysis.
The nodes are decorated with an environment (the attribute to the right in the picture) containing all bindings of the subtree.
It is then used to define the environment for the lookups (the attributes to the left), which is subsequently passed down.
\item Visits to the callback nodes demand data from, or supply data to, the related node in the Expr AST, and thus cause additional visits in the Expr AST.
For example, for the def-node in Figure~\ref{fig.example.instantiation}, the lam-node must provide a name and a type.
Later, when the def-node is visited again, it supplies potential errors back to the lam-node.
\end{enumerate}

\begin{figure}[htp]
\begin{center}
\begin{tikzpicture}
  [ comp/.style={rectangle, minimum size=13mm, very thick, draw=blue!50!black!50, top color=white, bottom color=blue!50!black!20,font=\footnotesize}
  , gen/.style={->>,thick}
  , ext/.style={->,thick}
  , annot/.style={font=\footnotesize}
  , node distance=25mm
  ]
  \node[comp] (tpsig) {type + sig};
  \node[comp, left of=tpsig] (main) {main};
  \node[comp, above of=tpsig] (name) {name};
  \node[comp, right of=tpsig] (kind) {kind};
  \node[comp, below of=tpsig] (type) {type};
  \node[comp, right of=type] (tpkind) {type+kind};
  \draw[gen] (main) to node[annot,auto]{gen} (name);
  \draw[gen] (main) to node[annot,auto]{gen} (tpsig);
  \draw[gen] (tpsig) to node[annot,auto]{gen} (kind);
  \draw[gen] (kind) to node[annot,auto]{gen} (name);
  \draw[gen] (kind) to node[annot,auto]{gen} (tpkind);
  \draw[ext] (tpsig) to node[annot,auto]{extends} (type);
  \draw[ext] (tpkind) to node[annot,auto]{extends} (type);
\end{tikzpicture}
\end{center}
\caption{Type inferencer components.}
\label{fig.example.components}
\end{figure}

In Section~\ref{nextsection} we show how to encode the above example in Haskell, using pure functions to compute attributes and side effects to access the attributes of one node from another.
We consider both the name analysis component mentioned in this section and a Damas-Milner type analysis component.
Figure~\ref{fig.example.components} illustrates these components.
The type analysis component uses an AST that closely follows the Damas-Milner type rules, most notably with a non-recursive let and explicit nodes for instantiation and generalization.
The type analysis algorithm no longer has to deal with name analysis, which makes it a bit simpler.
Furthermore, we define two extensions of this type inference: kind inference, and support for type signatures (which uses kind inference).
Finally, we combine all these components to form an algorithm for the main AST, which has nodes for recursive let and type signatures.

This approach is clearly a trade-off.
On the one hand, the algorithms are simpler because we define them and their extensions in isolation.
They are therefore also easier to maintain, explain, and reuse.
On the other hand, the approach poses the following challenges:
\begin{enumerate}
\item The generation of another AST is additional work and may make the algorithm more verbose.
We argue that the ASTs of most components are simple views on the original AST, and therefore easy to produce.
Aside from that, transforming ASTs is the core business of a compiler and therefore well understood.
Along the same lines, the extra traversals incur runtime overhead.
In our experience with traversal optimizations in the UHC, the time spent traversing the AST is negligible compared to the actual work a compiler performs (unification, constraint solving, etc.).
Furthermore, advances in tree-fusion techniques may remove the overhead completely, and traversals over the generated tree are also a good candidate for parallelism.
\item To ensure that our approach is intuitive, it must be clear what data is exchanged (i.e., the labels on the red lines in the picture), and we must ensure that nodes are visited in the proper order, such that, for example, the visit that provides the error messages takes place before the visit that inspects them.
In other words: our programs must be side-effect free.
\item Furthermore, we typically need to \emph{extend} the algorithm of a component.
To reuse the type inference algorithm, for example, we may have to add support for type signatures, higher-ranked types, etc.
To obtain true modularity, we want to be able to treat components as black boxes and add additional behavior where needed, without having to touch the code of the component itself.
%% Todo: give a good reason for this
%% Modularity, compositionality, etc..
\end{enumerate}

To meet the latter two challenges, we introduce a functional language, Functional Callback Visitors (FCV), that resembles attribute grammars, but with an explicit specification of traversals and visits (a predecessor is described in~\cite{middelkoop09wgt10}).
As with attribute grammars, our specifications describe the value of an attribute as a function of the values of other attributes.
We require the static guarantee that these computations are non-cyclic.
The absence of both side effects and cyclic definitions makes an FCV program predictable.
Programming with attributes has the additional advantage that trivial attribute computations can be omitted and inserted automatically, and that we can automatically determine the relative order of the attribute computations of a visit, which allows us to write these computations in arbitrary order as separate code fragments (cite the AG manual - dont we have a paper or techreport about this?).

\begin{figure}[htp]
\begin{center}
\begin{tikzpicture}
  [ visit/.style={circle, minimum size=14mm, very thick, draw=blue!50!black!50, top color=white, bottom color=blue!50!black!20,font=\footnotesize}
  , svisit/.style={circle, minimum size=13mm, thick, draw=blue!40!black!30, top color=white, bottom color=blue!40!black!15,font=\scriptsize}
  , vbox/.style={rectangle, very thick, draw=black!100}
  , attr/.style={rectangle, very thick, draw=black!50, top color=white, bottom color=black!20,font=\footnotesize}
  , itf/.style={font=\large}
  , arr/.style={->,thick}
  , annot/.style={font=\footnotesize}
  , node distance=12mm
  ]
  \node[visit] (vGath) {vGath};
  \node[visit, right of=vGath, xshift=20mm] (vSpread) {vSpread};
  \node[attr, below of=vGath] (gathEnv) {|gathEnv :: Env|};
  \node[attr, above of=vSpread] (finEnv) {|finEnv :: Env|};
  \node[itf,left of=vGath, xshift=-3mm](name){|Name:|};
  \draw[arr] (vSpread) to (vGath);
  \node[itf,below of=name,yshift=-22mm](main){|Main:|};
  \node[visit,right of=main,xshift=3mm] (vName) {vName};
  \node[svisit,right of=vName,xshift=6mm,yshift=4mm] (vSupply) {vSupply};
  \node[svisit,right of=vSupply,xshift=5mm] (vOutcome) {vOutcome};
  \node[vbox, fit=(vSupply) (vOutcome)](nmbox) {};
  \node[visit, right of=vOutcome,xshift=6mm,yshift=-4mm] (vType) {vType};
  \node[attr, below of=vName] (nmTree) {nmTree :: Tree Name};
  \node[annot,above of=nmbox,yshift=-2mm]{nmTree.NameUse};
  \node[attr, below of=vType]{|ty :: Ty|};
  \draw[arr] (vSupply) to (vName);
  \draw[arr] (vType) to (vOutcome);
  \draw[arr] (vType) to [out=-140,in=-30] (vName);
  %% \draw[arr] (vOutcome) to (vSupply);
\end{tikzpicture}
\end{center}
\caption{Interface for Name AST and partial interface for Main AST.}
\label{fig.example.interface}
\end{figure}

An important ingredient of FCV is the concept of an \emph{interface}.
An interface specifies how a tree is to be visited: which visits there are, which attributes have to be computed prior to each visit, which attributes are computed by the visit, and what the dependencies between visits are.
These dependencies allow us to statically enforce that a visit takes place only after the visits it depends on, and that only the attributes of previous visits are in scope.
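Before turning to concrete interfaces, a minimal sketch in plain Haskell may help to convey the idea; this is our own encoding with placeholder types (|Env|, |Errs|, |NameSem|, |VSpread|), not FCV notation.
A tree that must be visited in two ordered visits can be modelled as a value whose first visit yields its synthesized attributes together with the only handle through which the second visit can be invoked.
\begin{code}
type Env   = [(String, Int)]   -- placeholder: names bound to some value
type Errs  = [String]          -- placeholder: error messages

-- Visiting a Name tree: the first visit yields the gathered
-- environment and a handle for the second visit; the second visit
-- takes the final environment and yields error messages.
newtype NameSem  = NameSem  { vGath    :: (Env, VSpread) }
newtype VSpread  = VSpread  { vSpread  :: Env -> Errs }
\end{code}
The nesting makes the ordering constraint structural: the second visit cannot be reached before the first has been performed, and only the attributes of earlier visits are available when later visits are defined.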
For example, Figure~\ref{fig.example.interface} shows two interface specifications: one for the Name AST and a partial one for the Main AST.
The |Name| interface specifies two visits, vGath and vSpread.
The arrow between vGath and vSpread expresses that the invoker of visits on the Name AST must invoke vGath before vSpread.
The boxes above and below the visits declare attributes.
Attributes below a visit are \emph{synthesized} attributes and are computed by the visit.
Attributes above a visit are \emph{inherited} and must be computed by the invoker prior to the visit.
Hence, invoking vGath results in |gathEnv| being computed, and this value may then be used to, for example, compute the |finEnv| attribute prior to calling vSpread.

\begin{figure}[htp]
\begin{center}
\begin{tikzpicture}
  [ nd/.style={circle, minimum size=10mm, very thick, draw=blue!50!black!50, top color=white, bottom color=blue!50!black!20,font=\footnotesize}
  , attr/.style={rectangle, minimum size=2mm, node distance=3mm, very thick, draw=black!50, top color=white, bottom color=black!20,font=\footnotesize}
  , annot/.style={font=\footnotesize}
  , inh/.style={attr}
  , syn/.style={attr}
  , leftfirst/.style={xshift=-6mm}
  , rightfirst/.style={xshift=6mm}
  , rightfirst2/.style={xshift=4mm}
  , arr/.style={->,dashed}
  , inst/.style={draw=black!50, dotted, thick}
  , commline/.style={draw=red!50!black!50, very thick}
  , annot/.style={font=\Large}
  , node distance = 15mm
  ]
  \node[nd](top){override} child { node[nd](bottom){original} };
  \node[annot,right of=top,xshift=12mm]{extend,filter};
  \node[attr, leftfirst, left of=top](inh1){};
  \node[attr, leftfirst, left of=bottom](inh2){};
  \node[attr, rightfirst, right of=top](syn1){};
  \node[attr, rightfirst, right of=bottom](syn2){};
  \draw[arr] (inh2) to (inh1);
  \draw[arr] (syn1) to (syn2);
  \node[nd, below of=bottom, xshift=20mm](root){ast};
  \node[attr,rightfirst,right of=root](rsyn1){};
  \node[attr,right of=rsyn1](rsyn2){};
  \node[nd, below of=root, xshift=-17mm, yshift=-15mm](p1){|p1|} child { node[nd, yshift=-15mm](p2){|p2|} };
  \node[attr, leftfirst, left of=p1](p1i1){};
  \node[attr, leftfirst, left of=p2](p2i2){};
  \node[attr, rightfirst, right of=p1](p1s1){};
  \node[attr, rightfirst, right of=p2](p2s2){};
  \draw[arr] (p2i2) to (p1i1);
  \draw[arr] (p1s1) to (p2s2);
  \node[inst, fit=(p1i1)(p2s2)(p1)(p2)](box1){};
  \node[annot, left of=root, xshift=-10mm]{context};
  \node[nd, below of=root, xshift=27mm](a1){|a1|} child { node[nd](a2){|a2|} child { node[nd](a3){|a3|} child { node[nd](a4){|a4|} } child { node[nd](a5){|a5|} } } } child { node[nd](a6){|a6|} };
  \node[attr,rightfirst2,right of=a1](a1s){};
  \node[attr,rightfirst2,right of=a2](a2s){};
  \node[attr,rightfirst2,right of=a3](a3s){};
  \node[attr,rightfirst2,right of=a4](a4s){};
  \node[attr,rightfirst2,right of=a5](a5s){};
  \node[attr,rightfirst2,right of=a6](a6s){};
  \node[inst,fit=(a1)(a1s)(a6s)(a4)](box2){};
  \draw[arr] (a1s) to (a2s);
  \draw[arr] (a1s) to (a6s);
  \draw[arr] (a2s) to (a3s);
  \draw[arr] (a3s) to (a4s);
  \draw[arr] (a3s) to (a5s);
  \draw[inst] (box1) to (rsyn1);
  \draw[inst] (box2) to (rsyn2);
  \draw[commline] (p1) to [out=-20,in=-160] node[annot,auto]{|<-|} node[annot,auto,swap]{|->|} (a2);
  \draw[commline] (p2) to [out=-20,in=-160] node[annot,auto]{|<-|} node[annot,auto,swap]{|->|} (a4);
\end{tikzpicture}
\end{center}
\caption{Overriding and context via a parallel tree.}
\label{fig.example.overriding}
\end{figure}

The interface |Main| declares a visit vName with a synthesized attribute |nmTree|, which is a generated Name tree.
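Returning to the |Name| interface for a moment, the following continues our plain-Haskell sketch from above.
The node functions |semDef|, |semUse|, |semSplit|, and |semScope| are hypothetical names we introduce for illustration only; FCV generates such plumbing from the attribute definitions.
The point is the ordering: the gathered environment of the first visit defines the inherited |finEnv|, and only then can the second visit run.
\begin{code}
-- def-node: contribute a binding in the first visit; report a
-- (hypothetical) duplicate-definition error in the second visit
semDef :: String -> Int -> NameSem
semDef nm val = NameSem ([(nm, val)], VSpread spread)
  where spread fin =  [ "duplicate: " ++ nm
                      | length (filter ((== nm) . fst) fin) > 1 ]

-- use-node: nothing to gather; in the second visit, look the name
-- up in the final environment and report it if it is unbound
semUse :: String -> NameSem
semUse nm = NameSem ([], VSpread spread)
  where spread fin = [ "undefined: " ++ nm | nm `notElem` map fst fin ]

-- split-node: concatenate the gathered environments of the children
-- and forward the final environment to both of them
semSplit :: NameSem -> NameSem -> NameSem
semSplit l r = NameSem (gl ++ gr, VSpread spread)
  where  (gl, kl)    = vGath l
         (gr, kr)    = vGath r
         spread fin  = vSpread kl fin ++ vSpread kr fin

-- scope-node (the invoker): vGath runs first; its result defines the
-- final environment that is passed down before vSpread can run
semScope :: NameSem -> (Env, Errs)
semScope t = (gathEnv, vSpread k finEnv)
  where  (gathEnv, k)  = vGath t
         finEnv        = gathEnv   -- at the root: all bindings
\end{code}
The scope-node plays the role of the invoker here: it cannot call vSpread without first having run vGath, which is exactly the ordering that the interface prescribes.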
The generated |nmTree| contains callback nodes (corresponding to the second red line in Figure~\ref{fig.example.instantiation}), described by the interface |NameUse| (not shown).
The visits to these nodes need not be performed by the invoker directly, but may be delegated.
However, in addition to the restrictions imposed by the |NameUse| interface, the invoker must also ensure that the vSupply and vOutcome visits to the nodes of |nmTree| take place before it invokes vType.

Another important aspect of FCV is the ability to extend trees with additional nodes, either in the context of another node (the callback nodes) or in isolation, and in both cases to adapt the algorithm.
Figure~\ref{fig.example.overriding} shows two typical scenarios.
In the first scenario, we generate an additional node that passes down the value of an inherited attribute and passes up the value of a synthesized attribute, possibly changing these values in the process.
The second scenario shows how to deal with extensions that require additional attributes.
For example, we generated the |a|-tree, which has a visit that computes some synthesized attribute (the box to the right of each node).
We inserted additional nodes |a2| and |a4|.
Suppose that the computations for |a2| and |a4| require an additional inherited attribute (for example, some environment) and produce an additional synthesized attribute (for example, error messages).
We wish to keep the code for the other nodes of the |a|-tree the same (a black box), so instead we generate another tree, the |p|-tree, on which we define the required attributes, and use callback nodes to exchange the attribute values between |a|-nodes and |p|-nodes.

In the remainder of the paper...
%if False
Solution.
initial situation:
  define a number of smart constructors that build a semantic tree (algorithm)
  produce a tree using the smart constructors
  visit the tree to run the algorithm and decorate the nodes
  to access decorations of the tree, inspect decorations at the root
new situation:
  produce a decoration representing a view that is build from smart constructors
  attach the decoration to the tree as additional child to a node, such that it is visited too
  define some smart constructors in the context of a semantic node
  the algorithm of this smart constructor has access to both the decorations of its originating node
  as well as the decorations of its own parent and children in the semantic-tree
In this construction the view part of the AST, there is not a clear distinction between the two.
We therefore distinguish between external nodes and the originating node that defined it.
To accomplish the following tasks:
* extend algorithm with new cases: add extra smart constructors, produce the AST with these extra nodes
* override original behavior: factor out the overrideable behavior to a child node and write smart constructors for the intended behavior
* decorations of the view insufficient: create external nodes, ensure that the originating node has this context (possibly through yet another view)
example: checking type signatures in expressions. Requires kind environment. Suppose that the type-checking view does not have this environment. Create the type-checking node as external node and ensure that the originating node has the kind environment.
We will now make these ideas more concrete.
Assume the following situation. We consider traversal algorithms from a visitor/attribute grammar perspective. Each node runs an algorithm that decorates the node with attributes. Each node has inherited and synthesized attributes.
Inherited attributes are defined by the parent node. Synthesized attributes are defined by the algorithm running at the node. The attributes form an interface to pass data between a node and its parent. The algorithms are executing during on or more topdown traversals of the AST. Componentize as follows. Interfaces. Describes visits to the trees and the decoration of them. Semantic trees. Components: a smart constructor producing a semantic tree. Parameterized with the semantic trees that represents children. The algorithm for a smart constructor can either be given in isolation, or in the context of a node. In the latter case, the node constructed by the smart constructor is an external node, which has access to its context in the other tree. Visit the AST, decorate it. Some decorations may be semantic trees themselves, which we call views, produced through the smart constructors. These semantic trees can be attached to the three as children, and visited as part of the tree. Visits to external nodes of the tree cause data to be exchanged with the context that created the external node. We present a running example in Section xxx. Suppose we implement a simple ML-style type inferencer. Two forms of inferencing: name analysis, and type and kind inferencing. From the relative complex AST of the full program, we generate two name-analysis view: one for type constructors, and one for data constructors and functions. These name analysis-views are a lot less complex, as they contain only definition, usage and scope nodes. At definition nodes in the AST, we produce a (fresh) type or kind variable for each definition of a function, variable, type constructor. The name analysis then provides us at the use nodes in the AST either error messages or the the defined type. Then we consider type-inferencing itself. We define a standard ML-style polymorphic type inference algorithm using the typical con/var/app/lam/fix/gen/inst nodes. The outcome of type inferencing are errors and a substitution that defines a type for the variables introduced earlier. This type inferencing view does not have to concern itself with a type environment: name analysis already provided these and we transfer them to the type inferencing view through external nodes. The original AST is more complex; it has for example recursive lets, etc. From this AST we derive this easier view on it. The original AST also has type signatures, which we define as extra constructors. To check a type signature, we also have to check if it has a valid kind. For that, we use kind inference. For kind inference, we also reuse the standard ML-style inference algorithm. For this, we derive this view from data type declarations and type expressions, including the type expressions of type signatures. We thus reuse name analysis, and the ML-style type inferencer. We also extended this standard inferencer for the type inferencer, and kind analysis. We factored out name analysis from type inference. We thus managed to split the algorithm up in separate parts that we can describe separately and reuse. The other end of the bill is that we have to put some effort in generating the ASTs. Also, since decorations become available during visits, visits require other visits to have been taken place. We show in Section xxx that we can encode this intuitively as order-constraints on the interfaces and analyze our specification to verify that the order is obeyed. 
The idea is that we define for each node which attributes are computed in which visit and impose a partial order on visits. We annotate the type of attributes with The visits on external nodes are part of this order. These external v Essential observation: this pays off when the simplicity of the view (or reuse of an algorithm for a view) weights up against the additional effort to generate the view and exchange its data. Experiences with the implementation of the compilers for the EH family of languages shows is that it pays off to implement an analysis on a simpler version of the AST first and then move on to more complex versions that take more language features into account. In this approach, we are hampered by the inability to abstract from ASTs, which the approach presented in this paper solves. Another observation is the connection with constraint-based analysis. Many algorithms are based upon traversing the tree and generating a set of constraints to be solved later. The pattern we present here can be compared to generating constraints that are not flat but have a tree structure, and provide a mechanism to easily transfer results from one analysis to another. Helium compiler as example. The approach presented in this paper presents a nicer solution to problems that others have solved in ad-hoc ways. Limitations. Sometimes original attributes need to be replaced with more complicated data types. We do not cater for that: shuffle tool in UHC, type classes, parametrization. We only consider traversals. There is some solution to this however. Parameterize traversals over primitive operations. Like with any component system: don't overengineer. Contributions. To be written later. %endif %if False \section{Introduction} \begin{figure}[htb] \begin{tikzpicture} [data/.style={circle,draw=blue!50,fill=blue!20,thick,inner sep=0pt,minimum size=12mm}] \node at (0,0) [data] (ast) {$\mathsf{ast}$}; \node at (0,-2cm) [data] (ana-asti) {$\mathsf{ast}_i$}; \node at (2cm,-3cm) [data] (absi1) {$\mathsf{abs}_{i,1}$}; \node at (4.8cm,-4cm) [data] (abs) {$\mathsf{abs}_{\overline{i},j}$}; \node at (6.9cm,-5cm) [data] (res) {$\mathsf{res}_{\overline{i},j}$}; \node at (4.8cm,-6cm) [data] (ana-abs) {$\mathsf{abs}_{\overline{i},j+1}$}; \node at (2cm,-7cm) [data] (ana-absim) {$\mathsf{abs}_{i,m}$}; \node at (0,-8cm) [data] (ana-asti1) {$\mathsf{ast}_{i+1}$}; \node at (0,-10cm) [data] (ana-astn) {$\mathsf{ast}_n$}; \draw[->, dashed] (ast) to node[auto]{analyze*} (ana-asti); \draw[->] (ana-asti) to node[auto, yshift=1mm, xshift=-2mm]{abstract} (absi1); \draw[->, dashed] (absi1) to node[auto, yshift=1mm, xshift=-2mm]{(analyze,abstract)*} (abs); \draw[->] (abs) to node[auto]{algorithm} (res); \draw[->] (res) to node[auto]{interpret} (ana-abs); \draw[->, dashed] (ana-abs) to node[auto, yshift=-1mm, xshift=-2mm]{(merge,analyze)*} (ana-absim); \draw[->] (ana-absim) to node[auto, yshift=-1mm, xshift=-2mm]{merge} (ana-asti1); \draw[->, dashed] (ana-asti1) to node[auto]{analyze*} (ana-astn); \draw[->, dotted] (ana-asti) to (ana-asti1); \draw[->, dotted] (absi1) to (ana-absim); \draw[->, dotted] (abs) to (ana-abs); \end{tikzpicture} \caption{Data-flow diagram of an orchestration of analyses.} \label{fig.analysis.dataflow} \end{figure} Compilation can be modeled as a pipeline of tree transformations~\cite{DBLP:conf/haskell/DijkstraFS09}. Each transformation in the pipeline is preceded by an analysis that traverses the AST and decorates it with analysis results. 
Typically, the analysis is not performed on the actual AST itself, but on a generated \emph{abstraction} of it, in the form of a desugared tree, a control-flow graph, or a collection of constraints. As an example, consider a constraint-based analysis. It consists of a traversal that generates constraints, and an algorithm that takes the constraints, and solves them. A solver for a particular constraint language may be reused to solve constraints that can be mapped to constraints in this language. This approach has two problems: tweaking of algorithm and exchange of data. Our goal is to take this approach one step further. We compose an analysis out of a library of reusable tree traversals. Our goal is to compose an analysis out of a library of reusable tree traversals. To achieve this goal, two challenges have to be overcome: Each traversal either applies a known algorithm on the tree to decorate it with analysis results, or produces an \emph{abstraction} of the tree and analyzes that (recursively) instead. Figure~\ref{fig.analysis.dataflow} illustrates this situation. For example, name analysis can be reduced to an analysis over a much simpler tree denoting only scoping, binding sites and use sites. Damas-Milner type inference can be reduced to solving a set of equality, instantiation, and generalization constraints~\cite{citeulike:352561}. The analysis on the abstracted tree is easier or can be reused for other analyses that reduce to the same abstraction. Constraint-based analysis typically exhibit this approach. such as a desugared tree, a control-flow graph, or a collection of constraints. %endif %if False \section{Repmin with Imperative Visitors} %format . = "." %format abstract = "\mathbf{abstract}" %format class = "\mathbf{class}" %format void = "\mathbf{void}" %format extends = "\mathbf{extends}" %format implements = "\mathbf{implements}" %format interface = "\mathbf{interface}" %format for = "\mathbf{for}" %format return = "\mathbf{return}" %format (Generic t1 t2) = t1 "\langle" t2 "\rangle" %format := = ":=" %format var = "\mathbf{var}" \begin{figure}[htb] \begin{code} abstract class Tree abstract void accept(TreeVisitor v) class Node extends Tree int value (Generic List Tree) subtrees void accept(TreeVisitor v) v.visit(this) interface TreeVisitor void visit(Node n) abstract class (Generic MyList T) abstract void accept(ListVisitor v) class Cons extends (Generic MyList T) T ref (Generic MyList T) next void accept((Generic ListVisitor T) v) v.visit(this) class Nil extends (Generic MyList T) void accept((Generic ListVisitor T) v) v.visit(this) interface (Generic ListVisitor T) void visit((Generic Cons T) c) void visit((Generic Nil T) n) \end{code} \caption{Visitors for a tree and list} \label{fig.example.visitors.imperative} \end{figure} \begin{figure}[htb] \begin{code} class GenListVisitor implements TreeVisitor (Generic MyList Node) list GenListVisitor() list = new Nil() void visit(Node n) var c := new Cons() c.ref := this c.next := list list := c for (Node m : n.subtrees) m.accept(this) class SumVisitor implements (Generic ListVisitor Node) int gathSum SumVisitor() gathSum := 0 visit(Nil n) return visit(Cons c) gathSum := ref.value + gathSum c.next.accept(this) class UpdateVisitor implements (Generic ListVisitor Node) int sum UpdateVisitor(int finalSum) sum = finalSum visit(Nil n) return visit(Cons c) ref.value := sum c.next.accept(this) void main(String[] args) var ast := parse (args[0]) var gen := new GenListVisitor() ast.accept(gen) var addr := new SumVisitor() 
gen.list.accept(aggregator) var upd := new UpdateVisitor(addr.gathSum) gen.list.accept(upd) \end{code} \caption{Imperative Repmin.} \label{fig.example.repmin.imperative} \end{figure} %endif \acks Acknowledgments, if needed. \bibliographystyle{abbrvnat} \bibliography{references} %if False \appendix \section{Imperative Visitor Code} \begin{code} {-# LANGUAGE TypeFamilies #-} module Example where import Data.IORef deref = readIORef assign = writeIORef type family Syn a type family Inh a data InOut a = InOut { ast :: a, inh :: Inh a, syn :: Syn a} data List = List { ioList :: InOut List, altList :: ListAlt } data ListAlt = Cons ConsLoc List | Nil NilLoc data ListInh = ListInh { finSumRef :: IORef Int } data ListSyn = ListSyn { gathSumRef :: IORef Int } data ConsLoc = ConsLoc { leafRef :: IORef LeafLoc } data NilLoc = NilLoc {} type instance Inh List = ListInh type instance Syn List = ListSyn type ListVisitor = ( InOut List -> NilLoc -> IO () , InOut List -> ConsLoc -> InOut List -> IO () ) acceptList :: List -> ListVisitor -> IO () acceptList (List io (Nil loc)) (nil, _) = nil io loc acceptList (List io (Cons loc (List io' _))) (_, con) = con io loc io' single :: LeafLoc -> IO List single loc = do [r1,r2,r3,r4,r5] <- sequence $ replicate 5 (newIORef undefined) assign r3 loc return $ mk r1 r2 $ Cons (ConsLoc r3) $ mk r4 r5 $ Nil NilLoc where mk r1 r2 tl = let l = List (io l r1 r2) tl in l io l r1 r2 = InOut l (ListInh r1) (ListSyn r2) conc :: List -> List -> List conc l1 l2 = foldr (\(io,loc) r -> List io (Cons loc r)) (nil l1) (toList l1 ++ toList l2) where toList (List _ (Nil _)) = [] toList (List io (Cons loc r)) = (io,loc) : toList r nil (List _ (Cons _ r)) = nil r nil l = l shwList :: List -> IO String shwList (List _ (Nil _)) = return "nil" shwList (List _ (Cons loc r)) = do leaf <- deref (leafRef loc) val <- deref (valRef leaf) s <- shwList r return ((show val) ++ " : " ++ s) data Tree = Tree { ioTree :: InOut Tree, altTree :: TreeAlt } data TreeAlt = Leaf LeafLoc | Bin BinLoc Tree Tree data TreeInh = TreeInh {} data TreeSyn = TreeSyn { gathListRef :: IORef List } data BinLoc = BinLoc {} data LeafLoc = LeafLoc { valRef :: IORef Int } type instance Inh Tree = TreeInh type instance Syn Tree = TreeSyn type TreeVisitor = ( InOut Tree -> LeafLoc -> IO () , InOut Tree -> BinLoc -> InOut Tree -> InOut Tree -> IO () ) acceptTree :: Tree -> TreeVisitor -> IO () acceptTree (Tree io (Leaf loc)) (leaf, _) = leaf io loc acceptTree (Tree io (Bin loc (Tree iol _) (Tree ior _))) (_, bin) = bin io loc iol ior leafM :: Int -> IO Tree leafM n = do r1 <- newIORef n r2 <- newIORef undefined let t = Tree (InOut t TreeInh (TreeSyn r2)) (Leaf $ LeafLoc r1) return t nodeM :: IO Tree -> IO Tree -> IO Tree nodeM ioL ioR = do left <- ioL right <- ioR ref <- newIORef undefined let t = Tree (InOut t TreeInh (TreeSyn ref)) (Bin BinLoc left right) return t shwTree :: Tree -> IO String shwTree (Tree _ (Leaf loc)) = do v <- deref (valRef loc) return $ "(leaf " ++ show v ++ ")" shwTree (Tree _ (Bin _ left right)) = do l <- shwTree left r <- shwTree right return $ "(bin " ++ l ++ " " ++ r ++ ")" synListVisitor :: TreeVisitor synListVisitor = ( \lhs loc -> single loc >>= assign (gathListRef $ syn lhs) , \lhs loc left right -> do acceptTree (ast left) synListVisitor l <- deref (gathListRef $ syn left) acceptTree (ast right) synListVisitor r <- deref (gathListRef $ syn right) assign (gathListRef $ syn $ lhs) (conc l r) ) sumVisitor :: ListVisitor sumVisitor = ( \lhs loc -> assign (gathSumRef $ syn lhs) 0 , \lhs loc tl -> do 
acceptList (ast tl) sumVisitor tlSum <- deref (gathSumRef $ syn tl) leaf <- deref (leafRef loc) val <- deref (valRef leaf) assign (gathSumRef $ syn lhs) (tlSum + val) ) backVisitor :: ListVisitor backVisitor = ( \lhs loc -> return () , \lhs loc tl -> do fin <- deref (finSumRef $ inh lhs) leaf <- deref (leafRef loc) assign (valRef leaf) fin assign (finSumRef $ inh tl) fin acceptList (ast tl) backVisitor ) main :: IO () main = do t <- nodeM (nodeM (leafM 1) (leafM 2)) (leafM 3) acceptTree t synListVisitor l <- deref (gathListRef $ syn $ ioTree t) shwList l >>= putStrLn acceptList l sumVisitor sum <- deref (gathSumRef $ syn $ ioList l) assign (finSumRef $ inh $ ioList l) sum acceptList l backVisitor shwTree t >>= putStrLn \end{code} %endif \end{document}