\documentclass{article} %include lhs2TeX.fmt %include polycode.fmt \usepackage{float} \usepackage{graphicx} \usepackage{pgf} \usepackage{tikz} \usetikzlibrary{arrows,positioning,trees} \floatstyle{boxed} \restylefloat{figure} \newcommand\lang[1]{\textsc{#1}} \newcommand\keyw[1]{\mathbf{#1}} %format . = "." %format itf = "\keyw{itf}" %format inh = "\keyw{inh}" %format syn = "\keyw{syn}" %format sem = "\keyw{sem}" %format child = "\keyw{child}" %format visit = "\keyw{visit}" %format tail = "\keyw{tail}" %format match = "\keyw{match}" \begin{document} \title{Relocatable Visits} \author{Arie Middelkoop} \maketitle \section{Introduction} Visitors from the Visitor Design Pattern traverse an Abstract Syntax Tree and modify the state of the nodes they visit, thereby calculating properties of the AST. Complicated traversals are possible, where visitors visit some nodes in a complex order. Unfortunately, the effects on the states of the nodes are then difficult to predict. In previous work, we introduced visit functions, which are a purely functional replacement for visitors. These pure functions disallow destructible updates on the state of the nodes. The effects are instead captured as state transitions, where each state transition corresponds to a call to a continuation of the visit function. In this model, effects on state are severely restricted, thus simplifying reasoning and predictability. Consequently, we are limited to depth-first traversals only. It is not immediately obvious how to represent a visitor that jumps non-locally around the AST. In this paper, we deal with non-local traversals. We first introduce first-class traversals, which are views on the AST together with some meta information. Instead of traversing the tree, we compute such a traversal from the AST, and traverse the traversal instead. Finally, we incorporate the results of the traversal back in the original AST. \section{Prerequisites} \subsection{Visitors} The visitor design pattern is often employed when writing algorithms that operate on Abstract Syntax Trees. The algorithm splits up into a number of \emph{visitors} that are applied after each other to the entire AST or a portion of it. A visitor iterates through the nodes of the AST in a specific order. Playing with this order is the central theme of this paper, and we come back to these orders later. A state is maintained with each node of the AST. A visitor has access to a portion of the state of the direct parent and the children of the node that it visits, and consists of small computations that manipulates the state of the node based on theirs. This allows information to flow from the root of the tree to the bottom, and the other way around. Multiple iterations, hence multiple visitors, may be a necessity. For example, to gradually build a symbol table in the first iteration, and using the completed symbol table in the second iteration. The order in which the nodes are visited is important. Some computations are more easily done when the nodes are visited in the right order. For example, consider an AST with a variable number of children per node. Suppose we write an algorithm that computes for each node in the AST, the number of sibling nodes to the right of it. A visitor that visits children from right to left is preferable compared to the other way around. For a more complicated where nodes are not visited in the order from parent to children, consider a visitor that computes from the AST representation some Java classes, a symbol table that has for each class the list of its fields. To compute these fields, we need the fields defined in the class itself, as well as the fields of the super class. It is therefore beneficial that the nodes of the super classes are already visited. Finally, when applying visitors repeatedly to perform a fixpoint iteration, convergence is reached with less iterations if nodes are visited in an order, such that during computation in the current iteration, results computed in the current iteration can already be taken instead of values from the previous iteration. The effect of visitors on the state of the nodes is essential and at the same time intricate. Before a visitor on a node invokes the visitor of one of its children, the state of the node must correspond to what the visitor on the child expects, and the other way around upon completion on the visit of the child. Such invariants on states are left implicit, and easily broken, leading to unpredictable behavior. These invariants are actually calling conventions, and we would like to make these explicit. We achieve this goal when we model visitors with \emph{pure} and \emph{strongly typed} visit \emph{functions}. \subsection{Visit Functions} With visit functions, instead of applying a visitor to each AST node, we represent an AST node intentionally. Each node of the AST is represented with an instance of a visit function that encapsulates the state of the AST node. This state is hidden inside the visit function, and cannot be accessed directly. A visit function is queried by calling it with a record of values for some parameters. It then returns a record containing several output values, and a continuation visit-function that encapsulates the transitioned state for subsequent queries. The type signature of the visit function precisely describes in what state the visit function is, and what inputs and outputs are expected. Decoration of the tree proceeds as follows. The root of the AST is passed as argument for the very first query of the visit function. The visit function scrutinizes this node, potentially calls itself recursively on the children of the node to get the continuations of the children, and then builds the continuation to use for subsequent queries. These continuations are thus the intentional representation of the AST. We write these visit functions in the domain specific language \lang{visits-ag} for \lang{Haskell}. In this language, you write an interface for visit functions, and the body of visit functions in an aspect-oriented way. The following are examples of \emph{interfaces}: \begin{code} itf Decorate -- declares a type for a visit named |Decorate| inh ast :: Tree -- with one input |ast| with as type |Tree| tail :: Aggregate -- continuation-visit has type |Aggregate| itf Aggregate -- declares a type for a visit named |Aggregate| syn sum :: Int -- with one output named |sum| of type |Int| \end{code} A visit function with the type |Decorate| can be called by passing it a value for the |ast| (of type |Tree|). It then returns a continuation, which, when called, computes the sum of the tree. The inputs and outputs |ast| and |sum| are called \emph{attributes}. Inside a visit function, we can refer to the attributes using the notation |child.field|, where |child| is the name of a child with some interface |T|, and |field| is an attribute of |T|. The child name |loc| refers to the hidden state of the visit function, and the child name |lhs| to the attributes of the visit function itself. We write this visit function with a combination of \lang{Haskell} and \lang{visit-ag} syntax. The body of a visit function consists of a number of statements to introduce children, call their visits, and to scrutinize and define attributes. The following example illustrates. \begin{code} data Tree = Bin Tree Tree | Leaf Int compSum = -- define a visit function named |compSum| sem :: Decorate -- a function alternative for |Leaf|s match (Leaf x) = lhs.ast -- scrutinize |lhs.ast| tail sem :: Visit2 -- defines continuation lhs.sum = x -- defines |lhs.sum| `mappend` -- glue two cases of the visit function sem :: Decorate -- a function alternative for |Bin|s match (Bin loc.l loc.r) = lhs.ast -- scrutinize |lhs.ast| child left :: Decorate = compSum -- child for left subtree child right :: Decorate = compSum -- child for right subtree left.ast = loc.l -- input for the visit on child |left| right.ast = loc.r -- input for the visit on child |right| tail sem :: Aggregate -- defines continuation visit left :: Aggregate -- call continuation on |left| visit right :: Aggregate -- call continuation on |right| lhs.sum = left.sum + right.sum -- use outputs \end{code} From the interfaces, we generates wrapper functions, which allow us to invoke the visitors from the Haskell world: \begin{code} analyze ast = let cont = wrapDecorate compSum ast sum = wrapAggregate cont in sum \end{code} In this example, the compSum visit function is called with the AST as parameter. It scrutinizes the AST recursively, thereby building the visit-function representation of the tree in the form of continuations. Each call to a continuation we call a visit. The actual sum is computed in the second visit. The visit functions are pure, thus the state is immutable. Because of this restriction, the only way to change state is to invoke the continuation. These continuations have a type (the interface), and these types are a static property of the program. We thus statically know (and are required to know) for each visit, in what visit the children are and what values have been computed. The lifetime of attributes is therefore statically known. For example, it is easy to verify that for an attribute definition of the form |c.a = e|, that for each attribute |c'.a'| indirectly referenced by |e|, the call to the proper continuation on |c'| has taken place before |c.a| is defined. Despite these restrictions, there is still some degree in freedom to influence the visit order. For each visit function, the order in which the children are visited can be determined based on attributes computed earlier. However, the order of visits are restricted to parents-to-children only. There are traversals that do not satisfy this restriction. For example, given a cons-list of numbers, with all even numbers before the odd numbers, consider a traversal that visits the elements of the list in increasing order. It is not possible to implement this with pure visit functions using a constant number of iterations, \emph{unless we change the representation of the AST}. \section{Channels} \begin{code} channel rem stub :: Itf -- creates a local attribute rem and stub child r :: Itf = loc.rem child s :: Co Itf = loc.stub \end{code} \tikzstyle{node} = [circle, draw, minimum width=3.5em] \tikzstyle{attr} = [rectangle, draw] \begin{tikzpicture}[level/.style={sibling distance=3.5cm, level distance=1.7cm}] \node[node](root){|+|} child { node[node](defx){|def x|} edge from parent child{ node[node](plusleft){|+|} edge from parent child { node[node](oneleft){|1|} edge from parent } child { node[node](twoleft){|2|} edge from parent } } } child { node[node](plusright){|+|} edge from parent child { node[node](usex){|use x|} edge from parent } child { node[node](threeright){|3|} edge from parent } }; \draw[<->, dashed, thick] (usex.north) to[out=90,in=0] (defx.east); \begin{scope}[node distance=0.1cm] \node[attr, right=of threeright]{|3|}; \node[attr, right=of usex]{|3|}; \node[attr, right=of plusright]{|6|}; \node[attr, left=of oneleft]{|1|}; \node[attr, left=of twoleft]{|2|}; \node[attr, left=of plusleft]{|3|}; \node[attr, left=of defx]{|3|}; \node[attr, right=of root]{|9|}; \end{scope} \end{tikzpicture} \section{Scratch section} Initial situation. A three-visit function, with interface |V1| for the first visit, |V2| for the second visit, and |V3| for the third. This visit function has one child named |k|. Conventionally, the parent contains visit-statements for the child, provides inputs to the visits of the child, and may use the outputs. We leave out attribute definitions. \begin{code} sem :: V1 -- called by parent node child k :: V1 visit k :: V1 tail sem :: V2 -- called by parent node visit k :: V2 tail sem :: V3 -- called by parent node visit k :: V3 \end{code} Consider a different situation. Instead of invoking visit |V2| of |k| by the parent, we delegate the invocation to another place determined by the parent's parent. \begin{code} sem :: V1 -- called by parent node child k :: V1 visit k :: V1 tail delegate nodeId -- expose some info value for this visit sem :: V2 -- called by parent node require k :: V2 -- demand that we need the applied result of k tail sem :: V3 -- called by parent node visit k :: V3 \end{code} In a delegated visit, the visit of the child may be deferred as well. Instead of using visit, the statement require demands that the caller of the delegated visit ensures that the child has already been visited. \begin{description} \item[require] The invoker of the current visit ensures that the visit on k has taken place. \end{description} A deferred visit may not be dependent on the result of the deferred child, and only require it the next visit or later. \begin{description} \item[deferred] The invoker of the current visit ensures that the visit on k has taken place for the next visit. \end{description} Finally, a deferred visit may be invoked, with a special statement: \begin{code} invoke k c : V = s \end{code} In this case, we produce a child |c| with semantics |s|, which gets as first parameter the AST of a dependency-graph generated from the |delegate| and |require| statements of |k|. It furthermore defines the name |V| to use as interface for |s|. Additional atttributes can be defined on |V| if desirable for the traversal. \begin{code} data Trav | Split travs : Travs | Cycle travs : Travs | Visit_Ck info : ... cont : ... | Visited_Ck ... \end{code} \section{Use Case: Type-checking Java classes} \section{Use Case: Data-flow analysis with Fixpoint} \section{Discussion} Limitation: need to be able to compute the traversal first. Limitation: we restrict a first-class traversal to a dependency graph only. \end{document}