\documentclass{article}

%include lhs2TeX.fmt
%include polycode.fmt

\usepackage{float}
\usepackage{graphicx}
\usepackage{pgf}
\usepackage{tikz}
\usetikzlibrary{arrows,positioning,trees}
\floatstyle{boxed}
\restylefloat{figure}

\newcommand\lang[1]{\textsc{#1}}
\newcommand\keyw[1]{\mathbf{#1}}

%format . = "."
%format itf   = "\keyw{itf}"
%format inh   = "\keyw{inh}"
%format syn   = "\keyw{syn}"
%format sem   = "\keyw{sem}"
%format child = "\keyw{child}"
%format visit = "\keyw{visit}"
%format tail  = "\keyw{tail}"
%format match = "\keyw{match}"

\begin{document}

\title{Relocatable Visits}
\author{Arie Middelkoop}

\maketitle

\section{Introduction}

  Visitors from the Visitor Design Pattern traverse an Abstract Syntax Tree
  and modify the state of the nodes they visit, thereby calculating properties
  of the AST. Complicated traversals are possible, where visitors visit some
  nodes in a complex order. Unfortunately, the effects on the states of the
  nodes are then difficult to predict.

  In previous work, we introduced visit functions, which are a purely functional
  replacement for visitors. These pure functions disallow destructible updates
  on the state of the nodes. The effects are instead captured as state transitions,
  where each state transition corresponds to a call to a continuation of the
  visit function. In this model, effects on state are severely restricted, thus
  simplifying reasoning and predictability. Consequently, we are limited to
  depth-first traversals only. It is not immediately obvious how to represent a
  visitor that jumps non-locally around the AST.
  
  In this paper, we deal with non-local traversals. We first introduce first-class
  traversals, which are views on the AST together with some meta information.
  Instead of traversing the tree, we compute such a traversal from the AST, and
  traverse the traversal instead. Finally, we incorporate the results of the
  traversal back in the original AST.

\section{Prerequisites}

  \subsection{Visitors}
  The visitor design pattern is often employed when writing algorithms that operate on
  Abstract Syntax Trees.
  The algorithm splits up into a number of \emph{visitors}
  that are applied after each other to the entire AST or a portion of it.
  A visitor iterates through the nodes of the AST in a specific order. Playing with
  this order is the central theme of this paper, and we come back to these orders
  later.

  A state is maintained with each node of the AST. A visitor has access to a
  portion of the state of the direct parent and the children of the node that
  it visits, and consists of small computations that manipulates the state of
  the node based on theirs. This allows information to flow from the root of
  the tree to the bottom, and the other way around.
  Multiple iterations, hence multiple visitors, may be a necessity. For example,
  to gradually build a symbol table in the first iteration, and using the completed
  symbol table in the second iteration.

  The order in which the nodes are visited is important. Some computations are more
  easily done when the nodes are visited in the right order. For example, consider an
  AST with a variable number of children per node. Suppose we write an algorithm that
  computes for each node in the AST, the number of sibling nodes to the right of it.
  A visitor that visits children from right to left is preferable compared to
  the other way around.

  For a more complicated where nodes are not visited in the order from parent to
  children, consider a visitor that computes from
  the AST representation some Java classes, a symbol table that has for each class
  the list of its fields. To compute these fields, we need the fields defined in
  the class itself, as well as the fields of the super class. It is therefore
  beneficial that the nodes of the super classes are already visited.
  Finally, when applying visitors repeatedly to perform a fixpoint iteration,
  convergence is reached with less iterations if nodes are visited in an order, such
  that during computation in the current iteration, results computed in the current 
  iteration can already be taken instead of values from the previous iteration.
  
  The effect of visitors on the state of the nodes is essential and at the same time
  intricate. Before a visitor on a node invokes the visitor of one of its children, the
  state of the node must correspond to what the visitor on the child expects, and the
  other way around upon completion on the visit of the child. Such invariants on states
  are left implicit, and easily broken, leading to unpredictable behavior. These
  invariants are actually calling conventions, and we would like to make these explicit.
  We achieve this goal when we model visitors with \emph{pure} and \emph{strongly typed}
  visit \emph{functions}.
  
  \subsection{Visit Functions}
  With visit functions, instead of applying a visitor to each AST node, we represent
  an AST node intentionally. Each node of the AST is represented with an instance of
  a visit function that encapsulates the state of the AST node. This state is hidden
  inside the visit function, and cannot be accessed directly.
  
  A visit function is queried by calling it with a record of values for some
  parameters. It then returns a record containing several output values, and a
  continuation visit-function that encapsulates the transitioned state for subsequent
  queries. The type signature of the visit function precisely describes in what state
  the visit function is, and what inputs and outputs are expected.
  
  Decoration of the tree proceeds as follows. The root of the AST is passed as argument
  for the very first query of the visit function. The visit function scrutinizes this
  node, potentially calls itself recursively on the children of the node to get the
  continuations of the children, and then builds the continuation to use for subsequent
  queries. These continuations are thus the intentional representation of the AST.
  
  We write these visit functions in the domain specific language \lang{visits-ag}
  for \lang{Haskell}.
  In this language, you write an interface for visit functions, and the body of
  visit functions in an aspect-oriented way.

  The following are examples of \emph{interfaces}:
  \begin{code}
  itf Decorate          -- declares a type for a visit named |Decorate|
    inh ast :: Tree     -- with one input |ast| with as type |Tree| 
    tail :: Aggregate   -- continuation-visit has type |Aggregate|
  itf Aggregate         -- declares a type for a visit named |Aggregate|
    syn sum :: Int      -- with one output named |sum| of type |Int|
  \end{code}
  A visit function with the type |Decorate| can be called by passing it
  a value for the |ast| (of type |Tree|). It then returns a continuation,
  which, when called, computes the sum of the tree.
  The inputs and outputs |ast| and |sum| are called \emph{attributes}.
  Inside a visit function, we can refer to the attributes using the
  notation |child.field|, where |child| is the name of a child with
  some interface |T|, and |field| is an attribute of |T|. The child
  name |loc| refers to the hidden state of the visit function, and the
  child name |lhs| to the attributes of the visit function itself.
  
  We write this visit function with a combination of \lang{Haskell} and
  \lang{visit-ag} syntax. The body of a visit function consists of a
  number of statements to introduce children, call their visits,
  and to scrutinize and define attributes. The following
  example illustrates.
  \begin{code}
  data Tree = Bin Tree Tree | Leaf Int

  compSum =  -- define a visit function named |compSum|
    sem :: Decorate  -- a function alternative for |Leaf|s 
      match (Leaf x) = lhs.ast  -- scrutinize |lhs.ast|
      tail  sem :: Visit2       -- defines continuation
              lhs.sum = x       -- defines |lhs.sum|
    `mappend`  -- glue two cases of the visit function
    sem :: Decorate  -- a function alternative for |Bin|s
      match (Bin loc.l loc.r) = lhs.ast  -- scrutinize |lhs.ast|
      child left   :: Decorate = compSum   -- child for left subtree
      child right  :: Decorate = compSum   -- child for right subtree
      left.ast   = loc.l   -- input for the visit on child |left|
      right.ast  = loc.r   -- input for the visit on child |right|
      tail  sem :: Aggregate  -- defines continuation
              visit left  :: Aggregate  -- call continuation on |left|
              visit right :: Aggregate  -- call continuation on |right|
              lhs.sum = left.sum + right.sum  -- use outputs
  \end{code}
  From the interfaces, we generates wrapper functions, which allow
  us to invoke the visitors from the Haskell world:
  \begin{code}
  analyze ast =  let  cont  = wrapDecorate compSum ast
                      sum   = wrapAggregate cont
                 in sum
  \end{code}
  In this example, the compSum visit function is called with the
  AST as parameter. It scrutinizes the AST recursively, thereby
  building the visit-function representation of the tree in the
  form of continuations. Each call to a continuation we call a
  visit. The actual sum is computed in the second visit.
  
  The visit functions are pure, thus the state is immutable.
  Because of this restriction, the only way to
  change state is to invoke the continuation. These continuations
  have a type (the interface), and these types are a static
  property of the program. We thus statically know (and are required
  to know) for each visit, in what visit the children are and
  what values have been computed. The lifetime of attributes is
  therefore statically known. For example, it is easy to verify
  that for an attribute definition of the form |c.a = e|, that for
  each attribute |c'.a'| indirectly referenced by |e|, the call to
  the proper continuation on |c'| has taken place before |c.a| is
  defined.
  
  Despite these restrictions, there is still some degree in
  freedom to influence the visit order. For each visit function,
  the order in which the children are visited can be determined
  based on attributes computed earlier. However, the order of
  visits are restricted to parents-to-children only.
  There are traversals that do not satisfy this restriction.
  For example, given a cons-list of numbers, with all even
  numbers before the odd numbers, consider a traversal that
  visits the elements of the list in increasing order.
  It is not possible to implement this with pure visit functions
  using a constant number of iterations, \emph{unless we change
  the representation of the AST}.

\section{Channels}

  \begin{code}
  channel rem stub :: Itf   -- creates a local attribute rem and stub
  child r :: Itf    = loc.rem
  child s :: Co Itf = loc.stub
  \end{code}
  
  \tikzstyle{node} = [circle, draw, minimum width=3.5em]
  \tikzstyle{attr} = [rectangle, draw]
  
  \begin{tikzpicture}[level/.style={sibling distance=3.5cm, level distance=1.7cm}]
    \node[node](root){|+|}
      child {
        node[node](defx){|def x|}
        edge from parent
        child{
          node[node](plusleft){|+|}
          edge from parent
          child {
            node[node](oneleft){|1|}
            edge from parent
          }
          child {
            node[node](twoleft){|2|}
            edge from parent
          }
        }
      }
      child {
        node[node](plusright){|+|}
        edge from parent
        child {
          node[node](usex){|use x|}
          edge from parent
        }
        child {
          node[node](threeright){|3|}
          edge from parent
        }
      };
      \draw[<->, dashed, thick] (usex.north) to[out=90,in=0] (defx.east);
      
      \begin{scope}[node distance=0.1cm]
      \node[attr, right=of threeright]{|3|};
      \node[attr, right=of usex]{|3|};
      \node[attr, right=of plusright]{|6|};
      \node[attr, left=of oneleft]{|1|};
      \node[attr, left=of twoleft]{|2|};
      \node[attr, left=of plusleft]{|3|};
      \node[attr, left=of defx]{|3|};
      \node[attr, right=of root]{|9|};
      \end{scope}
  \end{tikzpicture}


\section{Scratch section}

  Initial situation. A three-visit function, with interface |V1| for the first visit,
  |V2| for the second visit, and |V3| for the third. This visit function has one child
  named |k|. Conventionally, the parent contains visit-statements for the child,
  provides inputs to the visits of the child, and may use the outputs. We leave out
  attribute definitions.

  \begin{code}
  sem :: V1  -- called by parent node
    child k :: V1
    visit k :: V1
    tail
      sem :: V2  -- called by parent node
        visit k :: V2
        tail  sem :: V3 -- called by parent node
                visit k :: V3
  \end{code}
  
  Consider a different situation. Instead of invoking visit |V2| of |k| by the parent, we
  delegate the invocation to another place determined by the parent's parent.

  \begin{code}
  sem :: V1  -- called by parent node
    child k :: V1
    visit k :: V1
    tail
      delegate nodeId -- expose some info value for this visit
      sem :: V2  -- called by parent node
        require k :: V2  -- demand that we need the applied result of k
        tail  sem :: V3 -- called by parent node
                visit k :: V3
  \end{code}

  In a delegated visit, the visit of the child may be deferred as well.
  Instead of using visit, the statement require demands that the caller
  of the delegated visit ensures that the child has already been visited.

  \begin{description}
  \item[require]
  The invoker of the current visit ensures that the
  visit on k has taken place.
  \end{description}

  A deferred visit may not be dependent on the result of the deferred child, and
  only require it the next visit or later.
  
  \begin{description}
  \item[deferred]
  The invoker of the current visit ensures that the
  visit on k has taken place for the next visit.
  \end{description}

  Finally, a deferred visit may be invoked, with a special statement:
  \begin{code}
    invoke k c : V = s
  \end{code}
  In this case, we produce a child |c| with semantics |s|, which gets as first parameter the
  AST of a dependency-graph generated from the |delegate| and |require| statements of |k|.
  It furthermore defines the name |V| to use as interface for |s|. Additional atttributes
  can be defined on |V| if desirable for the traversal.
  
  \begin{code}
  data Trav
    | Split       travs : Travs
    | Cycle       travs : Travs
    | Visit_Ck    info : ... cont : ...
    | Visited_Ck  ...
  \end{code}

\section{Use Case: Type-checking Java classes}

\section{Use Case: Data-flow analysis with Fixpoint}

\section{Discussion}

  Limitation: need to be able to compute the traversal first.
  Limitation: we restrict a first-class traversal to a dependency graph only.

\end{document}