%include format.lhs
In Chapter~\tref{chapt.side-effect} we described a programming model based on visits as an extension of attribute grammars. Using this model, attribute declarations are not as easily composed as with conventional attribute grammars. In this chapter, we introduce phases as a generalization of visits, for which attribute declarations are composable. Further, we introduce commutable rules, which are rules that thread a chained attribute (Section~\tref{intro.sect.ag.dhm} and Section~\tref{intro.sect.copyrules}), with the difference that the rules in the chain can be reordered. To preserve referential transparency, the commutable rules must satisfy a liberal commutativity law. This chapter is orthogonal to the subsequent chapters. It presents the technical material that can be used to generalize the contents of those chapters.
\section{Introduction}
In this chapter, we show three related extensions of the explicit-visits approach of Chapter~\tref{chapt.side-effect} and of ordered attribute grammars in general.
\begin{defi}[Phase]
A phase is a generalization of a visit. It gives a name to a unit of evaluation for attributes associated with a nonterminal symbol.
\end{defi}
Firstly, instead of associating visit declarations with nonterminals as in the previous chapter, we associate sets of phase declarations with nonterminals in this chapter. We allow attribute declarations and rules to depend on the beginning of a phase or to be constrained by the end of a phase, and we impose a partial order on phases. When the value dependencies between attributes and rules and the dependencies induced by phases are acyclic, the attributes that are associated with an earlier phase are defined before any attributes that are associated with a later phase, which provides a way to specify the order of evaluation of rules.
Secondly, we use a statically ordered evaluation to compute values of attributes, and we present a scheduling algorithm in the style of \citet{DBLP:conf/popl/KennedyW76} which infers multiple visit interfaces from the phase interface of a nonterminal. For each visit interface, a production has a potentially different execution plan. An execution plan specifies an explicit ordering of rules per visit (Section~\tref{sect.treewalkautomaton}). Such a plan also specifies for each child which visit interface is used for its evaluation. In the generated code, semantic functions are indexed by the choice of the visit interface for which they provide a semantics, and we show how to encode this in a strongly typed, purely functional language. There may be many possible visit interfaces given the phase interface of a nonterminal. The constraints on rule and attribute ordering mentioned above can be used to significantly limit the number of possible visit interfaces.
Finally, we present commutable rules, which allow us to functionally encode side effects. A functional encoding of side effects as mentioned in Chapter~\tref{chapt.side-effect} can be accomplished with a chained attribute that is threaded through each operation, where an operation is a rule or higher-order child. For example, a substitution attribute captures the side effects that result from substitutions during type inference, and an attribute with type |RealWorld| captures arbitrary I/O. The attribute's threading determines the order in which the side effects are observable in the values of the attribute.
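To illustrate what such a threaded encoding looks like in the host language, the following minimal sketch threads a counter through a tree to hand out unique numbers. The sketch is ours and not part of any AG description or generated code; the names |UTree| and |number| are hypothetical, and the counter plays the role of the chained attribute.
\begin{code}
-- A minimal sketch (not generated by the AG system): a chained attribute
-- encoded as a counter that is threaded through a tree to hand out unique
-- numbers. The names |UTree| and |number| are hypothetical.
data UTree = ULeaf | UBin UTree UTree

-- The incoming counter is the value of the chained attribute before this
-- subtree; the outgoing counter is its value afterwards. The left-to-right
-- order of the effects is fixed by the threading.
number :: UTree -> Int -> (Int, [Int])
number ULeaf       counter  =  (counter + 1, [counter])
number (UBin l r)  counter  =
  let  (counter1, xs)  =  number l counter
       (counter2, ys)  =  number r counter1
  in   (counter2, xs ++ ys)
\end{code}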
For some threaded attributes, the order imposed by the threading may be irrelevant, for example when the rules represent \emph{commuting} operations, such as when handing out unique numbers. However, the order imposed by threading the attribute may induce cycles in the dependencies of attributes. We present a mechanism to allow the order to be left unspecified and given the freedom to be determined by the implementation. \begin{defi}[Commutable rule] A \emph{commutable rule} is a rule that threads a chained attribute, but does not depend (indirectly) on previous rules in the chain. \end{defi} Commutable rules thus provide more freedom in their ordering. To preserve referential transparency, the composition of commutable rules must satisfy a liberal commutativity law, as we show later. %% commutative rules allow us to functionally express side effects as if we threaded %% the attribute through each visit. Commutable rules also allow the functional encoding of a side-effectful rule that is scheduled to different implicit visits. We thus offer a mechanism to explicitly enforce constraints on the evaluation order of attributes, and another mechanism to loosen the constraints imposed by rules. The combination offers convenient ways to declaratively specify side-effectful operations and their algorithmic evaluation order. Phases and commutable rules have in common that the associated scheduling algorithm takes information into account that is not visible in the attribute dependencies induced by the rules. Phases induce ordering constraints based on dependencies between attributes and the lexical scope of rules. Commutable rules require a barrier between rules that commute over an attribute and rules that do not. Therefore, we introduce barrier attributes and dependency rules, which allow the encoding of such additional dependencies. \begin{defi}[Designator] A \emph{designator} |d| (Figure~\tref{warren.fig.agcore}) is either an attribute occurrence or a symbolic representation of a rule. \end{defi} \begin{defi}[Dependency rule] A \emph{dependency rule} |d1 `pre` d2| is a rule that specifies that designator |d1| must be scheduled before designator |d2|. \end{defi} \begin{defi}[Barrier attribute] A \emph{barrier attribute} is an attribute that can be used as a \emph{designator} (dependee/depender) in combination with dependency rules. The value of a barrier attribute is implicitly defined. \end{defi} In this chapter, we first work out the concepts of barrier attributes and dependency rules (Section~\ref{sect.warren.example}), and then show how these concepts can be exploited to deal with phases (Section~\ref{sect.warren.phases}) and commutable rules (Section~\ref{warren.sect.commutting}). \section{Example with Barriers} \label{sect.warren.example} \begin{figure}[tb] \begin{code} grammar Tree -- grammar for nonterm |Tree| | Leaf x :: Int -- production |Leaf| with one terminal | Bin l,r : Tree -- production |Bin| with two nonterminals attr Tree ^^^ syn gath :: Int -- attributes for nonterm |Tree| inh mini :: Int syn repl :: Tree sem Tree -- semantics for nonterm |Tree| | Leaf lhs.gath = loc.x lhs.repl = Leaf lhs.mini -- replacement tree | Bin lhs.gath = min l.gath r.gath -- global minimum gathering l.mini = r.gath -- crossover between |r| and |l| r.mini = l.gath lhs.repl = Bin l.repl r.repl -- replacement tree \end{code} \caption{A variant of the Repmin example.} \label{fig.example.barrier} \end{figure} We use Haskell as host language. 
Figure~\ref{fig.example.barrier} shows a variant of the classical Repmin example~\citep{springerlink:10.1007/BF00264249}, which requires more than one visit to compute the attributes with a statically ordered evaluation strategy. The attribute |repl| contains a clone of the tree with each leaf replaced with the minimum of its sibling's subtree. In order to compute the |mini| attribute of |l|, the |gath| attribute of |r| is needed, and vice versa. With statically ordered evaluation, this can be accomplished by computing the |gath| attribute in a first visit, and then computing |mini| and |repl| in a later visit.
\paragraph{Barriers.}
The evaluations for |l.repl| and |r.repl| are independent. For debugging purposes, we may want to specify that the |l.repl| attribute is computed before the |r.repl| attribute. Since the |asyn.repl| attribute depends on the |ainh.mini| attribute, we get the desired behavior when |l.repl| is a dependency of |r.mini|. The following dependency rule expresses this dependency:
\begin{code}
sem Tree
  | Bin  ^^^  order l.repl `pre` r.mini
\end{code}
This rule has the desired effect because |ainh.mini| is a dependency of |asyn.repl|. Note that, as usual, the dependency relation is transitive.
The dependencies imposed by dependency rules may not be clear when multiple synthesized attributes are involved, nor may they remain stable after code changes. For this purpose, we introduce a barrier attribute:
\begin{code}
attr Tree ^^^ inh sync barrier
sem Tree
  | Leaf  ^^^  order lhs.sync `pre` lhs.repl
  | Bin        order lhs.sync `pre` l.sync
               order lhs.sync `pre` r.sync
               order l.repl `pre` r.sync  -- actually subsumes |lhs.sync `pre` r.sync|
\end{code}
Barrier attributes are not defined by a rule. Instead, these attributes are defined implicitly. However, we require for a synthesized barrier attribute |y| that there is at least one production with a dependency rule in which |lhs.y| occurs as the RHS, and at least one production with a child |k| in which |k.y| occurs as the LHS of a dependency rule. For an inherited barrier attribute |y|, we require the inverse. There is actually no technical reason for this requirement: if the requirement is not met, it should be taken as a warning that the barrier is unused.
\begin{defi}[Updatable attribute]
An updatable attribute is an attribute whose value is a reference into a shared state. Via attribute occurrences (Figure~\ref{warren.fig.agcore}), we specify operations on this shared state. An attribute occurrence |sup z trref| in an expression represents the value in the shared state referenced by |z|. An occurrence |sup z trref| in a pattern means that the shared state at the location pointed to by |z| is updated with the matched value. An occurrence |sup z trnew| in a pattern means that a new shared location is allocated with the matched value and a reference to it is stored in |z|.
\end{defi}
We incorporate side effects in BarrierAG (Chapter~\rref{chapt.side-effect}) in the form of updatable attributes. In the following example, an updatable attribute |unq| is created, read from, and updated:
\begin{code}
attr Tree ^^^ inh unq :: IORef Int  -- inherited unique number dispenser
sem Root
  | Root  sup (root.unq) trnew = 1  -- creates reference and assigns initial value
sem Tree
  | Leaf  -- reads and updates reference
          (loc.myId, sup (alhs.unq) trref) = (sup (alhs.unq) trref, sup (alhs.unq) trref + 1)
\end{code}
Normally, attribute occurrences in a pattern are at defining positions.
However, when |sup z trref| is used to specify a store to a shared state, the attribute occurrence |z| is at a usage position. Hence, |alhs.unq| refers to the inherited attribute |unq|, and is a dependency of the rule. Rules that write to a reference are guaranteed to be applied at most once. The order in which these writes take place depends on the scheduling of attributes. If the attribute |loc.myId| is not used, the write does not actually take place. The only guarantee that is given for such a write is that the reference is created and initialized. For stronger guarantees, barriers can be used to control and specify the order.
\paragraph{Remarks.}
Barriers and updatable references should be used sparingly. In particular, rules with updatable references are not functional, which breaks equational reasoning. We present them here as an implementation mechanism for phases and commutable rules. With the latter, we can recover equational reasoning. Note that the code generated from an AG may be functional even if the AG description itself is non-functional.
\begin{figure}[t]
\begin{code}
I      ::=  attr N (many a)                       -- attr decls for nonterminal |N|
a      ::=  k x t                                 -- attribute declaration with form |k|, attr name |x| and type |t|
k      ::=  inh | syn                             -- attribute forms (often also written as identifier)
t      ::=  :: tau                                -- attribute type (host language)
       |    barrier                               -- barrier type
s      ::=  (sub sem (^^ Gamma)) P : N (many r)   -- semantics of a production |P| of nonterm |N| in env |Gamma|
r      ::=  child x : N = f (many zup)            -- (higher order) child declaration
       |    x : ^^ p (many zdown) = f (many zup)  -- evaluation rule named |x| with LHS |p| and RHS |f|
       |    order d1 `pre` d2                     -- order declaration
d      ::=  child x                               -- designates child |x|
       |    rule x                                -- designates a rule named |x|
       |    z                                     -- designates an attribute occurrence
zdown  ::=  sup z trval                           -- store in occurrence |z|
       |    sup z trref                           -- write to location referenced by |z|
       |    sup z trnew                           -- store new handle in |z|
zup    ::=  sup z trval | sup z trref             -- respectively read (closed), and deref (open)
z      ::=  h.c.x                                 -- occurrence attr |x| of |c| with form |h|
h      ::=  k | loc                               -- attribute forms (extended with locals)
c      ::=  lhs | loc | x                         -- child designators
x, f, p, P                                        -- identifiers
\end{code}
\caption{AG core representation.}
\label{warren.fig.agcore}
\end{figure}
\section{Core Representation of AGs with Barriers}
\label{sect.warren.core}
In this section, we define the semantics of the dependency rules and barrier attributes using the core language BarrierAG, a desugared subset of higher-order AGs. BarrierAG also permits attributes whose values are references into a global state, which we need later.
\paragraph{Syntax.}
Figure~\ref{warren.fig.agcore} gives the syntax of BarrierAG. BarrierAG serves as a core language for AGs. Hence, we assume that static checks, desugaring, item grouping, and copy rule insertion have been performed (Section~\tref{intro.sect.incremental} and Section~\tref{intro.sect.copyrules}). This core representation serves two purposes: it allows formal reasoning about AGs, and it allows us to specify the construction of dependency graphs and execution plans (see also Section~\tref{sect.intro.ag.depgraphs}).
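Before turning to the desugared example, it may help to see what a rule with an updatable reference roughly corresponds to in the host language. The following hand-written Haskell sketch mimics the |unq| example above with an |IORef|; it is an illustration only and not the code that is actually generated from BarrierAG, and the names |dispense| and |initialDispenser| are hypothetical.
\begin{code}
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- hand-written analogue of the Leaf rule: read the current unique number
-- and update the shared reference, as the pattern |sup (alhs.unq) trref| does
dispense :: IORef Int -> IO Int
dispense unq = do
  myId <- readIORef unq        -- corresponds to reading |sup (alhs.unq) trref|
  writeIORef unq (myId + 1)    -- corresponds to the store via |sup (alhs.unq) trref|
  return myId

-- hand-written analogue of the Root rule: allocate the reference and assign
-- its initial value, as the pattern |sup (root.unq) trnew| does
initialDispenser :: IO (IORef Int)
initialDispenser = newIORef 1
\end{code}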
\begin{figure}[t] \begin{code} sem_Tree (Leaf x) = sem_Leaf x sem_Tree (Bin l r) = sem_Bin (sem_Tree l) (sem_Tree r) attr Tree ^^ syn gath :: Int inh mini :: Int inh sync barrier syn repl :: Tree sem_Leaf field_x = sem { f1 = field_x, f2 = Leaf } Leaf : Tree r0: ^^^ id ^^ occval (loc.loc.x) ^^ = ^^ f1 r1: ^^^ id ^^ occval (asyn.lhs.gath) ^^ = ^^ id ^^ occval (loc.loc.x) r2: ^^^ id ^^ occval (asyn.lhs.repl) ^^ = ^^ f2 ^^ occval (ainh.lhs.mini) order ainh.lhs.sync `pre` asyn.lhs.repl sem_Bin field_l field_r = sem { f1 = field_l, f2 = field_r, f3=min, f4=Bin } Bin : Tree child l : Tree = f1 child r : Tree = f2 r3: ^^^ id ^^ occval (asyn.lhs.gath) ^^ = ^^ f3 ^^ occval (asyn.l.gath) ^^ occval (asyn.r.gath) r4: ^^^ id ^^ occval (asyn.lhs.repl) ^^ = ^^ f4 ^^ occval (asyn.l.repl) ^^ occval (asyn.r.repl) r5: ^^^ id ^^ occval (ainh.l.mini) ^^ = ^^ id ^^ occval (asyn.r.gath) r6: ^^^ id ^^ occval (ainh.r.mini) ^^ = ^^ id ^^ occval (asyn.l.gath) order ainh.lhs.sync `pre` ainh.l.sync order ainh.lhs.sync `pre` ainh.r.sync order asyn.l.repl `pre` ainh.r.sync ast = sem_Bin (sem_Leaf 1) (sem_Leaf 2) \end{code} \caption{Desugared example.} \label{warren.fig.agexpl} \end{figure} The main (meta) nonterminals in the (meta) grammar of Figure~\ref{warren.fig.agcore} are |I| (attr-block) and |s| (semantics blocks). An attr-block consists of a set of attribute declarations. A semantics block provides the rules for a single production. The environment |Gamma| is often left implicit. Figure~\ref{warren.fig.agexpl} shows a desugared version of the earlier example in BarrierAG. The context free grammar is translated to terms in the host language, and is not part of the core language. The symbols of the production are represented as higher-order children and local attributes. Evaluation rules are explicitly named, and consist of a function symbol |p| and |f|. The evaluation rule represents the function |(many z1) = p (f (many z2))|. For example, the rule |r3| represents the function |asyn.lhs.gath = id (f3 asyn.l.gath asyn.r.gath)|. The definition of these functions are bound in the environment |Gamma|, which is an annotation of the sem-block. This representation has the advantage that the rules can be duplicated without duplicating the body of the function. %%% Operational semantics. \paragraph{Semantics.} Ultimately, we generate host-language code for the core language term. However, to facilitate reasoning with terms in the core language, we first define an operational semantics\footnote{ This operational semantics is also implemented as part of the @ruler-interpreter@: \url{https://svn.science.uu.nl/repos/project.ruler.papers/archive/ruler-interpreter-0.1.tar.gz}. } in Figure~\ref{warren.fig.nodeeval} which denotes a tree walking automaton (Section~\tref{sect.treewalkautomaton}). The semantics refers to rules in Figure~\ref{warren.fig.readyrules}, which can be interpreted as a dynamic version of the dependency analysis that we define in the next section or as a specification of a tree-walking automaton (Section~\tref{sect.treewalkautomaton}). Since we deal with higher-order AGs, the operational semantics describes both the construction and decoration of the AST. A decorated node of the AST is represented as a tuple |(N,P,Gamma,Heap)|, where |N| is the associated nonterminal, |P| is the associated production, |Gamma| is an environment that contains bindings for the functions that are mentioned in the rules of |P|, and the heap |Heap| contains values for attributes and nodes that form the children of |P|. 
More precisely, a heap |H| is a partial map between descriptors |d| and a value |v|. A descriptor is either an identifier for an attribute or a child: \begin{code} v ::= atom | unit | n | ell -- atomic value |atom|, or unit value |unit|, or node |n|, or reference |ell| n ::= (N,P,Gamma,H) -- node associated to |N| and |P|, with env |Gamma|, and heap |H| H ::= emptyset | H, d ^ `dmap` v -- heap of a node (partial map from descriptor to value) Psi ::= emptyset | Psi, ell ^ :-> v -- threaded heap (partial map from reference to value) Upsilon ::= emptyset | Upsilon, ell -- set of references |many ell| \end{code} A value of an attribute is either an atomic value |atom| in the domain of the attribute, or a tree in case of a higher-order attribute. \begin{figure}[t] \begin{small} \begin{mathpar} \fbox{|Gamma :- Psi0 `sep` v0 ---> v1 `sep` Psi1|}\\ \inferrule*[right=e.descent] { |hlook H (child x) = v0| \\ |Gamma0 :- Psi0 `sep` v0 ---> v1 `sep` Psi1| } {|Gamma0 :- Psi0 `sep` (N,P,Gamma1,H) ---> (N,P,Gamma1,{child x :-> v1} \/ H) `sep` Psi1|} \inferrule*[right=e.inh] {|hlook H (ainh.x.y) = v| \\ |hlook H (child x) = (N',P',Gamma2,H')|\\ |ready (N' `sep` P' `sep` H') (ainh.lhs.y)| } {|Gamma0 :- Psi `sep` (N,P,Gamma1,H) ---> (N,P,Gamma1,{child x :-> (N',P',Gamma2,{ainh.lhs.y :-> v} \+/ H')} \/ H) `sep` Psi|} \inferrule*[right=e.syn] {|hlook H (child x) = (N',P',Gamma2,H')|\\ |hlook H' (asyn.lhs.y) = v|\\ |ready (N `sep` P `sep` H) (asyn.x.y)| \\ } {|Gamma0 :- Psi `sep` (N,P,Gamma1,H) ---> (N,P,Gamma1,{asyn.x.y :-> v} \+/ H) `sep` Psi|} \inferrule*[right=e.child] { |( child x : N' = f((many (sup z trup))) ) `elem` rules(N,P)| \\ |ready (N `sep` P `sep` H) (child x)| \\ |many (load (Psi `sep` H) (sup z trup) (v))| \\ } {|Gamma0 :- Psi `sep` (N,P,Gamma1,H) ---> (N,P,Gamma1,{child x :-> (Gamma0 Gamma1 f)(many v)} \+/ H) `sep` Psi|} \inferrule*[right=e.eval] { |( x : ^^ p((many (sup z1 trdwn))) = f((many (sup z2 trup)))) `elem` rules(N,P)|\\ |ready (N `sep` P `sep` H0) (rule x)| \\\\ |many (load (Psi0 `sep` H0) (sup z2 trup) (v0))| \\ |(many v1) = (Gamma0 Gamma1 p ^^ ^^ . ^^ ^^ Gamma0 Gamma1 f) (many v0)| \\ |many (store (H0) (v1) (sup z1 trdwn) (H `sep` Psi `sep` Upsilon))| \\ |sub Upsilon 1 \+/ ... \+/ sub Upsilon n \+/ dom Psi = (many ell)| \\ } {|Gamma0 :- Psi0 `sep` (N,P,Gamma1,H0) ---> (N,P,Gamma1,(many H) \+/ {rule x :-> unit} \+/ H0) `sep` (many Psi) Psi0|} \inferrule*[right=e.bar.lhs] { |asyn.y `elem` barriers N| \\ |ready (N `sep` P `sep` H) (asyn.lhs.y)| \\ } {|Gamma0 :- Psi `sep` (N,P,Gamma1,H) ---> (N,P,Gamma1,{asyn.lhs.y :-> unit} \+/ H) `sep` Psi|} \inferrule*[right=e.bar.inh] { |( child x : N' = f((many (sup z trup))) ) `elem` rules(N,P)| \\ |ainh.y `elem` barriers N'| \\ |ready (N `sep` P `sep` H) (ainh.x.y)| \\ } {|Gamma0 :- Psi `sep` (N,P,Gamma1,H) ---> (N,P,Gamma1,{ainh.x.y :-> unit} \+/ H) `sep` Psi|} \end{mathpar} \end{small} \caption{Small-step evaluation rules for a node.} \label{warren.fig.nodeeval} \end{figure} \begin{defi}[Normal form] A tree |n| is in normal form when the synthesized attributes of the root have been computed, thus when the root node |n| has a heap with bindings for all |asyn.lhs.x| where |asyn.x| is a synthesized attribute declared for the nonterminal of |n|. \end{defi} The small-step relation |Gamma0 :- Psi `sep` v ---> v' `sep` Psi'| (Figure~\ref{warren.fig.nodeeval}) describes how to evaluate a tree |v| one step further to |v'| in a global environment |Gamma0|. The threaded heap |Psi| contains bindings for references, and is threaded through the evaluation. 
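To make the shape of these semantic objects concrete, the following Haskell sketch shows one possible representation of values, nodes, and heaps as finite maps, together with the normal-form check of the definition above and a naive driver that applies reduction steps exhaustively. All names in this sketch are our own assumptions and not part of BarrierAG or its implementation; the step function itself (the rules of Figure~\ref{warren.fig.nodeeval}) is left as a parameter.
\begin{code}
import qualified Data.Map as Map

type Name   =  String                                   -- identifiers (nonterminals, productions, attributes)
data Desc   =  AttrOcc Name | Child Name | Rule Name    -- descriptors |d|
               deriving (Eq, Ord, Show)
data Value  =  Atom Integer                             -- atomic value |atom|
            |  Unit                                     -- unit value |unit|
            |  NodeVal Node                             -- node |n|
            |  Ref Int                                  -- reference |ell|
data Node   =  Node Name Name Env Heap                  -- |(N,P,Gamma,H)|
type Env    =  Map.Map Name ([Value] -> [Value])        -- function bindings |Gamma|
type Heap   =  Map.Map Desc Value                       -- heap |H| of a node
type THeap  =  Map.Map Int Value                        -- threaded heap |Psi|

-- one reduction step; its definition (the rules of Figure~\ref{warren.fig.nodeeval}) is omitted here
type Step   =  Env -> THeap -> Node -> Maybe (Node, THeap)

-- naive driver: apply steps until no rule is applicable
evaluate :: Step -> Env -> THeap -> Node -> (Node, THeap)
evaluate step gamma0 psi node =
  case step gamma0 psi node of
    Nothing             -> (node, psi)
    Just (node', psi')  -> evaluate step gamma0 psi' node'

-- normal form: the root's heap binds all synthesized attributes of its nonterminal;
-- |synAttrsOf| is an assumed lookup from a nonterminal to its synthesized attributes
isNormalForm :: (Name -> [Name]) -> Node -> Bool
isNormalForm synAttrsOf (Node n _p _gamma h) =
  all (\y -> Map.member (AttrOcc ("syn.lhs." ++ y)) h) (synAttrsOf n)
\end{code}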
To reduce a tree, we start with an initial tree and gradually grow the tree by modifying the heaps. In the initial tree |v0 = (N,P,Gamma,emptyset)|, |N| is the nonterminal of the root, |P| is the production of the root, and |Gamma| contains bindings for functions as mentioned in the rules of |P|. In particular, for each child-rule in |P|, there is a binding for a function in |Gamma| that provides the semantics for that child. When the AG is well-formed, the resulting tree is in normal form when the rules are applied exhaustively.
We use the following notation in the rules. The operator |\/| represents the left-biased union of its two operand heaps, and the operator |\+/| takes the union of two heaps with disjoint keys. The domain |dom(H)| of a heap |H| is the set of the heap's keys. A heap |H| applied to a designator |d|, written |H(d)|, returns the binding for |d| in |H|. Similarly, |Gamma0 Gamma1 f| returns the function binding for |f| in the right-biased union of |Gamma0| and |Gamma1|. Juxtaposition of heaps represents the left-biased union.
In Figure~\ref{warren.fig.nodeeval}, rule \TirName{e.descent} incorporates the idea of nondeterministically selecting a node constructed so far, and reducing it a bit further. Each node has its own heap that explicitly tracks which attributes have been computed, which children have been constructed, and which rules have been applied. A test to check whether a rule can be applied then boils down to verifying that its dependencies are met (Figure~\ref{warren.fig.readyrules}).
With rule \TirName{e.inh}, bindings for the inherited attributes of a child can be copied to the heap of that child when the child is ready to receive these bindings. For example, inherited attributes for a child can be computed even before the child is introduced. Rule \TirName{e.syn} represents the inverse for synthesized attributes. Note the careful use of disjoint unions, which ensures that a rule is applicable only once per attribute.
With rules \TirName{e.child}, \TirName{e.eval}, \TirName{e.bar.inh}, and \TirName{e.bar.lhs}, bindings can be added to the heap. Evaluation of the RHS of the AG's child-rule gives us the initial tree to use for that child in the heap. Evaluation of the AG's evaluation rule results in bindings for one or more attributes. The rule \TirName{e.eval} ensures in addition that potentially newly introduced references |Upsilon| do not clash with existing references. The rule \TirName{e.bar.inh} defines an inherited barrier attribute of a child |x|. Rule \TirName{e.bar.lhs} defines a synthesized barrier attribute of |lhs|.
\begin{figure}[t]
\begin{small}
\begin{mathpar}
\fbox{|load (Psi `sep` H) (sup z trup) v|}\\
\inferrule*[right=l.val]
  { |hlook H z = v| \\ }
  {|load (Psi `sep` H) (sup z trval) v|}
\inferrule*[right=l.ref]
  {|hlook H z = ell| \\ |plook Psi ell = v| \\ }
  {|load (Psi `sep` H) (sup z trref) v|}
\end{mathpar}
\begin{mathpar}
\fbox{|store (H0) v (sup z trdwn) (H `sep` Psi `sep` Upsilon)|}\\
\inferrule*[right=s.new]
  {}
  {|store (H0) v (sup z trnew) ({ z :-> ell } `sep` { ell :-> v } `sep` {ell})|}
\inferrule*[right=s.save]
  {}
  {|store (H0) v (sup z trval) ({ z :-> v } `sep` emptyset `sep` emptyset)|}
\inferrule*[right=s.update]
  { |hlook H0 z = ell| }
  {|store (H0) v (sup z trref) (emptyset `sep` { ell :-> v } `sep` emptyset)|}
\end{mathpar}
\begin{mathpar}
\fbox{|ready (N `sep` P `sep` H) (d)|}\\
\inferrule*[right=d.order]
  { |readydep (N `sep` P `sep` H) (d)| \\ |{ d' ^^ `bar` ^^ (order d' `pre` d) `elem` rules(N,P) } `subseteq` dom H| \\ }
  {|ready (N `sep` P `sep` H) (d)|}
\inferrule*[right=d.r]
  { |( x : ^^ p(many (sup z1 trdwn)) = f(many (sup z2 trup)) ) `elem` rules(N,P)|\\\\ |many (ready (N `sep` P `sep` H) z1)| \\ }
  {|readydep (N `sep` P `sep` H) (rule x)|}
\inferrule*[right=d.c]
  { }
  {|readydep (N `sep` P `sep` H) (child x)|}
\inferrule*[right=d.a]
  { }
  {|readydep (N `sep` P `sep` H) z|}
\end{mathpar}
\end{small}
\caption{Load, store and designator dependency rules.}
\label{warren.fig.readyrules}
\end{figure}
Figure~\ref{warren.fig.readyrules} shows the load, store, and ready rules. The relation |load (Psi `sep` H) (sup z trup) v| represents a load of value |v| using designator |z| in heap |H|. The load rule \TirName{l.val} requires the presence of a designator in the heap. With this rule, the value of the designator can be extracted from the heap. The load rule \TirName{l.ref} represents an indirect load from the threaded heap |Psi|.
The relation |store (H0) v z (H `sep` Psi `sep` Upsilon)| represents a store of value |v|. The heap |H| and threaded heap |Psi| contain the new bindings that result from storing |v|. The initial heap |H0| stores the reference when |sup z trdwn| is an indirection. The set of references |Upsilon| contains the newly added references. The store rule \TirName{s.new} represents the creation and assignment of a new reference |ell|, and \TirName{s.update} an update via such a reference. With rule \TirName{s.save} a binding for |z| is added without an indirection.
The relation |ready (N `sep` P `sep` H) (d)| describes when |d| is ready to be defined while taking dependency-rules into account. These rules do not test for the existence of values in the heap, because the load and store rules already account for this. The rule \TirName{d.order} handles dependency-rules generically for the designators. The rule \TirName{d.r} states that an AG rule is ready when its LHS is ready.
\paragraph{Remarks.}
If we can determine for a nonterminal that no subtree applies rule \TirName{s.new} or \TirName{s.update}, only top-down behavior for |Psi| is needed. If in addition \TirName{l.ref} cannot occur, the two |Psi| parameters can be omitted from the relation. The rule \TirName{e.descent} facilitates a walk down the tree until the node to reduce is reached. An actual implementation performs multiple evaluation steps (if possible) on such a node in order to reduce traversal overhead. In fact, it is possible to statically analyze the AG and obtain a static scheduling of the rules.
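Continuing the representation sketched earlier, the following small Haskell fragment illustrates the dependency part of the ready check (rule \TirName{d.order}): a designator may only be scheduled once all designators that an order rule places before it are bound in the node's heap. Representing the order rules as a plain list of pairs is an assumption of this sketch.
\begin{code}
type OrderRule = (Desc, Desc)   -- |order d1 `pre` d2|

-- |readyOrder rs h d| checks, for every rule |order d' `pre` d| in |rs|,
-- that |d'| is already bound in the heap |h|
readyOrder :: [OrderRule] -> Heap -> Desc -> Bool
readyOrder orderRules h d =
  and [ Map.member d' h | (d', d2) <- orderRules, d2 == d ]
\end{code}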
\section{Static Dependency Graphs} \label{warren.sect.dependencies} In order to obtain a static scheduling of the rules, we first construct dependency graphs in the style of Knuth-1~\citep{DBLP:journals/mst/Knuth68}. If these graphs are cycle free, a static scheduling of the rules is possible. Unlike in Section~\tref{sect.intro.ag.depgraphs}, we mean an augmented PDG when we talk about a PDG. When we talk about the initial PDG, we mean the non-augmented PDG as in Section~\tref{sect.intro.ag.depgraphs}. Thus, we will be dealing with dependency graphs per production (augmented production dependency graph |pdg(N,P)|) and dependency graphs per nonterminal (a nonterminal dependency graph |ndg(N)|). We first define the initial graphs, then specify the actual graphs as the fixpoints of the functions: %format initial = "\mathsf{init}" \begin{code} pdg N P = sub(pdg)(initial) N P <+> { instantiate x (ndg (sub N x) ) | x `elem` children P } ndg N = sub(ndg)(initial) N <+> { abstract (pdg N P) | P `elem` prods N } \end{code} The function |instantiate| translates the edges between declarations of |sub N c| in the NDG as edges between the attributes of child |c| in the PDG. Similarly, when there is a path between two attributes of |lhs| in the PDG, then |abstract| translates these as edges between the same attributes in |N|'s NDG. \begin{figure}[tb] %format vert1 = "\mathsf{vert}" %format edge1 = "\mathsf{edge}" %format pre1 = "\mathsf{pre}" %format suc1 = "\mathsf{succ}" \begin{code} semb (sem P : N ^^ (many r)) (vert1) = {child lhs} <+> semb (sub (many a) N) (vert1(lhs)) <+> semb (many r) (vert1) semb (inh y t) (vert1(x)) = {ainh.x.y} semb (syn y t) (vert1(x)) = {asyn.x.y} semb (child x : N = f (many zup)) vert1 = {child x} <+> semb (sub (many a) N) (vert1(x)) semb (x : ^^ p (many (sup z1 trdwn)) = f (many (sup z2 trup))) vert1 = {rule x} semb (order d1 `pre` d2) vert1 = emptyset \end{code} \begin{code} semb (sem P : N ^^ (many r)) (edge1) = semb (many r) (edge1) semb (child x : N = f (many (sup z trup))) edge1 = many (z <- child x) <+> { child x <- syn.x.y | syn.y `elem` sub a N } semb (x : ^^ p(many (sup z1 trdwn)) = f(many (sup z2 trup))) edge1 = many (z2 <- rule x) <+> semb (sup z1 trdwn) (pre1(x)) semb (order d1 `pre` d2) = {d1 <- d2} semb (sup z trval) (pre1(x)) = { rule x <- z } semb (sup z trnew) (pre1(x)) = { rule x <- z } semb (sup z trref) (pre1(x)) = { z <- rule x } \end{code} \caption{Vertices and edges of the initial production dependency graph.} \label{fig.warren.initial} \end{figure} \paragraph{Initial production dependency graph.} Per production |P| of nonterminal |N|, we construct a production dependency graph |(sub pdg initial) N P|. Thus, each sem-block is translated to a PDG. Figure~\ref{fig.warren.initial} shows that the vertices of |(sub pdg initial) N P| consists of designators |semb (sem P : N ^^ (many r)) (vert1)|. In this notation, |sub (many a) N| is the set of attribute declarations |a| of nonterminal |N|. The directed edges |semb (sem P : N (many r)) (edge1)| of |pdg(N,P)| relate designators. An edge from |d2| to |d1| denotes that |d1| must be defined before |d2|. Order rules are thus a means to arbitrarily add edges to the PDG. The initial PDG does not contain the dependencies imposed by the semantics of each child (e.g. the actual tree). We obtain the actual production dependency graph by augmenting it with the edges taken from the nonterminal dependency graphs of the children which are an approximation of the dependencies of such trees. 
\paragraph{Initial nonterminal dependency graph.} The vertices of the nondeterminal dependency graph |ndg(N)| are the declarations |a| of |N|: \begin{code} semb (attr N (many a)) (vert1) = semb (many a) (vert1) semb (inh x t) (vert1) = { ainh.x } semb (syn x t) (vert1) = { asyn.x } \end{code} The initial NDG does not have any edges. These edges are inferred from the PDGs, as we see below. \paragraph{Graph construction.} The actual PDG is the least solution to the above equations with the following definitions for |abstract| and |instantiate|. The relation |d1 <-+ d2| represents a path from |d2| to |d1|, with |d1 /= d2|: \begin{code} abstract g = { k1.x <- k2.y | k1.lhs.x <-+ k2.lhs.y `elem` g } instantiate x g = { k1.x.y <- k2.x.z | k1.y <-+ k2.z `elem` g } \end{code} Conventionally, a synthesized attribute of a child only depends on an inherited attribute of a child. However, due to dependency-rules, any attribute of a child can potentially depend on any other attribute of the child, hence our definition of |abstract| and |instantiate| take these also into account. The construction of the graphs is a relatively straightforward fixpoint computation. As intermediate data structure, a transitively closed PDG allows for efficient tests for paths between vertices. \begin{figure}[tb] \begin{center} \begin{tikzpicture} [ nd/.style={ellipse, draw, minimum size=4mm, node distance=12mm,font=\scriptsize} , dep/.style={->,thick} , dep2/.style={->,dashed, thick} , lab/.style={font=\scriptsize} ] \node[nd](lhsmini){|lhs.mini|}; \node[nd, right of=lhsmini, xshift=30mm](lhsgath){|lhs.gath|}; \node[nd, right of=lhsgath, xshift=10mm](lhsrepl){|lhs.repl|}; \node[nd, below of=lhsmini, yshift=-6mm](l){|child l|}; \node[nd, below of=lhsrepl, yshift=-6mm](r){|child r|}; \node[nd, below of=l, xshift=-20mm](lmini){|l.mini|}; \node[nd, right of=lmini, xshift=10mm](lgath){|l.gath|}; \node[nd, right of=lgath, xshift=10mm](lrepl){|l.repl|}; \node[nd, below of=r, xshift=-20mm](rmini){|r.mini|}; \node[nd, right of=rmini, xshift=10mm](rgath){|r.gath|}; \node[nd, right of=rgath, xshift=10mm](rrepl){|r.repl|}; \node[nd, below of=lhsgath, xshift=-10mm, yshift=3mm](r3){|r3|}; \node[nd, below of=lhsgath, xshift=25mm, yshift=3mm](r4){|r4|}; \node[nd, below of=lrepl, xshift=-6mm](r5){|r5|}; \node[nd, below of=lrepl, yshift=5mm](r6){|r6|}; \draw[dep](r3) to (lgath); \draw[dep](r3) to (rgath); \draw[dep](lhsgath) to (r3); \draw[dep](r4) to (lrepl); \draw[dep](r4) to (rrepl); \draw[dep](lhsrepl) to (r4); \draw[dep](r5) to [in=-155,out=0] (rgath); \draw[dep](lmini) to [in=180,out=-35] (r5); \draw[dep](r6) to (lgath); \draw[dep](rmini) to (r6); \draw[dep](rgath) to (r); \draw[dep](rrepl) to (r); \draw[dep](lgath) to (l); \draw[dep](lrepl) to (l); \draw[dep2](lrepl) to [out=-155,in=-35] (lmini); \draw[dep2](rrepl) to [out=-155,in=-35] (rmini); \end{tikzpicture} \end{center} \caption{Example of the (augmented) PDG of the Bin-production.} \label{fig.warren.binpdg} \end{figure} \paragraph{Example.} In the Leaf-production, there is dependency of |asyn.lhs.repl| on |ainh.lhs.mini| via rule |r2|. This thus induces a dependency of |asyn.repl| on |ainh.mini| in the NDG. Figure~\ref{fig.warren.binpdg} shows the PDG of the production |Bin|. The solid edges are initial edges, and the dashed edges are instantiated from the NDG. The Bin-production by itself does not induce any edges in the NDG. We do not show the PDG of the Leaf-production here; it is shown in Figure~\ref{warren.fig.vdpds.g1}. 
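To make the fixpoint construction above concrete, the following Haskell sketch gives one possible implementation of |instantiate|, |abstract|, and the underlying path computation, with graphs represented as lists of edges and vertices represented structurally. The vertex encoding and all names are assumptions of this sketch, not part of an actual implementation.
\begin{code}
import Data.List (nub)

type NdgV     =  (String, String)            -- NDG vertex: (form, attribute), e.g. ("inh","mini")
type PdgV     =  (String, String, String)    -- PDG vertex: (form, child, attribute); "lhs" denotes the production's own attributes
type Graph v  =  [(v, v)]                    -- an edge |(d1,d2)| stands for |d1 <- d2|

-- naive transitive closure, used as the path test |d1 <-+ d2|
closure :: Eq v => Graph v -> Graph v
closure g  | g' == g    = g
           | otherwise  = closure g'
  where g' = nub (g ++ [ (a, c) | (a, b) <- g, (b', c) <- g, b == b' ])

-- |instantiate x ndg|: paths between attribute declarations in the child's NDG
-- become edges between the corresponding attributes of child |x| in the PDG
instantiate :: String -> Graph NdgV -> Graph PdgV
instantiate x ndg =
  [ ((k1, x, a), (k2, x, b)) | ((k1, a), (k2, b)) <- closure ndg, (k1, a) /= (k2, b) ]

-- |abstract pdg|: paths between |lhs| attributes of the PDG become NDG edges
abstract :: Graph PdgV -> Graph NdgV
abstract pdg =
  nub [ ((k1, a), (k2, b))
      | ((k1, "lhs", a), (k2, "lhs", b)) <- closure pdg, (k1, a) /= (k2, b) ]
\end{code}
The naive closure suffices for exposition; as remarked above, an actual implementation would maintain transitively closed PDGs as an intermediate data structure to test for paths efficiently.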
\paragraph{Properties.}
An NDG can only be cyclic if at least one of the PDGs is cyclic. The reason is that the edges of the NDG are projected into the PDGs, so a PDG must have a subgraph with these cyclic edges. When the PDGs are acyclic, an evaluation to normal form is possible for any well-formed tree. In other words, the graph construction is sound. This property has a relatively straightforward structural induction proof on the shape of the tree. The NDGs serve as the induction hypothesis. When an evaluation algorithm respects the dependencies of the PDGs, and the constraints of each rule are preserved, the evaluation algorithm is sound. Moreover, the evaluation algorithm is complete if it finds a scheduling when the PDGs are acyclic. We show such an algorithm in Section~\ref{sect.warren.visitsgraphs}.
Rules of a production may make an inherited attribute of a child dependent on a synthesized attribute of a child. These dependencies are not part of the NDG. When we also add these edges to the NDG, each occurrence of a nonterminal has the same dependencies of its inherited attributes on its synthesized attributes. When the PDGs are cycle free without these edges, they are typically also cycle free with these additional edges. In the next section, it becomes clear that adding these edges may reduce the number of visit interfaces that a nonterminal symbol needs to support, which can be beneficial when code size is an issue.
\section{Visits Graphs}
\label{sect.warren.visitsgraphs}
An interface for a nonterminal is a partitioning of the attributes into a finite sequence of visits. Given such a sequence, we can produce an execution plan and generate code, as we showed in Chapter~\rref{chapt.side-effect}. We derive such interfaces from acyclic PDGs. Such a sequence induces a dependency of all attributes of a later visit on the attributes of earlier visits. When these dependencies are represented as scheduling-induced edges in the PDG, they may make the PDG cyclic. It is typically possible to define a single visit sequence per nonterminal that does not make the PDG cyclic. However, it is generally hard to automatically find such a sequence~\citep{DBLP:journals/acta/Kastens80} when the sequence is not specified manually, as we presented in Chapter~\rref{chapt.side-effect}. Instead, if we allow multiple interfaces for a nonterminal, we can define visit sequences so that they do not lead to additional dependencies. This is the approach taken by \citet{DBLP:conf/popl/KennedyW76}. In our experience, Kastens' approach requires severe manual intervention for large AGs, which is why we take the approach of \citet{DBLP:conf/popl/KennedyW76}. However, it is not immediately obvious how to deal with multiple interfaces in a strongly typed language, because the type of the semantic function depends on the interface. In the next section, we solve this problem using type indices in combination with GADTs. In this section, we show how to determine these interfaces.
\paragraph{Contexts.}
A \emph{context} |ctx| of a nonterminal |N|, of a nonterminal symbol labeled |N|, or of a production of |N| is a visit sequence |many v| that is consistent with |ndg N|:
\begin{code}
ctx  ::=  (many v)               -- a context consists of a sequence of visits |many v|
v    ::=  visit i (many (k.x))   -- a visit consists of a set of attributes |many (k.x)| and a globally unique identifier |i|
i                                -- identifier
\end{code}
A context contains a subset of the attributes declared for a nonterminal.
A nonterminal may have a context with fewer attributes when not all attributes of a nonterminal symbol labeled with this nonterminal are needed. A visit sequence is consistent with |ndg N| when the following conditions are met.
\begin{itemize}
\item For each attribute |ainh.x| in the sequence and each attribute |asyn.y| with |ainh.x <-+ asyn.y| in |ndg N|, either |asyn.y| is not part of the visit sequence, or |asyn.y| is in the same visit as |ainh.x| or in a later one.
\item For each attribute |asyn.x| in the sequence and each attribute |ainh.y| with |asyn.x <-+ ainh.y| in |ndg N|, either |ainh.y| is not part of the visit sequence, or |ainh.y| is in a later visit than |asyn.x|.
\end{itemize}
A small functional sketch of this consistency check is shown below. The number of consistent visit sequences is finite, but potentially large, as in the worst case it is exponential in the number of attributes.
\begin{figure}
\begin{code}
semb (visit i (many (k.y))) (vert1(x))  =  { visit x i }
semb (visit i (many (k.y))) (edge1(x))  =  semb (i) (edge1(x,pre i)) <+> semb (many (k.y)) (edge1(x,i))
semb (i) (edge1(x,epsilon))             =  { child x <- visit x i }
semb (i) (edge1(x,j))                   =  { visit x j <- visit x i }
semb (ainh.y) (edge1(x,i))              =  { ainh.x.y <- visit x i }
semb (asyn.y) (edge1(x,i))              =  { visit x i <- asyn.x.y }
\end{code}
\caption{The embedding of a context.}
\label{fig.warren.embedding}
\end{figure}
\paragraph{Embedding of a context.}
We make the context of a child visible in the PDG. For that purpose, we introduce a vertex form that represents a visit:
\begin{code}
d ::= ... | visit x i   -- a visit to |x| identified by |i|
\end{code}
For a child |x| in context |sub C x|, we add additional vertices and edges to the PDG, which we call the \emph{embedding} of the context |sub C x| of child |x| in the PDG. Informally, to embed a context, we add a visit-node per visit and add edges between this node and the associated inherited and synthesized attributes. Additionally, a visit node depends on the node of the preceding visit, if any, and on the child-node otherwise. Formally, let |pre i| be either the preceding visit |j| of |i|, or |epsilon| when |i| is the first visit. We add vertices |semb C (vert1(x))| and edges |semb C (edge1(x))| as described in Figure~\ref{fig.warren.embedding}.
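The following Haskell sketch makes the two consistency conditions above concrete. It assumes that the NDG is given as a path predicate |path a b|, which holds when |a <-+ b|, that is, when |b| (indirectly) depends on |a|; the names |Form|, |Attr|, |VisitSeq|, and |consistent| are our own, and visits are represented simply as lists of attributes.
\begin{code}
data Form      =  Inh | Syn deriving (Eq, Show)
type Attr      =  (Form, String)
type VisitSeq  =  [[Attr]]      -- one set of attributes per visit, in order

consistent :: (Attr -> Attr -> Bool) -> VisitSeq -> Bool
consistent path visits = and
  [ ok (i, a) (j, b)
  | (i, vi) <- indexed, a <- vi
  , (j, vj) <- indexed, b <- vj
  , path a b ]
  where
    indexed = zip [0 :: Int ..] visits
    -- a path from an inherited to a synthesized attribute: same or later visit
    ok (i, (Inh, _)) (j, (Syn, _))  =  j >= i
    -- a path from a synthesized to an inherited attribute: strictly later visit
    ok (i, (Syn, _)) (j, (Inh, _))  =  j > i
    ok _ _                          =  True
\end{code}
For instance, with a path predicate that relates |(Inh,"mini")| to |(Syn,"repl")|, the two-visit sequence used in the example below is accepted, whereas a single visit containing all three attributes of |Tree| is rejected only if the crossover dependencies of the |Bin| production are also reflected in the NDG.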
\begin{figure}[t] \centering \subfigure[Context |ctx1| with all attributes.]{ \begin{tikzpicture} [ nd/.style={ellipse, draw, minimum size=4mm, node distance=12mm,font=\scriptsize} , dep/.style={->,thick} , dep2/.style={->,dashed, thick} , lab/.style={font=\scriptsize} ] \node[nd](v1){|visit v1|}; \node[nd, right of=v1, xshift=8mm](v2){|visit v2|}; \node[nd, above of=v2, xshift=5mm](lmini){|ainh.l.mini|}; \node[nd, below of=v1, xshift=-5mm](lgath){|asyn.l.gath|}; \node[nd, below of=v2, xshift=10mm](lrepl){|asyn.l.repl|}; \node[nd, below of=lgath, xshift=18mm, yshift=7mm](l){|child l|}; \draw[dep](v2) to (v1); \draw[dep](v2) to (lmini); \draw[dep](lgath) to (v1); \draw[dep](lrepl) to (v2); \draw[dep2](lrepl) to [out=55,in=-55] (lmini); \draw[dep2](v1) to (l); \draw[dep](lgath) to (l); \draw[dep](lrepl) to (l); \end{tikzpicture} \label{fig.warren.exampleinst1} } \hspace{4em} \subfigure[Context |ctx2| with only |asyn.gath|.]{ \begin{tikzpicture} [ nd/.style={ellipse, draw, minimum size=4mm, node distance=12mm,font=\scriptsize} , dep/.style={->,thick} , dep2/.style={->,dashed, thick} , lab/.style={font=\scriptsize} ] \node[nd](v3){|visit v3|}; \node[nd, above of=v3, xshift=22mm](lmini){|ainh.l.mini|}; \node[nd, below of=v3, xshift=-5mm](lgath){|asyn.l.gath|}; \node[nd, below of=v3, xshift=27mm](lrepl){|asyn.l.repl|}; \node[nd, below of=lgath, xshift=18mm, yshift=7mm](l){|child l|}; \draw[dep](lgath) to (v3); \draw[dep2](lrepl) to [out=38,in=-38] (lmini); \draw[dep](lgath) to (l); \draw[dep](lrepl) to (l); \draw[dep2](v3) to (l); \end{tikzpicture} \label{fig.warren.exampleinst2} } \caption{Examples of visit-additions to the PDG.} \label{fig.warren.exampleinsts} \end{figure} \begin{defi}[Visit PDG] For some given contexts of children, a \emph{visit PDG} (VPDG) is the embedding of these contexts in the PDG. \end{defi} The following are exemplary contexts of child |l| of production |Bin|. The context |ctx1| describes the computation of all attributes in two phases. The context |ctx2| only the computation of |asyn.gath|: \begin{code} ctx1 = [ visit v1 {asyn.gath}, visit v2 {ainh.mini,asyn.repl} ] ctx2 = [ visit v3 {asyn.gath} ] ctx3 = [ visit v4 {asyn.gath, ainh.mini, asyn.repl} ] \end{code} Figure~\ref{fig.warren.exampleinsts} shows the relevant subgraphs after embedding the respective contexts |ctx1| and |ctx2|. The dashed edges are already existing edges. The embedding of |ctx3| would lead to a cycle in the PDG. A visit-node in a VPDG serves as bookkeeping that a visit to the child is needed to compute the attributes it is associated with. It represents dependencies induced from the AG-wide scheduling of attribute computations, and we will ensure later that each visit node has an associated algorithm that computes the attributes. \paragraph{Abstraction of visit-nodes.} We see later as well that the bookkeeping with visits is too verbose, so we establish a more compact representation. For that purpose, we change the syntax of child-vertices. These additionally define in what state |many a| the child is, which can be used in the graph to \emph{abstract} from a sequence of visits: \begin{code} d ::= ... | child x (many a) -- a child |x| in state |many a| (initially |emptyset|) \end{code} Let |g| be an acyclic VPDG. Figure~\ref{fig.warren.abstraction} shows how to abstract from the visit-vertices. We first remove all visit-vertices. When a visit-vertex |i| is removed, its associated attributes |sub (many a) i| are added to the child-vertex. 
There are no visit-vertices left in |g| after |abstract| has been applied.
\begin{figure}[tb]
\begin{code}
abstract g
  | <% invoke x i %> `elem` g && <% child x (many a) <- invoke x i %> `elem` g
      = let  v   =  <% child x (many a \+/ sub (many a) i) %>
             es  =  { v <- asyn.x.y | asyn.x.y `elem` g }
                    \/ { v <- invoke x j | <% invoke x j %> `elem` g, <% invoke x i <- invoke x j %> `elem` g }
        in abstract ((g \/ {v} \/ es) - { invoke x i, child x (many a) })
  | otherwise = g
\end{code}
\caption{Abstraction on a VPDG.}
\label{fig.warren.abstraction}
\end{figure}
With abstraction applied to a VPDG we obtain scheduled PDGs:
\begin{defi}[Scheduled PDG]
Given a production |P| in context |ctx|, and contexts |many ctx| for its children, its VPDG |g| is a \emph{scheduled PDG} (SPDG) when:
\begin{itemize}
\item The VPDG |g| is acyclic.
\item All reachable synthesized attributes of a child depend on a child node that includes the synthesized attribute in its state. Thus, for each |asyn.y `elem` ctx|, for each path in the VPDG from vertex |asyn.lhs.y| to a vertex |asyn.x.z|, there exists a vertex |child x (many a)| with |asyn.z `elem` (many a)|.
\end{itemize}
\end{defi}
These two conditions allow us to gradually determine contexts, as we show in the remainder of this section. We show in the next section how to convert an SPDG to an execution plan for a production.
\paragraph{Representation.}
We determine a set of contexts for each nonterminal, such that there exists an SPDG for each context and each production. Preferably, the set of contexts is small, because each context requires a different semantic function. Also, to reduce traversal overhead, each context preferably contains visits with many attributes. To determine this set, we use a more sophisticated representation, the visits graph, which we define below. Note again that a context is a sequence of visits that is associated with a nonterminal |N|, where each visit describes a transition of the configuration of a node associated with |N|. Given a set of contexts, sequences of visits may have common intermediate configurations. Thus, we can represent a set of contexts as a graph, which we call the visits graph:
\begin{defi}[Visits graph]
Given a set of contexts |many ctx|, a \emph{visits-graph} is a DAG where the vertices are the union of |semb ctx vert1| of each |ctx `elem` many ctx|, and the edges are the union of |semb ctx edge1| of each |ctx `elem` many ctx|, as described by Figure~\ref{fig.warren.visitgraph}. The vertices represent configurations, and the edges represent visits.
\end{defi} \begin{figure}[tb] \begin{code} semb (ctx) (vert1) = semb ctx (vert1(emptyset)) semb (emptyset) (vert1(s)) = {s} semb (visit i (many a) : vs) (vert1(s)) = {s} <+> semb vs (vert1(s <+> many a)) semb (ctx) (edge1) = semb ctx (edge1(emptyset)) semb (emptyset) (edge1(s)) = emptyset semb (visit i (many a) : vs) (edge1(s)) = {s (over(i)(->)) (s <+> many a) } <+> semb vs (edge1(s <+> many a)) \end{code} \caption{Vertices and edges of the visits-graph.} \label{fig.warren.visitgraph} \end{figure} \begin{figure}[tb] \begin{center} \begin{tikzpicture} [ conf/.style={circle, minimum size=1mm, node distance=4mm, top color=black, bottom color=black,font=\scriptsize} , vis/.style={->,thick} , lab/.style={font=\footnotesize} ] \node[conf](c1){}; \node[conf, right of=c1, xshift=30mm](c2){}; \node[conf, right of=c1, xshift=10mm, yshift=-10mm](c3){}; \node[lab,left of=c1, xshift=2mm]{|s0 = emptyset|}; \node[lab,right of=c2, xshift=14mm]{|s2 = { ainh.mini,asyn.gath,asyn.repl }|}; \node[lab, right of=c3, xshift=3mm]{|s1 = { asyn.gath }|}; \draw[vis](c1) to node[auto,above,lab]{|v3|} (c2); \draw[vis](c1) to node[auto,above,lab]{|v1|} (c3); \draw[vis](c3) to node[auto,above,lab]{|v2|} (c2); \end{tikzpicture} \end{center} \caption{Example of a visits-graph which represents contexts |ctx1|, |ctx2| and |ctx3|.} \label{fig.warren.visitsgraphexample} \end{figure} Figure~\ref{fig.warren.visitsgraphexample} shows the visits-graph that represents the contexts |ctx1|, |ctx2| and |ctx3|. Note that the context |ctx2| is fully implied by the other contexts. For a generated algorithm, a path in the visits-graph represents how a node is visited by its parent. However, a parent may stop invoking visits after any visit. From a code generation perspective, it may be beneficial to know at which vertex a parent stops. We can model this situation in the graph, but we refrain from this complication here. When two paths in the visits graph converge, this means that two different visit sequences ended up in the same configuration. Consequently, the children are in the same configuration, but were potentially subjected to different visit sequences. The VPDGs may thus differ in the visit-vertices, therefore we need |abstract| (as defined earlier) to eliminate this difference. \begin{figure}[tb] \begin{code} s ::= conf N (many a) (many p) -- configuration of a node with nonterminal |N| (key |many a|) p ::= prod P (many k) g -- production in a context of nonterminal |N| k ::= child x : N (many a) -- configuration of child |x| (key |many a|) e ::= s `sep` v `sep` s' -- edge between |s| and |s'| v ::= visit i : N (many a) (many c) -- visit |i| with production info |many c| (key |i|, non-empty |many a|) c ::= prod P (many r) -- sequence of invocation sets |many r| (possibly empty) r ::= sim (many m) -- simultaneous invocations |many m| (non-empty, unordered) m ::= invoke x i -- invocation of visit |i| to |x| (key |i|) g -- VPDG of a production in a given context N, P -- identifier (nonterminal), identifier (production) \end{code} \caption{Notation for visits-graphs.} \label{fig.warren.visitsgraphnotation} \end{figure} We keep a more complex administration for the visits graph, with vertices of the form |s| and edges of the form |v|, as described by Figure~\ref{fig.warren.visitsgraphnotation}. We explain some aspects of this notation below. Figure~\ref{warren.fig.visitsgraphexmpl} shows this administration for the example. A vertex of the visits graph represents a context. 
Per production, a vertex stores the VPDG in this context, and the states of the children. An edge represents a visit, which is a transition from one configuration to one of its next configurations. It stores the attributes |many a|, which are the inherited attributes provided by the parent for the visit, and the synthesized attributes that need to be computed for the parent. Per production, an edge stores an ordered sequence of invocations to children that are required for the visit. These visits are grouped in what we call \emph{simultaneous visits}. Visits of such a group visit children independently. Note that these visits may be invoked in any order, and may also be invoked concurrently. Figure~\ref{warren.fig.vdpds.g1} displays the graphs |g1|, |g2|, |g3| and |g4| that are mentioned in Figure~\rref{warren.fig.visitsgraphexmpl}. The operation |abstract| is not yet applied in the graphs. The dotted edges represent the edges that are inserted due to the embedding of visits. An important property of such a graph is that the simultaneous visits contain precisely those synthesized attributes that can be visited because the inherited attributes are available (described below with the relation |avail|), and also need to be visited in order to produce values for the synthesized attributes. The example is rather symmetric. The children of the production |Bin| are treated in the same way. This is in general not the case and will be handled correctly. For example, children may be of different nonterminals, or may be visited in a different order. This situation arises when we use |<% invoke l 3 %>| for the left child of the bin-production, or in the example of Section~\tref{summary.sect.commuting}. \begin{figure}[t] \begin{code} s0 = conf Tree emptyset { p1, p2 } s1 = conf Tree { asyn.gath } { p3, p4 } s2 = conf Tree { ainh.mini, asyn.gath, asyn.repl } { p5, p6 } p1 = prod Leaf emptyset g1 p3 = prod Leaf emptyset g1 p5 = prod Leaf emptyset g1 p2 = prod Bin { child l : Tree emptyset, child r : Tree emptyset } ^^ (abstract g2) p4 = prod Bin { child l : Tree { asyn.gath }, child r : Tree { asyn.gath } } ^^ (abstract g3) p6 = prod Bin { child l : Tree { ainh.mini, asyn.gath, asyn.repl } , child r : Tree { ainh.mini, asyn.gath, asyn.repl } } ^^ (abstract g4) v1 = visit 1 : Tree { asyn.gath } { c1, c2 } v2 = visit 2 : Tree { ainh.mini, asyn.repl } { c3, c4 } v3 = visit 3 : Tree { ainh.mini, asyn.gath, asyn.repl } { c5, c6 } c1 = prod Leaf emptyset c3 = prod Leaf emptyset c5 = prod Leaf emptyset c2 = prod Bin { sim { invoke l 1, invoke r 1 } } c4 = prod Bin { sim { invoke l 2, invoke r 2 } } c6 = prod Bin { sim { invoke l 1, invoke r 1 }, sim { invoke l 2, invoke r 2 } } \end{code} \caption{The contents of nodes and vertices of the visits graph of the example.} \label{warren.fig.visitsgraphexmpl} \end{figure} \begin{figure}[p] \centering \subfigure[VPDG |g1| of production |Leaf|, which has no children.]{ \begin{tikzpicture} [ nd/.style={ellipse, draw, minimum size=4mm, node distance=12mm,font=\scriptsize} , dep/.style={->,thick} , dep2/.style={->,dashed, thick} , lab/.style={font=\scriptsize} ] \node[nd](lhsgath){|asyn.lhs.gath|}; \node[nd, right of=lhsgath, xshift=30mm](lhsmini){|ainh.lhs.mini|}; \node[nd, right of=lhsmini, xshift=10mm](lhsrepl){|asyn.lhs.repl|}; \node[nd, below of=lhsgath, yshift=4mm](r1){|r1|}; \node[nd, right of=r1, xshift=4mm](locx){|loc.loc.x|}; \node[nd, right of=locx, xshift=4mm](r0){|r0|}; \node[nd, right of=r0, xshift=7mm](r2){|r2|}; \draw[dep](lhsgath) to (r1); \draw[dep](r1) 
to (locx); \draw[dep](locx) to (r0); \draw[dep](r2) to (lhsmini); \draw[dep](lhsrepl) to (r2); \end{tikzpicture} \label{warren.fig.vdpds.g1} } \subfigure[VPDG |g2| of production |Bin|, which has children |l| and |r|.]{ \begin{tikzpicture} [ nd/.style={ellipse, draw, minimum size=4mm, node distance=12mm,font=\scriptsize} , dep/.style={->,thick} , dep2/.style={->,dashed, thick} , lab/.style={font=\scriptsize} ] \node[nd](lhsmini){|ainh.lhs.mini|}; \node[nd, right of=lhsmini, xshift=30mm](lhsgath){|asyn.lhs.gath|}; \node[nd, right of=lhsgath, xshift=10mm](lhsrepl){|asyn.lhs.repl|}; \node[nd, below of=lhsmini, yshift=3mm](l){|child l emptyset|}; \node[nd, below of=lhsrepl, yshift=-3mm](r){|child r emptyset|}; \node[nd, below of=l, xshift=-20mm, yshift=-4mm](lgath){|asyn.l.gath|}; \node[nd, right of=lgath, xshift=10mm](lmini){|ainh.l.mini|}; \node[nd, right of=lmini, xshift=10mm](lrepl){|asyn.l.repl|}; \node[nd, below of=r, xshift=-20mm, yshift=2mm](rgath){|asyn.r.gath|}; \node[nd, right of=rgath, xshift=5mm, yshift=-8mm](rmini){|ainh.r.mini|}; \node[nd, right of=rmini, xshift=15mm, yshift=6mm](rrepl){|asyn.r.repl|}; \node[nd, below of=lhsgath, xshift=-10mm, yshift=4mm](r3){|r3|}; \node[nd, below of=lhsgath, xshift=25mm, yshift=4mm](r4){|r4|}; \node[nd, below of=lrepl, xshift=7mm](r6){|r6|}; \node[nd, below of=lrepl, yshift=5mm](r5){|r5|}; \draw[dep](r3) to (lgath); \draw[dep](r3) to (rgath); \draw[dep](lhsgath) to (r3); \draw[dep](r4) to (lrepl); \draw[dep](r4) to (rrepl); \draw[dep](lhsrepl) to (r4); \draw[dep](r5) to (rgath); \draw[dep](lmini) to (r5); \draw[dep](r6) to (lgath); \draw[dep](rmini) to (r6); \draw[dep](rgath) to (r); \draw[dep](rrepl) to (r); \draw[dep](lgath) to (l); \draw[dep](lrepl) to (l); \draw[dep](lrepl) to (lmini); \draw[dep](rrepl) to (rmini); \end{tikzpicture} \label{warren.fig.vdpds.g2} } \subfigure[VPDG |g3| after visit |v1| in which it visited |v1| of the children.]{ \begin{tikzpicture} [ nd/.style={ellipse, draw, minimum size=4mm, node distance=12mm,font=\scriptsize} , dep/.style={->,thick} , dep2/.style={->,dashed, thick} , lab/.style={font=\scriptsize} ] \node[nd](lhsmini){|ainh.lhs.mini|}; \node[nd, right of=lhsmini, xshift=30mm](lhsgath){|asyn.lhs.gath|}; \node[nd, right of=lhsgath, xshift=10mm](lhsrepl){|ainh.lhs.repl|}; \node[nd, below of=lhsmini, yshift=3mm](l){|child l emptyset|}; \node[nd, below of=lhsrepl, yshift=-3mm](r){|child r emptyset|}; \node[nd, below of=l, xshift=-20mm, yshift=-4mm](lgath){|asyn.l.gath|}; \node[nd, right of=lgath, xshift=10mm](lmini){|ainh.l.mini|}; \node[nd, right of=lmini, xshift=10mm](lrepl){|asyn.l.repl|}; \node[nd, below of=r, xshift=-20mm, yshift=2mm](rgath){|asyn.r.gath|}; \node[nd, right of=rgath, xshift=5mm, yshift=-8mm](rmini){|ainh.r.mini|}; \node[nd, right of=rmini, xshift=15mm, yshift=6mm](rrepl){|asyn.r.repl|}; \node[nd, below of=lhsgath, xshift=-10mm, yshift=4mm](r3){|r3|}; \node[nd, below of=lhsgath, xshift=25mm, yshift=4mm](r4){|r4|}; \node[nd, below of=lrepl, xshift=7mm](r6){|r6|}; \node[nd, below of=lrepl, yshift=5mm](r5){|r5|}; \node[nd, above of=lgath](l1){|invoke l 1|}; \node[nd, right of=rgath, xshift=10mm](r1){|invoke r 1|}; \draw[dep2](lgath) to (l1); \draw[dep2](rgath) to (r1); \draw[dep2](l1) to (l); \draw[dep2](r1) to (r); \draw[dep](r3) to (lgath); \draw[dep](r3) to (rgath); \draw[dep](lhsgath) to (r3); \draw[dep](r4) to (lrepl); \draw[dep](r4) to (rrepl); \draw[dep](lhsrepl) to (r4); \draw[dep](r5) to (rgath); \draw[dep](lmini) to (r5); \draw[dep](r6) to (lgath); \draw[dep](rmini) to (r6); 
\draw[dep](rgath) to (r); \draw[dep](rrepl) to (r); \draw[dep](lgath) to (l); \draw[dep](lrepl) to (l); \draw[dep](lrepl) to (lmini); \draw[dep](rrepl) to (rmini); \end{tikzpicture} \label{warren.fig.vdpds.g3} } \subfigure[VPDG |g4| after visit |v2| or |v3| in which it visited |v2| and possibly |v1| of the children.]{ \begin{tikzpicture} [ nd/.style={ellipse, draw, minimum size=4mm, node distance=12mm,font=\scriptsize} , dep/.style={->,thick} , dep2/.style={->,dashed, thick} , lab/.style={font=\scriptsize} ] \node[nd](lhsmini){|ainh.lhs.mini|}; \node[nd, right of=lhsmini, xshift=30mm](lhsgath){|asyn.lhs.gath|}; \node[nd, right of=lhsgath, xshift=10mm](lhsrepl){|asyn.lhs.repl|}; \node[nd, below of=lhsmini, yshift=1mm](l){$\begin{array}{c}|child l|\\|{asyn.gath}|\end{array}$}; \node[nd, below of=lhsrepl, yshift=-7mm](r){$\begin{array}{c}|child r|\\|{asyn.gath}|\end{array}$}; \node[nd, below of=l, xshift=-20mm, yshift=0mm](lgath){|asyn.l.gath|}; \node[nd, right of=lgath, xshift=10mm](lmini){|ainh.l.mini|}; \node[nd, right of=lmini, xshift=10mm](lrepl){|asyn.l.repl|}; \node[nd, below of=r, xshift=-20mm, yshift=5mm](rgath){|asyn.r.gath|}; \node[nd, right of=rgath, xshift=5mm, yshift=-8mm](rmini){|ainh.r.mini|}; \node[nd, right of=rmini, xshift=15mm, yshift=6mm](rrepl){|asyn.r.repl|}; \node[nd, below of=lhsgath, xshift=-10mm, yshift=1mm](r3){|r3|}; \node[nd, below of=lhsgath, xshift=25mm, yshift=4mm](r4){|r4|}; \node[nd, below of=lrepl, xshift=7mm, yshift=-1mm](r6){|r6|}; \node[nd, below of=lrepl, yshift=5mm](r5){|r5|}; \node[nd, below of=lmini, xshift=-18mm](l2){|invoke l 2|}; \node[nd, right of=rmini, yshift=-4mm, xshift=8mm](r2){|invoke r 2|}; \draw[dep2](l2) to (lmini); \draw[dep2](lrepl) to (l2); \draw[dep2](r2) to (rmini); \draw[dep2](rrepl) to (r2); \draw[dep2](l2) to (l); \draw[dep2](r2) to (r); \draw[dep](r3) to (lgath); \draw[dep](r3) to (rgath); \draw[dep](lhsgath) to (r3); \draw[dep](r4) to (lrepl); \draw[dep](r4) to (rrepl); \draw[dep](lhsrepl) to (r4); \draw[dep](r5) to (rgath); \draw[dep](lmini) to (r5); \draw[dep](r6) to (lgath); \draw[dep](rmini) to (r6); \draw[dep](rgath) to (r); \draw[dep](rrepl) to (r); \draw[dep](lgath) to (l); \draw[dep](lrepl) to (l); \draw[dep](lrepl) to (lmini); \draw[dep](rrepl) to (rmini); \end{tikzpicture} \label{warren.fig.vdpds.g4} } \caption{VPDGs of the visits graph example.} \label{warren.fig.vdpds} \end{figure} \paragraph{Invariants.} The representation of visits graph leads to a recursive set of constraints (presented below) for which we can incrementally construct \emph{a} visits graph. For a given attribute grammar, there may be many possible visits graphs. The way we represent the graph gives rise to visit sequences that are relatively independent and only compute what is necessary, but which are not guaranteed to be as large as possible. Different visits graphs may exhibit small differences in performance, although in principle any visits graph suffices. We impose a number of constraints on the visits graph for which it is possible to find a solution when the PDGs are acyclic, and for which it is possible to fine a \emph{unique} solution. In the formalization below, we denote a relation as partial function without a right-hand side, or as (partial) boolean functions. Moreover, when used in a boolean expression, we consider such relations a total function from the arguments to a Boolean result value, which is |True| if and only if the arguments form an element of the relation. 
When the expression of a guard has the value |undefined|, then the guard itself has the value |False|.
Also, for unbound variables in pattern expressions we assume existential quantification, unless indicated explicitly otherwise.

\begin{figure}[tb]
\begin{code}
avail (many a) g d ^^ ^^ = ^^ all (avail (many a) g) (deps g d) && except (many a) g d

except (many a) g <% asyn.x.y %>
  ^^ | ^^ <% visit x i %> `elem` deps g <% asyn.x.y %>
  | ^^ <% child x (many a') %> `elem` deps g <% asyn.x.y %> && <% asyn.x.y %> `elem` (many a')
except (many a) g <% ainh.x.y %>    | ^^ True
except (many a) g <% asyn.lhs.y %>  | ^^ True
except (many a) g <% ainh.lhs.y %>  | ^^ ainh.y `elem` (many a)
except (many a) g <% rule x %>      | ^^ True
except (many a) g <% child x %>     | ^^ True
except (many a) g <% visit x i %>   | ^^ True
\end{code}
\caption{The definition of |avail|.}
\label{fig.warren.avail}
\end{figure}

\begin{defi}[Available]
A vertex |d| is \emph{available} in a VPDG |g| when |d| can be scheduled while respecting the order imposed by |g|.
Formally, the relation |avail (many a) g d| as defined in Figure~\ref{fig.warren.avail} (explained below) states whether a vertex |d| in a VPDG |g| is available when the node is in configuration |many a|.
\end{defi}

The relation |avail| states that a vertex |d| is available when all its dependencies are available, with the exception that the inherited attributes |ainh.lhs.y| are only available if they are part of the configuration |many a|, and that synthesized attributes of visits are only available if there was a visit that computed them.
Thus, |avail| gives us a notion of which vertices are scheduled in a VPDG in a given configuration.

\begin{defi}[Well-formed]
A visits-graph |g| is \emph{well-formed} when it satisfies the invariants below.
Additionally, the visits-graph is acyclic, and so are the VPDGs that are stored in the configurations.
Also, for each edge of |g| and each vertex of |g| with some nonterminal |N|, there is exactly one entry |<% prop P ... %>| for each production |P| of |N|.
We omit some straightforward structural invariants, such as that each visit-edge is annotated with the same nonterminal |N| as the nonterminal |N| of the two configurations it connects.
\end{defi}

\begin{figure}[tb]
\begin{code}
<% conf N (many a0) (many p0) `sep` visit i : N (many a) (many c) `sep` conf N (many a1) (many p1) %> `elem` edges (sub VG N)
  = all3 (transition (many a1) (many a)) (many p0) (many c) (many p1) ^^ && ^^ many a1 == many a0 \+/ many a

transition (many a1) (many a) (prod P (many k0) g0) (prod P (many r)) (prod P (many k1) g1)
  = abstract (embed (many a1) g0 (many r)) == g1 ^^ && ^^ complete (many a1) (many a) g1 && all2 (trchild (many r)) (many k0) (many k1)

trchild (many r) <% child x : N (many a) %> <% child x : N (many a') %> ^^ = ^^ (many a') == foldl (trsim x) (many a) (many r)

trsim x (many a) <% sim (many m) %>
  | x `notelem` children (many m) ^^ = ^^ emptyset
  | <% invoke x i %> `elem1` (many m) ^^ && ^^ <% s `sep` visit i : N (many a') (many c) `sep` s' %> `elem` edges (sub VG N) ^^ = ^^ (many a) \+/ (many a')

children (many m) = ^^ { x | <% invoke x i %> `elem` (many m) }

complete (many a1) (many a) g1 ^^ = ^^ all (avail (many a1) g1) { k.lhs.x | k.x `elem` (many a) }
\end{code}
\caption{Main invariant: are all visits represented.}
\label{fig.warren.correctlyrepresented}
\end{figure}

The main invariant is that all visits are correctly represented in the graph, which is captured by the property in Figure~\ref{fig.warren.correctlyrepresented}.
A visit-edge of nonterminal |N| requires a transformation of the VPDG for each production of |N|. The visit must precisely mention the new attributes |many a| that are added to |a0| to form |a1|. The sequence of simultaneous invocations |many r| describes these transitions for the children. When applied to the VPDG |g0| of the old configuration, |embed (many a1) g0 (many r)| gives the unabstracted VPDG |g1| of the new configuration. Moreover, |g1| must be complete, which means that all the synthesized attributes |asyn.lhs.y| in the new configuration must be available in |g1|. Finally, the invocations |many r| also describe the state transitions of the children. The relation |trchild (many r) k0 k1| relates the old configuration |k0| of a child to the new configuration |k1|. \begin{figure}[tb] \begin{code} req (many a) g d ^^ = ^^ d `elem` reqs (many a) g reqs (many a) g ^^ = ^^ { d | ainh.x `elem` (many a), asyn.y `elem` (many a), (ainh.lhs.x <-* d <-* asyn.lhs.y) `elem` g } \end{code} \begin{code} embed (many a) g <% many r %> ^^ = ^^ foldl (embed (many a)) g (many r) embed (many a) g <% sim (many m) %> ^^ = ^^ foldl (embed (many a) g) g (many r) embed (many a) g0 g <% invoke x i %> | all (avail g0 (many a)) inhs && all (req g0 (many a)) syns = semb (visit i (many a')) (vert1(x)) \+/ semb (visit i (many a')) (edge1(x)) \+/ g where <% s `sep` visit i : N (many a') (many c) `sep` s' %> `elem` edges (sub VG N) inhs = { ainh.x.y | ainh.y `elem` (many a') } syns = { asyn.x.y | asyn.y `elem` (many a') } \end{code} \caption{The relations |req| and |embed|.} \label{fig.warren.reqandembed} \end{figure} The relation |req (many a) g d| in Figure~\ref{fig.warren.reqandembed} specifies if |d| is required for the synthesized attributes of |many a| to be \emph{available} in |g|. Note that the relation |d1 <-* d2| represents a possibly empty path between |d1| and |d2|. The function |embed| in Figure~\ref{fig.warren.reqandembed} applies the invocations of visits |many r| to the VPDG |g|. A visit may only be invoked if its synthesized attributes are required, since our strategy schedules only the attributes that are needed. Moreover, the inherited attributes must be available. The simultaneous invokes represent invocations that can be applied in any order. Since we test which inherited attributes are available according to |g0|, the order in which these visits are applied does not affect the PDGs. This is because the synthesized attributes of the child are not available in |g0|, and can thus not be used to make more inherited attributes of another child available. This property allows us to determine visits graphs in a stable way. Some modifications are possible to this approach to schedule attributes less eagerly. Further, for each nonterminal |N|, there must be a vertex in the visits graph with an empty configuration. An empty configuration consists of an empty configuration for the children of each production. In addition, for the root symbols of the grammar with a non-empty set of attributes, the configuration with all attributes defined must be part of the visits graph. The configurations of the children do not have to be fully defined. Also, there must be an appropriate edge between these configurations. \paragraph{Properties.} The invariants imposed on the visits graph ensure a number of properties for which we sketch proofs. 
\begin{proof}[The visits graph is acyclic]
The destination configuration connected by an edge is larger than the edge's source configuration, because the set of attributes of an edge is not empty.
Edges thus connect distinct configurations.
The configurations connected by edges form an ascending chain, and can thus not be cyclic.
\end{proof}

\begin{proof}[The VPDGs are acyclic]
The VPDGs of the initial configurations are by definition acyclic.
The transformation induced by a visit adds visit-vertices, but these keep the VPDG acyclic.
A vertex that is available requires its (indirect) dependencies to be available.
Thus, an available vertex cannot depend on an unavailable vertex.
A visit-vertex connects available vertices to unavailable vertices of synthesized attributes of a child.
To form a cycle, such an available vertex must depend (indirectly) on one of these synthesized attributes.
Since such a vertex is unavailable, a cycle is not possible.
\end{proof}

\begin{proof}[Consistent visits to children]
The edges of the NDG of a child form the edges between the child's inherited and synthesized attributes in the VPDG of a production that contains the child.
Suppose that we visit child |x| with |many a| as the attributes of the visit.
As induction hypothesis, we assume that the available inherited and synthesized attributes of |x| are part of a consistent visit sequence.
To prove that the visit sequence with the visit added is consistent, the definition requires us to consider four cases.
In each case, one of the attributes is an element of |many a|.
As the first case, given an attribute |ainh.x.y| and an attribute |asyn.x.z|, suppose that |ainh.y <-+ asyn.z| in the NDG of |x|, and |asyn.x.z `elem` many a|.
Consequently, vertex |asyn.x.z| is unavailable.
Then |ainh.x.y <- asyn.x.z| in the VPDG, which means that |ainh.x.y| is available.
Hence, |asyn.x.z| must be in the same visit as |ainh.x.y| or in a later one.
Proofs for the remaining three cases are similar.
\end{proof}

\begin{proof}[Consistent states]
When two potential edges in the visits graph converge, they result in the same state.
Clearly, when two edges converge, the edges have an equivalent set of attributes |many a1| as destination state, otherwise the edges would not converge.
Then, for both a VPDG |g1| of one edge, and a VPDG |g2| for the same production of the other edge, the |reqs| sets for |many a1| are equivalent.
The attributes of a child's state are exactly those in the |reqs| set, hence the states of the children are equivalent.
\end{proof}

\begin{proof}[Dependencies between attributes of the same kind]
A synthesized attribute |asyn.x.y| can depend on a synthesized attribute |asyn.x.z| and still be computed as part of the same visit, when the inherited attributes that both |asyn.x.y| and |asyn.x.z| depend on are available.
Similarly, two inherited attributes can both be passed simultaneously to a child when they are both available.
An inherited attribute that depends on a synthesized attribute of the same child, however, cannot be scheduled to the same visit.
\end{proof}

\paragraph{Construction.}
The size of the visits graph is in both the worst and the average case exponential in the number of attributes.
However, we normally need only a small portion of this graph.
In the remainder of this chapter, the visits graph of a program refers to a slice of this graph that we infer from the program.
To incrementally construct this slice, we distinguish \emph{partial} and \emph{final} vertices and edges.
A partial vertex contains the configuration |many a|, but not the administration for the productions.
A final vertex does contain this information.
Similarly, a partial edge contains only the visit identifier |i| and the attributes |many a|.
The partial vertices and edges represent information that needs to be in the graph, but that we have not computed yet.
The final vertices and edges represent already computed parts of the graph.
For example, the initial vertices with an empty state are final vertices.
For the root symbols, we insert a partial edge and a partial destination vertex into the graph.

The algorithm repeatedly takes a partial edge with a final source vertex and performs scheduling to turn it into a final edge, and consequently its destination vertex into a final vertex (we give a sketch of this worklist loop below).
Scheduling may result in new edges and vertices being added to the graph.
The algorithm that performs scheduling for final edges computes the information for the edge as a function of the source state and the attributes |many a| of the edge.
This ensures that the order in which we consider pending edges does not affect the resulting visits graph.
Moreover, the visits graph is finite, thus if the computation for each pending edge is finite, then so is the whole computation.

The input to this scheduling step is the set of attributes |many a| of the edge, and the VPDGs of the productions of the previous configuration.
To compute the remaining administration of the edge, we infer the visits to the children.
The |embed| relation tells us how.
We first determine which vertices are in the |reqs| set.
Then we repeatedly determine which of the attributes of the children are required and ready to be scheduled.
Those form one group of simultaneous visits.
For each visit, we possibly add a pending edge and a pending vertex to the visits graph, if these are not yet in the graph.
This process terminates because the VPDG is acyclic.
With each iteration, there is either at least one vertex that can be scheduled, or we are done.

The compilation of the largest AG of UUAG takes less than a minute using a slightly optimized version of the algorithm as sketched above.
To improve the performance, the construction of the graph is relatively straightforward to parallelize.
The processing of each edge is independent and can be done in parallel.
Most of the shared state is read-only.
Only updates on the state need to be properly synchronized, but these updates happen relatively infrequently, and there is likely little contention.
The construction of the graph is thus likely to scale very well.
Also, the process may be done incrementally.
When a change in the AG does not affect the PDGs of a nonterminal, the graphs constructed for that nonterminal so far may be reused as the initial visits graph.
However, the implementation is performant enough to be used in practice, even without such optimizations.

\paragraph{Remarks.}
The construction of both the PDGs and the visits graph is a whole-program analysis of the sources related to an evaluation algorithm of a compiler for a particular tree (e.g.\ one stage in the compiler pipeline).
The requirement that the sources related to one algorithm must be analyzed as a whole is usually not a problem in large compiler implementations, because individual algorithms are usually monolithic.
When plugins are involved, they are typically a separate stage, and would thus be compiled independently.
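The following is a hedged Haskell sketch of the worklist loop described in the construction paragraph above, under strongly simplified assumptions: configurations are sets of attribute names, an edge records only its source configuration and the attributes it adds, and |schedule| is a stub standing in for the actual per-edge scheduling step. All names (|Config|, |Edge|, |buildGraph|) are illustrative and not part of the implementation.
\begin{code}
import qualified Data.Set as Set
import           Data.Set (Set)

type Attr   = String
type Config = Set Attr                         -- a configuration: the attributes computed so far
data Edge   = Edge { edgeFrom :: Config, edgeAdds :: Set Attr }
  deriving (Eq, Ord, Show)

data Graph  = Graph
  { finalConfigs :: Set Config                 -- configurations whose administration is computed
  , finalEdges   :: Set Edge                   -- edges whose execution plans are computed
  }

-- Stub for the actual scheduling step: it would compute the plan of the edge and
-- report the destination configuration plus any newly required pending edges.
schedule :: Edge -> (Config, [Edge])
schedule (Edge from adds) = (Set.union from adds, [])

-- Worklist loop: repeatedly pick a pending edge whose source is already final,
-- schedule it, and record the destination and any newly required edges.
buildGraph :: [Edge] -> Graph
buildGraph = go (Graph (Set.singleton Set.empty) Set.empty)
  where
    go g pending =
      case break (ready g) pending of
        (_, [])             -> g               -- no pending edge is ready: done
        (before, e : after) ->
          let (dest, new) = schedule e
              g'          = Graph (Set.insert dest (finalConfigs g))
                                  (Set.insert e    (finalEdges g))
          in  go g' (before ++ after ++ new)
    ready g e = edgeFrom e `Set.member` finalConfigs g
             && not (e `Set.member` finalEdges g)
\end{code}
Because |schedule| depends only on the source configuration and the attributes of the edge, the result of the loop does not depend on the order in which pending edges are picked, as argued above.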
However, it is in theory possible to defer the computation of the visits graph to load-time, although then the composition of rules must also be determined at load-time, using meta-programming facilities such as Template Haskell.

There are various customizations possible in the construction of the visits graph, which may have severe consequences for the structure of the graph.
The structure of the graph does not influence the outcome of the attribute evaluation, but may affect execution time.
At the moment of writing, we are still gathering empirical data regarding performance.
In earlier measurements on UHC, the difference between on-demand (lazy) and eager (strict) attribute evaluation turned out to be insignificant.
This may be due to the use of strictness annotations and DeepSeq in combination with the sequentialization of computations due to unifications on a chained substitution attribute.
However, in earlier measurements with the AG implementation of the editor Proxima~\citep{Schrage04proxima}, execution time almost halved when eager AG evaluation was used.
Note that with the rise of multi-core computing, the effects of visit sequences on parallel behavior may still be significant~\citep{DBLP:conf/waga/KuiperS90,629079,springerlink:10.1007/BF02948394}.

\section{Optimizations}
\label{warren.sect.optimizations}

%% Decision: this chapter should provide some relief after the formal parts
%% and for the formal parts to come.

The shape of the graph may have an impact on the performance of the generated code.
In our approach, visits are likely to be small and independent, which is beneficial for parallelism and incremental evaluation.
However, when the visits graph is huge, code size becomes a more pressing issue.
The visits graph of the largest AG in the UHC project features about 10,000 configurations.
Hence, we require measures to limit the size of this graph.

\paragraph{Subsumption.}
When the graph is huge, there are many contexts that are similar, yet differ in one or more attributes because such an attribute was not needed or only needed later.
Such a small difference can easily cause many additional visits to be needed in the graph.
An edge |i| from a configuration |many a| with attributes |many (ainh.y)| and |many (asyn.z)| \emph{subsumes} an edge |k| from |many a| with attributes |many (ainh.p)| and |many (asyn.q)| when |many (ainh.y) `subseteq` many (ainh.p)| and |many (asyn.q) `subseteq` many (asyn.z)|.
When we are about to declare a visit, but it is subsumed by an already declared visit, we may use that visit instead.
This approach can potentially save many contexts, at the expense of producing some results that are not needed yet.
That is usually not a problem, because non-trivial computations of some node normally have dedicated inherited attributes, thus if one of these inherited attributes is needed then so is the non-trivial computation that computes the synthesized attribute.
A downside of the subsumption approach is that the order in which edges in the visits graph are considered may influence the outcome.
The effects of such approaches require further investigation.

We experimented with a strategy that determines the inherited attributes of a visit based on the synthesized attributes that are required, and then takes the largest set of synthesized attributes that can be computed from the inherited attributes available so far.
This strategy reduced the graph of UHC's largest AG to 1,500 nodes, at the slight expense of returning one or two attributes more than is strictly necessary.
Under the assumption that complex synthesized attributes are always dependent on their own set of inherited attributes, the additional cost is negligible.

\paragraph{Partial Kastens.}
Two other decisive factors are the number of attributes and productions, and the sparseness of the dependencies between attributes.
The number of productions is typically fixed, but the number of attributes and the dependencies can be influenced.
We can produce an algorithm for any consistent visit sequence.
Consequently, the PDGs can be replaced with supergraphs, as long as these remain acyclic.
In our experience, orderable AGs are absolutely non-circular (Section~\rref{sect.intro.ag.eval}).
If for one production there is a child |x| with nonterminal |N| that has a dependency of an attribute |ainh.x.y| on an attribute |asyn.x.z|, then this dependency can also be imposed for all other children with nonterminal |N|.
For such an AG, the approach of \citet{DBLP:journals/acta/Kastens80} is applicable.
The approach of Kastens determines a total order on the attributes in the NDGs using a late-as-possible strategy.
As mentioned earlier, this approach likely causes the graphs to become cyclic.
By removing the edges between vertices of strongly connected components that are not in the original PDG, we obtain a non-cyclic supergraph of the original PDG.
This supergraph has Kastens' algorithm partially applied, and is likely to have a significantly lower number of possible contexts.
For large AGs, this appears to be the most effective step to limit the state explosion caused by many productions and attributes.
It does not require manual intervention.
On the other hand, it restricts the freedom in choosing smaller and independent visits.

\paragraph{Attribute elimination.}
In combination with copy rules, it is common practice to define attributes on nonterminals where they are actually unused, or only used in chains of copy rules.
During the development of an AG, such attributes also show up because the AG is not finished, thus not all attributes are in use.
Superfluous attributes seem innocent, but actually make the scheduling harder.
These attributes and their rules are largely independent of the other attributes of such nonterminals, and thus easily lead to many contexts in the visits graph, because many ways to interleave them are allowed.
Further, Kastens' algorithm typically schedules them too early, which causes cycles in the dependency graphs, and thus makes the above approach to reducing contexts less effective.
A combination of dead-code elimination and copy propagation~\citep{Nielson:1999:PPA:555142} can be used to eliminate superfluous attributes of a nonterminal.
As an additional benefit, their removal may improve the performance of the application, because it prevents the trivial copying of attributes around the tree.

As with the dependency analysis itself, many analyses for AGs can be specified as a recursive set of constraints.
The common pattern is that some property of a nonterminal is determined by combining the properties of each production, which we call abstraction.
This property of the nonterminal is then instantiated for each child of the nonterminal.
A fixpoint can then be computed, starting with a fixed value for the nonterminals that are roots, and a bottom value for the other nonterminals.
In a similar way as shown in Section~\rref{warren.sect.dependencies}, we can define that an attribute is \emph{live} if it is a dependency of a live attribute.
For the root nonterminal, all synthesized attributes are live.
Since this analysis is defined in terms of the attribute dependencies, it can straightforwardly be expressed in terms of the dependency graphs.
As with dead-code elimination, the attributes and rules that are not live are removed from the grammar.

Similarly, a collection attribute~\citep{10.1109/SCAM.2007.13} is \emph{empty} if it is composed with either a copy rule or a monoid's append from empty attributes.
In this case, we start with non-empty as bottom value, and do not treat start symbols of the grammar in a special way.
Instead, a use-rule that depends on no attributes eventually gets the value empty.
%% In Section~\rref{sect.exploit.cfg} we show how such fixpoint computations on DAGs can be described
%% with attribute grammars.

Finally, the output attribute of a copy rule is a \emph{copy} of an attribute |x|, where |x| either equals |z| when the input attribute |y| is a copy of another attribute |z|, or |x| equals |y| otherwise.
This also applies to rules for which we can syntactically determine that their body is essentially the identity function.
For other rules, the output attribute is not a copy of another attribute.
Initially, each attribute is not a copy of any attribute.
During abstraction, a synthesized attribute may be a copy of an inherited attribute if all productions agree on that attribute.

Given the results of the copy and empty analyses, a rule that depends on an attribute |x| that is a copy of |y| can substitute |y| for |x|.
A rule that depends on an empty attribute can substitute the attribute with the empty value, and thus remove the dependency on the attribute.
Finally, dead-code elimination cleans up the unused copies, unused empty attributes, and other unused attributes.

The updatable attributes provide some more options for extensions.
When an attribute is \emph{constant} or \emph{unique}~\citep{DBLP:conf/icfp/HageHM07}, it can be stored as a mutable structure in a global state, and be replaced by a single attribute that stores a reference to that structure.
These analyses are relatively straightforward to implement and exploit for BarrierAG, in contrast to general-purpose programming languages, where data types and higher-order functions complicate them.
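As an illustration of this style of analysis, the following is a hedged Haskell sketch of the liveness fixpoint, under strongly simplified assumptions: attributes are plain strings, the dependency relation is given as a list of pairs, and the name |liveAttrs| is illustrative rather than part of the implementation.
\begin{code}
import qualified Data.Set as Set
import           Data.Set (Set)

type Attr = String

-- A pair (a, b) states that attribute a directly depends on attribute b.
type Dep = (Attr, Attr)

-- An attribute is live when it is a synthesized attribute of the root, or a
-- dependency of a live attribute; we compute the least such set by iteration.
liveAttrs :: Set Attr -> [Dep] -> Set Attr
liveAttrs roots deps = go roots
  where
    go live =
      let live' = Set.union live (Set.fromList [ b | (a, b) <- deps, a `Set.member` live ])
      in  if live' == live then live else go live'
\end{code}
The empty and copy analyses sketched above follow the same pattern, with a different lattice of values and a different transfer function per rule.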
\begin{figure}[p] \begin{code} S ::= (sub sem (^^ Gamma)) P : N (many c) (many q) j -- execution plans |many q| of production |P|, and initial config |j| c ::= conf j (many a) -- description of a configuration |j| q ::= s `sep` s' `sep` v -- edge between configurations |s| and |s'| v ::= visit i : N (many a) (many r) -- visit combined with rules of the visit r ::= child x : N = f (many zup) -- (higher order) child declaration | x : ^^ p (many zdown) = f (many zup) -- evaluation rule | sim (many m) -- simultaneous invocations m ::= invoke x i : N -- child invocation s ::= conf j (many k) -- configuration of a node k ::= child x : N j -- configuration of child |x| j -- configuration identifier \end{code} \begin{code} sem { f1 = field_l, f2 = field_r, f3=min, f4=Bin } Bin : Tree 1 { conf 1 emptyset, conf 2 {asyn.gath}, conf 3 {ainh.mini, asyn.gath, asyn.repl} } [ conf 1 { child l : Tree 1, child r : Tree 1 } ; conf 2 { child l : Tree 2, child r : Tree 2 } ; visit 1 : Tree ^^^ child l : Tree = f1 child r : Tree = f2 sim { invoke l 1 : Tree, invoke r 1 : Tree} r3: ^^^ id ^^ occval (asyn.lhs.gath) ^^ = ^^ f3 ^^ occval (asyn.l.gath) ^^ occval (asyn.r.gath) , conf 2 { child l : Tree 2, child r : Tree 2} ; conf 3 { child l : Tree 3, child r : Tree 3} ; visit 2 : Tree ^^^ r5: ^^^ id ^^ occval (ainh.l.mini) ^^ = ^^ id ^^ occval (asyn.r.gath) r6: ^^^ id ^^ occval (ainh.r.mini) ^^ = ^^ id ^^ occval (asyn.l.gath) sim { invoke l 2 : Tree, invoke r 2 : Tree } r4: ^^^ id ^^ occval (asyn.lhs.repl) ^^ = ^^ f4 ^^ occval (asyn.l.repl) ^^ occval (asyn.r.repl) , conf 1 { child l : Tree 1, child r : Tree 1 } ; conf 3 { child l : Tree 3, child r : Tree 3 } ; visit 3 : Tree ^^^ child l : Tree = f1 child r : Tree = f2 sim { invoke l 1 : Tree, invoke r 1 : Tree} r3: ^^^ id ^^ occval (asyn.lhs.gath) ^^ = ^^ f3 ^^ occval (asyn.l.gath) ^^ occval (asyn.r.gath) r5: ^^^ id ^^ occval (ainh.l.mini) ^^ = ^^ id ^^ occval (asyn.r.gath) r6: ^^^ id ^^ occval (ainh.r.mini) ^^ = ^^ id ^^ occval (asyn.l.gath) sim { invoke l 2 : Tree, invoke r 2 : Tree } r4: ^^^ id ^^ occval (asyn.lhs.repl) ^^ = ^^ f4 ^^ occval (asyn.l.repl) ^^ occval (asyn.r.repl) ] \end{code} \caption{Syntax of execution plans, and the plans of production |Bin|.} \label{warren.fig.execplans} \end{figure} \section{Execution Plans and Generated Code} \label{warren.sect.executionplan} With each vertex in the visits graph, we uniquely associate an identifier |j|. For each edge of the visits graph, and for each of its productions, we construct an execution plan. The vertices of the VDPG that are required for the new state, but not required for the old state, are the edges that belong in the plan. In Chapter~\rref{chapt.side-effect}, we described how to derive a total order for the vertices. In this section, we simply assume that we take a topological sort of the graph. \paragraph{Plan representation.} Figure~\ref{warren.fig.execplans} shows the syntax of execution plans, and the execution plans for the production |Bin|. We organize the plans per production. For each production, it contains an execution plan for each edge in the visits graph. The rules are ordered, and visits to children are made explicit. \begin{defi}[Intra-visit dependency] With each vertex in the visits graph, we associate a set of \emph{intra-visit} dependencies. These are values (denoted as descriptors |d|) in the state of a node that are potentially needed in later visits. 
Given a visits graph |V|, the set of descriptors contained in |s| for a production |P| is |intra P s| in Figure~\ref{warren.fig.intra} (explained below). \end{defi} The set of descriptors is determined by taking the descriptors introduced by by a visit, and those needed by rules of later visits. Also note that the graph |g| is the graph of the new state for |defs|, and the graph of the old state for |uses|. The difference is in the state of the child-nodes. Finally, the set of intra-visit dependencies of the initial state and each final state is empty. \begin{figure}[t] \begin{code} intra P s = \/{ (uses P s s' \/ intra P s') - defs P s s' ^^ | ^^ (s `sep` s' `sep` v) `elem` V } uses P <% conf N (many a0) (many p0) %> <% conf N (many a1) (many p1) %> ^^ | ^^ <% prod P (many k) g %> `elem` (many p0) = \/(map (uses P) (reqs (many a1) g - reqs (many a0) g)) uses P <% k.x.y %> = emptyset uses P <% child x (many a) %> | abs (many a) == 0 = \/{ map uses (many z) | <% child x : N = f (many zup) %> `elem` rules P } | length (many a) > 0 = { child x (many a) } uses P <% rule x %> = \/{ (many z2) \/ map uses (many (sup z1 trdwn)) | <% x : ^^ p (many (sup z1 trdwn)) = f (many (sup z2 trup)) %> `elem` rules P } uses <% sup z trref %> = { z } uses <% sup z trval %> = emptyset uses <% sup z trnew %> = emptyset defs P <% conf N (many a0) (many p0) %> <% conf N (many a1) (many p1) %> ^^ | ^^ <% prod P (many k) g %> `elem` (many p1) = \/(map (defs P) (reqs (many a1) g - reqs (many a0) g)) defs P <% k.x.y %> = emptyset defs P <% child x (many a) %> = { child x (many a) } defs P <% rule x %> = \/{ map defs (many (sup z1 trdwn)) ^^ | ^^ <% x : ^^ p (many (sup z1 trdwn)) = f (many (sup z2 trup)) %> `elem` rules P } defs <% sup z trref %> = emptyset defs <% sup z trval %> = { z } defs <% sup z trnew %> = { z } \end{code} \caption{Intra-visit dependencies of nodes.} \label{warren.fig.intra} \end{figure} From execution plans, we generate visit functions (see also Section~\tref{sect.ag.eval.ordered}). The process for individual visit functions is largely conventional (Section~\tref{intro.outline.sideeffect}, and Section~\tref{iter.sect.semantics}). However, the weaving of the visit functions is more complex, because we encode the visits graph, which is in general not a linear sequence of visits. \begin{figure}[t] \begin{code} type T_Tree = T_Tree_s1 -- initial configuration of the tree -- type of tree in a given state |s| (function from key to visit) data T_Tree_s1 where C_Tree_s1 :: { inv_Tree_s1 :: forall t . K_Tree_s1 t -> t } -> T_Tree_s1 data T_Tree_s2 where C_Tree_s2 :: { inv_Tree_s2 :: forall t . K_Tree_s2 t -> t } -> T_Tree_s2 data T_Tree_s3 where C_Tree_s3 :: { inv_Tree_s3 :: forall t . K_Tree_s3 t -> t } -> T_Tree_s3 -- type of a \emph{key}, which identifies a visit |v| from state |s| data K_Tree_s1 t where K_Tree_v1 :: K_Tree_s1 T_Tree_v1 K_Tree_v3 :: K_Tree_s1 T_Tree_v3 data K_Tree_s2 t where K_Tree_v2 :: K_Tree_s2 T_Tree_v2 data K_Tree_s3 t where -- empty data declaration -- type of a visit |v|, with continuation as the new state |s| type T_Tree_v1 = IO (Int, T_Tree_s2) type T_Tree_v2 = Int -> IO (Tree, T_Tree_s3) type T_Tree_v3 = Int -> IO (Int, Tree, T_Tree_s3) \end{code} \caption{Types of keys and semantic functions.} \label{warren.fig.types} \end{figure} The visits graph specifies the possible states of an attributed tree, and models the visits that can be done on it. Moreover, it specifies which attributes are in which state, and which attributes are an input or output of a visit. 
In the generated code, an attributed tree is a value that represents a vertex in the visits graph. A visit function represents an edge. Figure~\ref{warren.fig.types} shows their types for the example. Below, we explain these types. \paragraph{Types.} For each vertex |j| of a nonterminal |N| in the visits graph, we generate a type |T_N_(sub s j)|. This is the type for an attributed tree with nonterminal |N| and in configuration |j|. For each edge |i| of a nonterminal |N| in the visits graph, we generate a type |T_N_(sub v i)|. This is the type for a visit function that takes the tree from its source configuration |T_N_(sub s (semb (i) (source1)))| to its target destination |T_N_(sub s (semb (i) (target1)))|. \begin{figure}[p] %format st1 %format st2 %format st3 %format k1 %format k2 %format k3 \begin{smallcode} \begin{code} sem_Bin :: T_Tree -> T_Tree -> T_Tree sem_Bin field_l field_r = st1 where st1 = let k1 :: K_Tree_s1 t -> t k1 K_Tree_v1 = v1 k1 K_Tree_v3 = v3 k1 _ = error "unreachable" v1 :: T_Tree_v1 v1 = do l1 <- return f1 r1 <- return f2 (l_gath, l2) <- inv_Tree_s1 l1 K_Tree_v1 (r_gath, r2) <- inv_Tree_s1 r1 K_Tree_v1 lhs_gath <- return . id $ f3 l_gath r_gath return (lhs_gath, st2 l2 r2 l_gath r_gath) v3 :: T_Tree_v3 v3 lhs_mini = do l1 <- return f1 r1 <- return f2 (l_gath, l2) <- inv_Tree_s1 l1 K_Tree_v1 (r_gath, r2) <- inv_Tree_s1 r1 K_Tree_v1 lhs_gath <- return . id $ f3 l_gath r_gath l_mini <- return . id $ id r_gath r_mini <- return . id $ id l_gath (l_repl, l3) <- inv_Tree_s2 l2 K_Tree_v2 l_mini (r_repl, r3) <- inv_Tree_s2 r2 K_Tree_v2 r_mini lhs_repl <- return . id $ f4 l_repl r_repl return (lhs_gath, lhs_repl, st3) in C_Tree_s1 k1 st2 l2 r2 l_gath r_gath = let k2 :: K_Tree_s2 t -> t k2 K_Tree_v2 = v2 k2 _ = error "unreachable" v2 :: T_Tree_v2 v2 lhs_mini = do l_mini <- return . id $ id r_gath r_mini <- return . id $ id l_gath (l_repl, l3) <- inv_Tree_s2 l2 K_Tree_v2 l_mini (r_repl, r3) <- inv_Tree_s2 r2 K_Tree_v2 r_mini lhs_repl <- return . id $ f4 l_repl r_repl return (lhs_repl, st3) in C_Tree_s2 k2 st3 = let k3 :: K_Tree_s3 t -> t k3 _ = error "unreachable" in C_Tree_s3 k3 f1 = field_l ^^^ ; ^^^ f2 = field_r ^^^ ; ^^^ f3 = min ^^^ ; ^^^ f4 = Bin \end{code} \end{smallcode} \caption{Generated code of the production |Bin|.} \label{warren.fig.code} \end{figure} We can apply one operation on an attributed tree. Given an \emph{typed key} |K_N_(sub s j) t| and a tree |T_N_(sub s j)|, the function |inv_N_(sub s j) :: T_N_(sub s j) -> K_N_(sub s j) t -> t| provides us with the visit function of type |t|: \begin{code} data T_N_(sub s j) where C_N_(sub s j) :: { inv_N_(sub s j) :: forall t . K_N_(sub s j) t -> t } -> T_N_(sub s j) \end{code} The constructor |C_N_(sub s j)| is essentially a wrapper around the |inv| function. %% When we construct the attributed tree, we thus specify for each state The type |t| can be chosen by a parent by providing a key with this type. The key |K_N_(sub s j) t| is a type index. The child can inspect the key to discover which type is actually represented by |t|: \begin{code} data K_N_(sub s j) t where K_N_(sub v i1) :: K_N_s ^^ T_N_(sub v i1) ... K_N_(sub v (sub i n)) :: K_N_s ^^ T_N_(sub v (sub i n)) \end{code} For each outgoing edge |i| of |j|, |K_N_(sub s j)| contains a key |K_N_(sub v i)| that serves as evidence that |t| equals |T_N_(sub v i)|. The type |K_N_(sub s j)| has no constructors when |j| has no outgoing edges in the visits graph. Indeed, a tree in such a configuration cannot be visited. 
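As an illustration of how these types are used by a caller, the following hedged sketch drives the two visits of the example on a semantics value of type |T_Tree| (for instance one built with |sem_Bin| from Figure~\ref{warren.fig.code}); it relies only on the types and keys of Figure~\ref{warren.fig.types}, and the wrapper name |runTree| is illustrative rather than part of the generated code.
\begin{code}
-- First invoke visit 1 to obtain the gathered minimum, then visit 2,
-- feeding that minimum back in to obtain the replacement tree.
runTree :: T_Tree -> IO (Int, Tree)
runTree t = do
  (gath, s2)  <- inv_Tree_s1 t  K_Tree_v1        -- visit 1: synthesizes |gath|
  (repl, _s3) <- inv_Tree_s2 s2 K_Tree_v2 gath   -- visit 2: takes |mini|, synthesizes |repl|
  return (gath, repl)
\end{code}
Alternatively, a caller that wants both attributes in a single visit can select the edge |K_Tree_v3| on the initial state, at the price of supplying |mini| up front.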
The type |T_N_(sub v i)| of a visit function is a function from the values of the visit's inherited attributes to a computation of a tuple of the values of the visit's synthesized attributes and the new state of the tree.
Since BarrierAG includes updatable attributes, the computation takes place in the IO monad:
\begin{code}
type T_N_(sub v i) = sub tau ainh1 -> ... -> sub tau (sub ainh n) -> IO (sub tau asyn1, ..., sub tau (sub asyn m), T_N_(sub s (semb i target1)))
\end{code}
The type |sub tau (sub ainh k)| is the type declared for the inherited attribute |sub ainh k| of nonterminal |N|, and |sub tau (sub asyn k)| the type for the synthesized attribute |sub asyn k|.

\paragraph{Translation of semantics-blocks.}
Figure~\ref{warren.fig.code} gives the generated code for a production\footnote{
The full code of the example can be downloaded from: \url{https://svn.science.uu.nl/repos/project.ruler.papers/archive/ExampleWarren.hs}.
It is compilable with GHC version 6.12.3.
}.
A sem-block is translated to a \emph{node constructor} function |sub st j| for each state |j|, which, given values for the intra-dependencies of |j|, returns a node with this state, thus of type |T_N_(sub s j)|.
The node constructor |st1| constructs the initial state.
Therefore, it has an empty set of intra-dependencies and thus represents the initial value of the node.
In contrast, the node constructor |st2| takes as parameters the states of the live children and the values of the live attributes.
The body of a node constructor is a wrapper around a function |sub k j|, which returns the visit function |sub v i| given the key |K_N_(sub v i)| that identifies one of the outgoing edges of |j|.

For each visit |i| that is a successor of configuration |j|, we generate a visit function |sub v i|.
The visit function |sub v i| takes as parameters the values of the inherited attributes that are declared on edge |i|.
It returns a computation that gives a tuple of the synthesized attributes that are declared on edge |i|.
In addition, it returns the result state of the node by applying the constructor for the next state to the values of that state's intra-dependencies.

\begin{figure}[t]
\begin{code}
semb (child x : N = f (many zup)) gen1 ^^^ = ^^^ semb x (gen1(initial N)) <- (sub lift (|many zup|)) f (semb (many zup) trup)
semb (x : ^^ p (many zdown) = f (many zup)) gen1 ^^^ = ^^^ (many y) <- (sub lift (|many zup|)) f (semb (many zup) trup) (many (semb zdown (trdwn y))) ^^^ ^^^ where (many y) fresh
semb (invoke x i : N) gen1 = let vis = sub ainvoke (sub s (semb i source1)) (semb x (source1(i))) (K_N_(sub v i))
                             (semb x (syn1(i)), (semb x (target1 i))) <- vis (semb x (inh1(i)))

semb x (inh1(i)) = { semb (ainh.x.y) gen1 ^^ | ^^ ainh.y <- (sub a (sub N i)) }
semb x (syn1(i)) = { semb (asyn.x.y) gen1 ^^ | ^^ asyn.y <- (sub a (sub N i)) }

semb (sup z trval) (trdwn y) = let semb z gen1 = y
semb (sup z trref) (trdwn y) = writeIORef (semb z gen1) y
semb (sup z trnew) (trdwn y) = (semb z gen1) <- newIORef y
semb (sup z trval) trup = return (semb z gen1)
semb (sup z trref) trup = readIORef (semb z gen1)

semb (h.c.x) gen1 = h_c_x
semb (x) (gen1(j)) = x_j
\end{code}
\caption{Translation scheme for rules in the execution plan.}
\label{warren.fig.rules}
\end{figure}

Figure~\ref{warren.fig.rules} shows a straightforward translation of the rules in the execution plan.
We assume that barrier attributes and dependency rules are stripped from the execution plan and the administration after the visits graph has been constructed.
For each visit to a child, we pass the appropriate key as an additional parameter to the invoke-function of the child's state, which results in the visit function |vis|.
The function |vis| subsequently takes the inherited attributes as parameters.
For an attribute occurrence |z| at an input position, the transcribed identifier |semb z gen1| either is |z|'s value, or is a reference that can be read to provide |z|'s value.
In a similar way, an attribute occurrence |z| at an output position is either stored as the transcribed identifier |semb z gen1|, or written to the reference under that name.

\paragraph{Remarks.}
A nice property of the translation is that we explicitly declare the types of the visit functions with type signatures.
Since the types of the functions |f| are monomorphic, type errors are usually reported in the functions |f|, which are defined in the actual source code, so that type errors can be related back to the original locations in the source file.
We do not need to know the types of local attributes.
Also, if we allow type variables in the types, these can be supported in our scheme using scoped type variables.

The translation can be optimized in various ways.
When a configuration does not have outgoing edges, a visit to a child that ends in this configuration does not need to construct nor return the new state of the child, as the child cannot be visited anymore, and the state is actually empty according to the definition of |intra|.
When a configuration has only one outgoing edge, the selection of a visit via a key is not needed.
Also, when the visits graph for a nonterminal is a tree, a single key that identifies the path in the tree can be given to the child when it is created, instead of one segment of the path for each visit.
Instead of relying on Haskell to construct closures, the node constructors can use an untyped, updatable array.
With conventional techniques from register scheduling, the state can be represented such that needless copying of states is avoided during visits, which makes visits cheap.
In this thesis, we do not venture down this path, and rely on Haskell to handle closures.
A compiler that employs uniqueness and usage analysis~\citep{DBLP:conf/ifl/VriesPA07,DBLP:conf/icfp/HageHM07} can apply this optimization transparently.
When a simultaneous invocation group contains more than one visit, we can use |forkIO| to allow the Haskell runtime to evaluate the visits in parallel, since these do not have common dependencies.

\section{Generalization to Phases}
\label{sect.warren.phases}

We use BarrierAG as a host language to describe phases.
A nonterminal declares a set of phases.
Attributes may be associated uniquely with a phase.
A phase corresponds to one or more implicit visits.
Since a visit describes the smallest unit of evaluation of a node in the tree, a phase describes a larger unit of evaluation.
It allows us to express properties of chunks of AG evaluation, without resorting to the low-level details of visits.
The following example is a possible declaration of phases for nonterminal |Tree|.
The indentation determines which attribute is declared in which phase.
The scope of a phase declaration ends before a keyword at the same indentation level:
\begin{code}
itf Tree              -- declaration of attributes and phases of |Tree|
  syn gath :: Int     -- attribute not assigned to a phase
  phase distribute    -- declaration of a phase |distribute|
    inh mini :: Int   -- attribute |mini| in phase |distribute|
  phase transform     -- declaration of a phase |transform|
    syn repl :: Tree  -- attribute |repl| in phase |transform|
\end{code}
These phase declarations are unordered: they may be given in any order.

\begin{figure}[tb]
\begin{code}
I ::= itf N (many q)                  -- declaration of a phase-interface
q ::= a                               -- toplevel attribute decl (not associated with a phase)
   |  phase ph (many q)               -- phase with attribute declarations
d ::= ...                             -- descriptors extended with phases
   |  begin x ph                      -- begin of a phase |ph| of child |x|
   |  end x ph                        -- end of a phase |ph| of child |x|
s ::= sem N prod P (many r) (many t)  -- common rules |r| and a set of phase blocks |many t|
t ::= phase ph (many r) (many t)      -- common rules |r| and subsequent phases |many t|
r ::= ...                             -- AG rules
   |  invoke ph of (many c) z         -- specifies invocation of phase |ph| of |many c| with strategy |z|
\end{code}
\caption{Notation for phase interfaces.}
\label{fig.warren.phaseitfs}
\end{figure}

Figure~\ref{fig.warren.phaseitfs} shows the notation for phase interfaces.
There is a high similarity with the notation for visit interfaces of Chapter~\rref{chapt.side-effect}.
The main difference is that we may specify multiple blocks of subsequent phases in a phase-block.

The lexical nesting of phase declarations (on a nonterminal) and phase blocks (in a sem-block) imposes a partial order on phases.
Also, order-rules in combination with |begin lhs ph| and |end lhs ph| impose a partial order.
A phase-block of phase |ph2| that is nested in a phase block |ph1| is evaluated either during the evaluation of |ph1|, when there is a constraint |begin lhs ph2 `pre` end lhs ph1|, or after the evaluation of |ph1| otherwise.
Invoke-rules may be given explicitly or left implicit.
An invoke-rule |r| with a phase |ph| corresponds to one or more actual visits to a child |x|.
The rule itself precedes the first of these visits, thus |r `pre` begin x ph|.
The lexical scope of a phase block, which is relevant for local attributes and default-rules, is determined only by the nesting of phase blocks in a production.
Additional constraints on the order, induced by other productions or by order-rules, are not taken into account for scoping.
This technical detail ensures that we can determine the vertices of the PDGs before the dependency analysis takes place.

We essentially have two main evaluation algorithms to choose from: demand-driven evaluation and statically ordered evaluation.
We may specify several properties of a phase, such as cyclic or acyclic, and pure or impure.
Not all combinations are possible.
For example, a cyclic phase must use on-demand evaluation, and may not be impure.
Also, the properties impose constraints on rules in such a phase, or on rules that invoke such a phase.
For example, an invoke-rule in a pure phase may not invoke an impure phase on a child.
The scheduling of rules takes such constraints into account (Section~\tref{visit.sect.order}).
In a host language with lazy evaluation, the on-demand algorithm is a simplification of the eager algorithm, hence in this thesis we focus only on the latter.
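These phase constraints must be mutually consistent, i.e.\ they may not force a phase to precede itself. The following is a hedged Haskell sketch of such a consistency check, on an illustrative representation of the constraints as pairs of phase names; the name |consistent| is not part of the implementation, which instead detects inconsistencies as cycles in the PDGs (see below).
\begin{code}
import qualified Data.Set as Set
import           Data.Set (Set)

type Phase      = String
type Constraint = (Phase, Phase)   -- (p, q) means p must come before q

-- The constraints are consistent when no phase is (transitively) required
-- to precede itself, i.e. the induced order contains no cycle.
consistent :: [Constraint] -> Bool
consistent cs = all (\p -> not (p `Set.member` reachableFrom p)) phases
  where
    phases = Set.toList (Set.fromList (concatMap (\(a, b) -> [a, b]) cs))
    reachableFrom p = go Set.empty [ q | (a, q) <- cs, a == p ]
    go seen []       = seen
    go seen (q : qs)
      | q `Set.member` seen = go seen qs
      | otherwise           = go (Set.insert q seen) (qs ++ [ r | (a, r) <- cs, a == q ])
\end{code}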
\begin{figure}[tb] \begin{code} attr N ^^^ inh (sub abegin N) barrier -- begin master phase for |N| ^^^ syn (sub aend N) barrier -- end master phase for |N| attr N ^^^ inh (sub abegin ph) barrier -- begin barrier for each phase |ph| of |N| ^^^ syn (sub aend ph) barrier -- end barrier for each phase |ph| of |N| attr N ^^^ ainh.(sub abegin ph) `pre` asyn.(sub aend ph) -- begin before end for each phase |ph| ^^^ ainh.(sub abegin N) `pre` ainh.(sub abegin ph) -- begin |ph| after begin master phase ^^^ asyn.(sub aend ph) `pre` asyn.(sub aend N) -- end |ph| before end master phase attr N ^^^ ainh.(sub abegin ph) `pre` k.y -- for each attribute |k.y| of phase |ph| ^^^ k.z `pre` asyn.(sub aend ph) -- for each attribute |k.y| of phase |ph| ^^^ ainh.(sub abegin ph) `pre` ainh.(sub abegin ph') -- for each nested phase |ph'| ^^^ asyn.(sub aend ph') `pre` asyn.(sub aend ph) -- for each nested phase |ph'| ^^^ ainh.(sub abegin N) `pre` k.y -- for each attribute |k.y| not in a phase ^^^ k.z `pre` asyn.(sub aend N) -- for each attribute |k.y| not in a phase \end{code} \caption{Sketch of a translation of phases to BarrierAG.} \label{fig.warren.phasessketch} \end{figure} \paragraph{Foundation.} We express phases in terms of BarrierAG, which we sketch in Figure~\ref{fig.warren.phasessketch}. A dependency rule |k.x `pre` k.y| on attr-blocks of |N| is syntactic sugar for rule |k.lhs.x `pre` k.lhs.y| in each production of |N|. For each phase |ph| of nonterminal |N|, we introduce two barrier-attributes |sub abegin ph| and |sub aend ph|. The attributes of the phase are enclosed by these barriers. There is one master phase |N| for each nonterminal |N| where all attributes and phases are enclosed by via dependencies on its barriers. The lexical nesting of phases and rules induces additional order constraints, as sketched by the following example: \begin{code} sem N (many r1) -- |ainh.lhs.(sub abegin N) `pre` (many r1)| phase ph1 -- |ainh.lhs.(sub abegin N) `pre` ainh.lhs.(sub abegin ph1)| (many r2) -- |ainh.lhs.(sub abegin ph1) `pre` (many r1)| phase ph2 -- |ainh.lhs.(sub abegin ph1) `pre` ainh.lhs.(sub abegin ph2)| (many r3) -- |ainh.lhs.(sub abegin ph2) `pre` (many r3)| \end{code} With order-dependencies, and with syntax as demonstrated in Chapter~\rref{chapt.side-effect}, more dependencies between phases may be specified. Further, we take the union of all constraints on phases of nonterminal |N| from each child with nonterminal |N|, and integrate these in the NDG of |N|. The PDGs must remain acyclic, otherwise the constraints on phases are inconsistent, which is considered a static error. \begin{figure}[tb] \begin{center} \begin{tikzpicture} [ phase/.style={circle, minimum size=1mm, node distance=4mm, top color=black, bottom color=black,font=\scriptsize} , lab/.style={font=\footnotesize} , level distance=6mm ] \node[phase](master){} child { node[phase](analyze){} child { node[phase](name){} } child { node[phase](type){} } } child { node[phase](translate){} }; \node[lab, right of=master]{master}; \node[lab, right of=translate]{translate}; \node[lab, left of=analyze]{analyze}; \node[lab, left of=name]{name}; \node[lab, right of=type]{tpcheck}; \end{tikzpicture} \end{center} \caption{Phase nesting visualized as a tree.} \label{fig.warren.phasenesting} \end{figure} \paragraph{Phase nesting.} A phase may be contained inside another phase. The parent of a node can invoke the contained phase of the node as part of the evaluation of the containing phase. 
This model introduces levels of granularity that allow for more concise specifications.
We can visualize this model as a tree, which we show in Figure~\ref{fig.warren.phasenesting}.

\begin{defi}[Nesting tree]
A \emph{nesting tree} describes the nesting of phase declarations.
\end{defi}

In a nesting tree, the nodes are phases.
When a node |ph2| is a child of a node |ph1|, |ph2| is a nested phase of |ph1|.
The example tree above corresponds to the following phase interface:
\begin{code}
itf Expr           -- defines the siblings for the root (master phase)
  phase analyze    -- nested phase in master phase
    phase name     -- nested phase in phase |analyze|
    phase tpcheck  -- nested phase in phase |analyze|
  phase translate  -- nested phase in master phase
\end{code}
The above phase interface describes that during the evaluation of the analyze phase of a child, the name phase of that child is invoked.
Different strategies may be specified for the name phase than for the analyze phase.

The nesting tree can be inferred from the NDG.
Given a nonterminal |N|, |N|'s phase |ph1| is a child of |N|'s phase |ph2| if either:
\begin{itemize}
\item |ainh.(sub abegin ph2) <-+ ainh.(sub abegin ph1)| and |asyn.(sub aend ph1) <-+ asyn.(sub aend ph2)|.
In this case, |ph1| is fully enclosed by |ph2|.
\item there exists a |ph3| such that |ph3| is a child of |ph1| and |ph3| is a child of |ph2|, and in the NDG |ainh.(sub abegin ph2) <-+ ainh.(sub abegin ph1)| or |asyn.(sub aend ph1) <-+ asyn.(sub aend ph2)|.
In this case, |ph3| is a child of both |ph1| and |ph2|, which we resolve by making |ph1| a child of |ph2|.
\end{itemize}
The constraints form a directed graph per nonterminal, and can be solved with a fixpoint computation.
If the resulting graph is not a tree, then either the constraints are inconsistent, or too few constraints were specified.
Such a nesting tree leads to additional edges between the barriers of the NDG.
If the PDGs become cyclic due to these additional edges, the constraints on phases were inconsistent.
This approach infers a single nesting tree per nonterminal.
The inference of the nesting tree has the advantage that it becomes easier to compose phases.
On the other hand, the nesting of phases is typically limited, and has a \emph{purpose}, so it is only a slight burden to specify the nesting fully.

\paragraph{Overlap prevention.}
Sibling phases may not overlap, otherwise it is unclear which evaluation of a node corresponds to which phase.
Two phases overlap if either |ainh.x.(sub abegin ph2) <-+ ainh.x.(sub abegin ph1)| and |asyn.x.(sub aend ph2) <-+ asyn.x.(sub aend ph1)|, or the other way around.
However, the dependencies on barriers do not guarantee this property.
Given two sibling phases |ph1| and |ph2|, we wish to express that either |end ph1 `pre` begin ph2| or |end ph2 `pre` begin ph1|.
Since siblings are not ordered, we do not know which of the two to take.
Attribute scheduling is actually a special case of ordering phases: if we define phases such that each attribute is declared in a unique phase, then determining the order of phases is attribute scheduling.
The order of phases may thus be dependent on context.
Therefore, we take a similar approach as in Section~\ref{sect.warren.visitsgraphs} and define the phases graph.

\paragraph{Phases graph.}
The phases graph of a nonterminal is the phases graph of the master phase of the nonterminal.
A phases graph is a DAG in which each vertex represents a phase, and an edge from |a| to |b| means that |a| comes before |b| in the sequence.
A vertex is labelled with the phases graph of its children: \begin{code} g ::= phase ph (many g) (many e) -- phases graph of phase |ph| (initially the master phase) e ::= g -> g -- edges connect phases subgraphs \end{code} Such a graph |g| may have several sources and sinks. However, each path from source to sink must contain precisely all sibling phases, because each path corresponds to an ordered sequence of siblings in the nesting tree of the nonterminal. Similar as the visits graph, the phases graph contains all consistent ways to interleave phases. An interleaving is consistent when it satisfies the partial order of the phases. In practice, there are only a few phases per nonterminal, with relatively dense constraints, thus the actual required portion of the graph contains little variety. Also, the problem is slightly easier compared to Section~\rref{sect.warren.visitsgraphs} because each path from source to sink is equally long, and the number of vertices on such a path is known beforehand. We take a slightly different representation of the phases graph to stress the similarity with the visits graph. The vertices |s| of the phases graph represent an ordered sequence of the available |inh.(sub abegin ph)| attributes. An edge is annotated with the attribute that has become available: \begin{code} s ::= (many a) -- state of a vertex in the phases graph e ::= s (over (->) a) s -- edge in the phases graph \end{code} Note that we can construct the nesting tree from such a sequence. The purpose of the phases graph is to determine a small number of totally ordered nesting trees for a nonterminal. In a similar way as with visits, we determine a total order on the phases of children of a production given the total order on the phases of the production. For this analysis, we do not add visit-vertices to the PDGs. Instead, we add dependencies between attributes |asyn.x.(sub aend ph1)| and |ainh.x.(sub abegin ph2)| for children |x| and phases |ph1| and |ph2|. \begin{figure} \begin{code} avail (many a) g d ^^ ^^ = ^^ all (avail (many a) g) (deps g d) && except (many a) g d except (many a) g <% ainh.lhs.y %> | ^^ not isBegin ainh.lhs.y || ainh.y `elem` (many a) except (many a) g <% ainh.x.y %> | ^^ not isBegin ainh.x.y || (d `elem` deps g <% ainh.x.y %> && isEnd d) except (many a) g <% asyn.c.y %> | ^^ True except (many a) g <% rule x %> | ^^ True except (many a) g <% child x %> | ^^ True except (many a) g <% visit x i %> | ^^ True isBegin <% k.x.(sub abegin ph) %> isEnd <% k.x.(sub aend ph) %> \end{code} \caption{The relation |avail| with begin and end barriers.} \label{fig.warren.availbeginend} \end{figure} Recall that a vertex in a PDG is available when all its dependencies are available. In this situation, for a begin barrier to be ready, it must have been connected to an end barrier. Figure~\ref{fig.warren.availbeginend} gives the definition. Further, an end-barrier is required when the begin-barriers of the phase and its children are available. Sufficient child-edges must be added so that the (indirect) dependencies of a required end-barrier are available. \begin{defi}[Selectable] A begin-barrier is \emph{selectable} when the following conditions are met. \begin{itemize} \item Its dependencies are available. This implies that the begin-barrier of its parent is available. \item If a begin-barrier of a sibling is available, then so is the end-barrier of that sibling. This ensures that the phases do not overlap. 
\end{itemize}
\end{defi}

Options for child-edges can thus be determined by taking the intersection between the selectable and the required begin-barriers.
If this intersection is empty, then the dependencies on phases are inconsistent; that is, they force the barriers to overlap.
If there are multiple options available, we prioritize based on some stable order, such as the order of appearance.
As initial solution, we start with only the begin-barrier of the master phase available for each nonterminal.
For the root nonterminals, we construct as initial solution a full path by determining a late-as-possible order given the dependencies of the NDG, and prioritizing based on the stable order mentioned earlier.

\paragraph{Attribute scheduling.}
The set of paths from source to sink in the phases graph is small in practice.
The number of paths is the price to pay for not specifying a total order on phases.
For each path in the phases graph, we determine the accompanying nesting tree.
The set of these trees forms the phase interfaces of a nonterminal.
For each child, we specify which phase interface to use, using a type index for the child-rule.
Each phase interface corresponds to a specialized version of the NDG and the PDGs.
We also construct a separate visits graph per phase interface.
Due to the additional dependencies imposed by the phase interface, variability in the visits graph is reduced.

Moreover, on each nesting tree, we can perform additional attribute scheduling.
The attributes are restricted by a partial order, and this order may be strengthened if it does not lead to cycles in the NDG of the nesting tree.
For example, we may schedule attributes to the latest possible phase, or avoid scheduling attributes to certain phases.
Such heuristics are similar to those discussed for rule scheduling in Section~\tref{visit.sect.order}.

\paragraph{Remarks.}
The generalization to phases ensures that our approach is a conservative extension of ordered attribute grammars.
A conventional attribute grammar can be expressed by associating all attributes with one phase.
The inference of the phases graph is similar to the inference of the visits graph.
It may be possible to combine the inference of both graphs.
However, it is not immediately clear how to express heuristics, such as the avoidance of certain phases, in a combined approach.

\section{Commuting Rules}
\label{warren.sect.commutting}

In this section, we present AGs with commuting rules, which are chained rules that can be reordered.
These commuting rules provide an abstraction for the use of references in Section~\ref{sect.warren.core}.
Given an explicit ordering of rules, the composition of two rules is \emph{commutative} when the two rules are \emph{commutable}, which means that the rules may be swapped in the composition.
Rule composition is thus a conditionally \emph{commutative operator}.
Commutable rules represent \emph{commutable operations}.
The swapping of rules models side effects, and commutativity facilitates reasoning about the safe use of side effects.

\paragraph{Syntax.}
We assume some conventional AG to start with, and show later how to encode the commutable rules in BarrierAG.
Firstly, we introduce \emph{threaded attributes}, which are a special form of chained attributes.
Secondly, we introduce commuting rules, which specify a set |many h| of commutable chains the rule participates in:
\begin{code}
k ::= ...    -- attribute forms (e.g. |inh| and |syn|)
   |  thr    -- threaded attribute
r ::= ...
r ::=  ...                                             -- rules
   |   x : ^^ p (many z1) = f (many z2) ^^^ (many h)   -- rule |x| that commutes over |many h|
h ::=  z1 `comm` z2                                    -- rule commutes with rules of |z1| and |z2|
\end{code}
The syntax |z1 `comm` z2| specifies that the rule connects the chains of |z1| and |z2|. Attribute |z1| must be in an input position, and |z2| in an output position. Also, both must be threaded attributes, and their types must be the same.
\begin{figure}[tb]
\begin{code}
attr Tree ^^^ thr unq :: Int                        -- declaration of threaded attribute

sem Root
  | Root  ^^^  root.unq   = 1                       -- initial value of threaded attribute
               loc.final  = root.unq                -- final value of threaded attribute

sem Tree                                            -- modifications to the threaded attribute
  | Leaf  ^^^  (loc.myId, lhs.unq) = (lhs.unq, lhs.unq+1)  ^^^  lhs.unq `comm` lhs.unq
  | Bin   ^^^  l.unq    = lhs.unq   ^^^  lhs.unq `comm` l.unq
               r.unq    = l.unq     ^^^  l.unq `comm` r.unq
               lhs.unq  = r.unq     ^^^  r.unq `comm` lhs.unq
\end{code}
\caption{Example of commuting rules.}
\label{fig.warren.commutingexample}
\end{figure}
The example in Figure~\ref{fig.warren.commutingexample} uses a threaded counter to demonstrate the commuting rules. Each value of |loc.myId| is unique, although it is not guaranteed that a right sibling of a node has a higher |loc.myId| value. This would be the case if the rules were not commutable. The first rule of |Root| shows how to provide the initial value of a threaded attribute: this rule defines the threaded attribute, but does not commute over it. The second rule of |Root| shows how to obtain the final value of a threaded attribute: this rule refers to the value of the threaded attribute, but does not commute over it. Finally, the rules of |Tree| are commutable rules.
As an additional requirement, a rule that defines a threaded attribute |lhs.y| must mention |lhs.y| in its set of commuting chains |many h|. This restriction ensures that the threaded attribute can be represented as an inherited attribute. The commuting chains |many h| specify in which chains a rule participates:
\begin{code}
uses (many h)  =  { z1 | <% z1 `comm` z2 %> `elem` (many h) }
defs (many h)  =  { z2 | <% z1 `comm` z2 %> `elem` (many h) }
\end{code}
A rule with |many h| connects |uses (many h)| to |defs (many h)|.

\paragraph{BarrierAG encoding.}
We use a BarrierAG encoding as a means to specify the implementation of commuting rules. We represent each threaded attribute |thr x :: tau| with three attributes in BarrierAG:
\begin{code}
inh (sub x ref1) :: IORef tau  ^^^   -- reference to the mutable state
syn (sub x wr1) barrier              -- all commutable updates happen before this barrier
inh (sub x rd1) barrier              -- all non-commutable reads happen after this barrier
\end{code}
Thus, a threaded attribute becomes a reference to mutable state. We ensure in the encoding that rules only depend on the reference, which permits the reordering. As an invariant, the write barrier of a child depends on all commuting rules of the child (and its subtree) that update the reference. All non-commuting rules that refer to the reference depend on the read barrier.
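To give an intuition of what this encoding amounts to at run time, the following Haskell sketch hands out unique numbers through an |IORef|, in the spirit of the |unq| attribute of Figure~\ref{fig.warren.commutingexample}. The sketch is only illustrative and is not the generated code; in particular, the names |freshId| and |example| are hypothetical.
\begin{code}
import Data.IORef

-- Runtime counterpart of a threaded attribute: every commuting rule only
-- receives the reference, so the updates may be reordered without changing
-- which numbers are handed out.
freshId :: IORef Int -> IO Int
freshId ref = atomicModifyIORef' ref (\n -> (n + 1, n))

example :: IO (Int, Int, Int)
example = do
  ref    <- newIORef 1       -- initializer: the non-commuting write of |Root|
  i1     <- freshId ref      -- commuting rule of one |Leaf|
  i2     <- freshId ref      -- commuting rule of another |Leaf|
  final  <- readIORef ref    -- non-commuting read of |Root|: scheduled after all commuting writes
  return (i1, i2, final)     -- |i1| and |i2| differ, whatever order the |freshId| calls ran in
\end{code}
The barriers of the encoding enforce exactly this discipline: commuting writes are scheduled before the write barrier, and non-commuting reads after the read barrier.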
\begin{figure}[tb]
\begin{code}
sem N | P                                -- |k| is a child of production |P| of nonterminal |N|
  loc.(ann k (x) (ref1)) :: IORef tau    -- local attribute declaration with type
  loc.(ann k (x) (wr1)) barrier          -- local barrier declaration
  loc.(ann k (x) (rd1)) barrier          -- local barrier declaration

  (sup (k.(sub x ref1)) trval) = (sup (loc.(ann k (x) (ref1))) trval)  ^^^  -- copy as reference for child
  k.(sub x wr1) `pre` loc.(ann k (x) (wr1))      -- write barrier of child before local write barrier
  loc.(ann k (x) (rd1)) `pre` k.(sub x rd1)      -- local read barrier before read barrier of child
\end{code}
\caption{Sketch of the encoding of commuting rules in BarrierAG.}
\label{fig.warren.commsketch}
\end{figure}
For each threaded attribute |thr y :: tau| of a child |k|, we introduce three local attributes, and rules to connect the local attributes with the attributes of the child. Figure~\ref{fig.warren.commsketch} gives a sketch. Note that |ann k x ref1| is the name of the attribute: the name of the child |k| is part of the name of the attribute. When a rule refers to the threaded attribute of |k|, we actually let it refer to the local attributes, as we show below. For notational convenience, we also introduce these local attributes for threaded attributes of |lhs|:
\begin{code}
(sup (loc.(ann lhs (x) (ref1))) trval) = (sup (lhs.(sub x ref1)) trval)  ^^^  -- copy as reference
loc.(ann lhs (x) (wr1)) `pre` lhs.(sub x wr1)    -- local write barrier before write barrier of |lhs|
lhs.(sub x rd1) `pre` loc.(ann lhs (x) (rd1))    -- read barrier of |lhs| before local read barrier
\end{code}
These rules for |lhs| are the contravariant version of the rules for the children.
\begin{figure}[t]
\centering
\begin{tikzpicture}
  [ nd/.style={rectangle, draw, minimum size=6mm, node distance=10mm,font=\scriptsize, sibling distance=16mm}
  , nd2/.style={ellipse, draw, minimum size=8mm, node distance=10mm,font=\scriptsize, sibling distance=16mm}
  , dep/.style={->,thick}
  , dep2/.style={->,dashed, thick}
  , dep3/.style={->,dotted, thick}
  , lab/.style={font=\scriptsize}
  ]
  %% Root
  \node[nd](lhsunqref){|lhs.(sub unq ref1)|};
  \node[nd,right of=lhsunqref, xshift=6mm](lhsunqrd){|lhs.(sub unq rd1)|};
  \node[nd,right of=lhsunqrd, xshift=6mm](lhsunqwr){|lhs.(sub unq wr1)|};
  \node[nd,below of=lhsunqref](loclhsunqref){|loc.(ann lhs unq ref1)|};
  \node[nd,right of=loclhsunqref, xshift=6mm](loclhsunqrd){|loc.(ann lhs unq rd1)|};
  \node[nd,right of=loclhsunqrd, xshift=6mm](loclhsunqwr){|loc.(ann lhs unq wr1)|};
  \draw[dep](loclhsunqref) to (lhsunqref);
  \draw[dep3](loclhsunqrd) to (lhsunqrd);
  \draw[dep2](lhsunqwr) to (loclhsunqwr);
  %% Left
  \node[nd,below of=lhsunqref, xshift=-36mm, yshift=-40mm](loclunqref){|loc.(ann l unq ref1)|};
  \node[nd,right of=loclunqref, xshift=6mm](loclunqrd){|loc.(ann l unq rd1)|};
  \node[nd,right of=loclunqrd, xshift=6mm](loclunqwr){|loc.(ann l unq wr1)|};
  \node[nd, below of=loclunqref](lunqref){|l.(sub unq ref1)|};
  \node[nd,right of=lunqref, xshift=6mm](lunqrd){|l.(sub unq rd1)|};
  \node[nd,right of=lunqrd, xshift=6mm](lunqwr){|l.(sub unq wr1)|};
  \draw[dep] (lunqref) to (loclunqref);
  \draw[dep3] (lunqrd) to (loclunqrd);
  \draw[dep2] (loclunqwr) to (lunqwr);
  %% Right
  \node[nd,below of=lhsunqref, xshift=36mm, yshift=-40mm](locrunqref){|loc.(ann r unq ref1)|};
  \node[nd,right of=locrunqref, xshift=6mm](locrunqrd){|loc.(ann r unq rd1)|};
  \node[nd,right of=locrunqrd, xshift=6mm](locrunqwr){|loc.(ann r unq wr1)|};
  \node[nd, below of=locrunqref](runqref){|r.(sub unq ref1)|};
  \node[nd,right of=runqref, xshift=6mm](runqrd){|r.(sub unq rd1)|};
  \node[nd,right of=runqrd, xshift=6mm](runqwr){|r.(sub unq wr1)|};
  \draw[dep] (runqref) to (locrunqref);
  \draw[dep3] (runqrd) to (locrunqrd);
  \draw[dep2] (locrunqwr) to (runqwr);
  %% Rules
  \node[nd2, below of=lhsunqref, xshift=-15mm, yshift=-20mm](r1){|r1|};
  \node[nd2, right of=r1, xshift=20mm](r2){|r2|};
  \node[nd2, right of=r2, xshift=20mm](r3){|r3|};
  \node[nd2, left of=r1](r1'){|ann r 1 trref|};
  \node[nd2, below of=r2, yshift=-5mm](r2'){|ann r 2 trref|};
  %% Dependencies of reference
  \draw[dep] (r1) to (loclhsunqref);
  \draw[dep] (r1) to (loclunqref);
  \draw[dep] (r1') to (loclhsunqref);
  \draw[dep] (loclunqref) to (r1');
  \draw[dep] (r2) to (loclunqref);
  \draw[dep] (r2) to (locrunqref);
  \draw[dep] (r2') to [in=30,out=170] (loclunqref);
  \draw[dep] (locrunqref) to (r2');
  \draw[dep] (r3) to (loclhsunqref);
  \draw[dep] (r3) to (locrunqref);
  %% Dependencies of write
  \draw[dep2] (loclhsunqwr) to (r3);
  \draw[dep2] (locrunqwr) to (r3);
  \draw[dep2] (locrunqwr) to (r2);
  \draw[dep2] (loclunqwr) to (r2);
  \draw[dep2] (loclhsunqwr) to (r1);
  \draw[dep2] (loclunqwr) to (r1);
  \draw[dep2] (loclunqwr) to [out=35,in=145] (locrunqwr);
  \draw[dep2] (loclhsunqwr) to [out=-145,in=80] (loclunqwr);
  %% Dependencies of read
  \draw[dep3] (loclunqrd) to (loclhsunqrd);
  \draw[dep3] (locrunqrd) to [out=155,in=25] (loclunqrd);
  %% Legend
  \node[lab, right of=loclhsunqwr, xshift=20mm](a0){};
  \node[lab, right of=a0](a1){};
  \node[lab, below of=a0, yshift=4mm](b0){};
  \node[lab, right of=b0](b1){};
  \node[lab, below of=b0, yshift=4mm](c0){};
  \node[lab, right of=c0](c1){};
  \draw[dep](a1) to node[lab,auto,above]{|ref1|} (a0);
  \draw[dep2](b1) to node[lab,auto,above]{|wr1|} (b0);
  \draw[dep3](c1) to node[lab,auto,above]{|rd1|} (c0);
\end{tikzpicture}
\caption{Attributes encoding the commuting rules of the production |Bin|.}
\label{fig.warren.commexmpl}
\end{figure}
\emph{Non-commuting write}. When a rule |x| with commuting chains |many h| defines a threaded attribute |athr.k.y|, but |athr.k.y `notelem` defs (many h)|, rule |x| serves as \emph{initializer} for the threaded attribute. In the encoding, the defining occurrence |athr.k.y| in the left-hand side of rule |x| is replaced with |sup (loc.(ann k y ref1)) trnew|. Moreover, we add the following dependency rule:
\begin{code}
loc.(ann k y wr1) `pre` loc.(ann k y rd1)   -- orders commutable writes before non-commutable reads
\end{code}
Since rule |x| starts the chain, and given the invariants on the read and write barriers, this dependency rule ensures that commutable writes take place before non-commutable reads.

\emph{Non-commuting read}. When a rule |x| with commuting chains |many h| refers to a threaded attribute |athr.c.y|, but |athr.c.y `notelem` uses (many h)|, rule |x| reads the final value of the threaded attribute. Thus, we replace the occurrence |athr.c.y| in the right-hand side of rule |x| with |sup (ann c y ref1) trref|, and add the following dependency:
\begin{code}
loc.(ann c y rd1) `pre` rule x   ^^   -- the read depends on the read barrier
\end{code}
The identifier |c| is either a child |k| or |lhs|.

\emph{Commuting read and write}. When a rule |x| with commuting chains |many h| defines a threaded attribute |athr.c.y| with |athr.c.y `elem` defs (many h)|, we replace the defining occurrence in the left-hand side of |x| with |sup (ann c y ref1) trref|. When |athr.c.y `elem` uses (many h)|, we replace the occurrence in the right-hand side of |x| with |sup (ann c y ref1) trref|. Moreover, we insert a rule |sup x trref| for each commuting chain, which copies the reference and thus links the chain.
Also, we connect the read and write barriers:
\begin{code}
sup x trref :  id (sup (k.y2) trval) = id (sup (c.y1) trval)  ^^^   -- for each |c.y1 `comm` k.y2 `elem` many h| (copies the reference)
rule x `pre` loc.(ann c y wr1)                 ^^^   -- for each attribute |y| in |many h| (rule before write barrier)
loc.(ann c y2 wr1) `pre` loc.(ann k y1 wr1)          -- for each |c.y1 `comm` k.y2 `elem` many h| (connect write barriers)
loc.(ann c y1 rd1) `pre` loc.(ann k y2 rd1)          -- for each |c.y1 `comm` k.y2 `elem` many h| (connect read barriers)
\end{code}
When |lhs.y `elem` defs (many h)|, the read and write barriers are not connected, as that would create a cycle. Also, note the contravariant behavior between the read and write barriers. The barrier attributes and dependency rules enforce the proper ordering of rules that use threaded attributes. Figure~\ref{fig.warren.commexmpl} demonstrates the encoding for the |Bin| production: the boxes represent attributes, and the circles represent the relevant rules. The barriers and their dependencies are not part of the generated code, and thus do not have a runtime overhead. As a consequence, we have actually transformed a chained attribute into an inherited attribute that holds a reference to mutable state.

\paragraph{Referential transparency.}
Referential transparency is important for equational reasoning. For AGs, it is also important to ensure that the order of evaluation does not affect the result. In BarrierAG, rules that use updatable attributes break referential transparency. However, with commutable rules, we can establish a weaker version of referential transparency. When two rules commute, the actual values of the attributes that these rules define may differ between orderings, but the final result in the context where these rules occur, which abstracts from the values of these attributes, is still equivalent under any ordering of the commutable rules.
The composition of rules and the context of rules can be made explicit with arrow notation. A commuting rule |r1 : ^^ (x1,y1) = f (x0,y0), ^^ x0 `comm` x1| and a commuting rule |r2 : ^^ (x2,z1) = g (x1,z0), ^^ x1 `comm` x2| correspond respectively to the arrows |(x1,y1) <- f -< (x0,y0)| and |(x2,z1) <- g -< (x1,z0)|. Section~\tref{intro.sect.correspondences} shows that the composition of these rules can be expressed as an arrow:
\begin{code}
proc (x0,y0,z0) -> do
  (x1,y1) <- f -< (x0,y0)
  (x2,z1) <- g -< (x1,z0)
  returnA -< (x2,y1,z1)
\end{code}
Alternatively, when we reorder |f| and |g|, and rename the attributes, we obtain the following arrow:
\begin{code}
proc (x0,y0,z0) -> do
  (x1,z1) <- g -< (x0,z0)
  (x2,y1) <- f -< (x1,y0)
  returnA -< (x2,y1,z1)
\end{code}
This notion can straightforwardly be generalized to rules that commute over many attributes, or that define and use many other attributes. These two rules are \emph{commutable} over the attributes |x0,x1| and |x1,x2| if their compositions are equivalent for a given rule context |h|, and |r1 `notpre` r2|:
\mathhs
\begin{displaymath}
|h| \left(
\begin{code}
proc (x0,y0,z0) -> do
  (x1,y1) <- f -< (x0,y0)
  (x2,z1) <- g -< (x1,z0)
  returnA -< (x2,y1,z1)
\end{code}
\right)
|^^^ == ^^^|
|h| \left(
\begin{code}
proc (x0,y0,z0) -> do
  (x1,z1) <- g -< (x0,z0)
  (x2,y1) <- f -< (x1,y0)
  returnA -< (x2,y1,z1)
\end{code}
\right)
\end{displaymath}
\plainhs
If there exists, directly or indirectly, a dependency between |r1| and |r2|, then the rules may not commute. This is for example the case when |r1| defines a (non-threaded) attribute that is used by |r2|, or when there is a dependency rule between them.
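As a concrete illustration of this law, the following Haskell sketch (purely illustrative; the names |comp1|, |comp2|, |h|, and |check| are not part of the formal development) instantiates |f| and |g| with two rules that each hand out a number from a counter, and chooses a context |h| that only observes the final counter value and the set of handed-out numbers:
\begin{code}
import qualified Data.Set as Set

-- Two rules that each take the counter and hand out an identifier.
f, g :: (Int, ()) -> (Int, Int)
f (x, ()) = (x + 1, x)
g (x, ()) = (x + 1, x)

-- The two compositions from the law, written with plain function application.
comp1, comp2 :: (Int, (), ()) -> (Int, Int, Int)
comp1 (x0, y0, z0) =
  let  (x1, y1) = f (x0, y0)
       (x2, z1) = g (x1, z0)
  in (x2, y1, z1)
comp2 (x0, y0, z0) =
  let  (x1, z1) = g (x0, z0)
       (x2, y1) = f (x1, y0)
  in (x2, y1, z1)

-- The context |h| abstracts from which rule received which identifier.
h :: (Int, Int, Int) -> (Int, Set.Set Int)
h (x2, y1, z1) = (x2, Set.fromList [y1, z1])

-- |h . comp1| and |h . comp2| agree on every input.
check :: Int -> Bool
check x0 = h (comp1 (x0, (), ())) == h (comp2 (x0, (), ()))
\end{code}
Although |comp1| and |comp2| disagree on which rule receives which number, |h| cannot distinguish the two orderings, which is exactly the sense in which the two rules commute.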
The rule context |h| is an abstraction of the larger composition in which the composition of |f| and |g| is embedded. For example, |h| can represent the composition of the rules of the entire tree, and the law then states that the end result of the computation is not affected. In practice, we take a more abstract notion of |h|, for example the property that all |loc.myId| attributes have a unique value. In Section~\tref{summary.sect.commuting} we give some examples of the function |h|.

\paragraph{Remarks.}
To reason with commutable rules, we may need to make assumptions about the order of evaluation. This is, for example, the case when the values of the attributes are trace monoids~\citep{Diekert:1997:PCT:267871.267879}. Phases can be used for this purpose.
The identification of commuting rules may be relevant for the parallel and incremental evaluation of attribute grammars. Chained attributes sequentialize code, whereas commuting rules allow more interleaving. Similarly, during incremental evaluation, changes in a subtree that appears earlier in the evaluation may be lifted over a later subtree if these changes are visible in a threaded attribute.
Commutable rules can also be used to collect statistics or other runtime properties of the evaluation process, in such a way that the attribute dependencies of the collecting rules have only a minor influence on that process. For example, a count of the nodes of the tree traversed so far may be an indication of how much work has been done.
In the case of type inference, substitutions may be represented as a threaded attribute, so that the threading of the substitution does not dictate the order of evaluation. Traditionally, unification is only a commutable operation when all unifications succeed. By encoding the substitution as a graph structure, it is possible to make unification a commutable operation also in the case of a type conflict~\citep{heeren05TopQuality}.

\section{Related Work}

The ease with which attribute grammars can be composed is an important benefit that they offer. In fact, AGs are so easily composed that the compositions may accidentally become inconsistent, i.e. have attributes with a cyclic definition.
\citet{DBLP:journals/mst/Knuth68} proves that an AG is well-defined if and only if the dependency graphs of productions are cycle-free according to his refined algorithm (Knuth-2). When the dependency graphs are cycle-free according to Knuth's original algorithm (Knuth-1), the AG is well-defined, but not necessarily the other way around. These are static properties that guarantee that the evaluation of an AG terminates.
Knuth-2 uses a dependency graph per production/child-production combination, in contrast to the single dependency graph per production used by Knuth-1. Knuth-2 thus leads to a number of dependency graphs per production in the order of $|p|^{|b|}$, where |p| is the number of productions of the child nonterminals, and |b| is the number of children. In practice, e.g. for the let-production of a lambda calculus, |p| is rather large (|p > 10|), but |b| is typically small (|b <= 2|). However, we usually define nonterminals with a fine granularity, so that distinct dependency graphs per production/child-production combination do not offer an advantage over a single graph per production. In our experience, AGs are either necessarily cyclic, or cycle-free in both Knuth-1 and Knuth-2.
If an AG is not cycle-free according to Knuth-1, this is an indication that the AST does not have sufficient structure, which is unlikely for strongly typed languages. In the first case, on-demand evaluation may still yield results for attributes, although it is the responsibility of the programmer to ensure this. In the second case, we know that an evaluation order exists, and we can use a statically ordered evaluation algorithm. A statically ordered evaluation algorithm is likely to exhibit better time and space behavior, and actually permits minor assumptions about the evaluation order to be made.
\citet{DBLP:journals/acta/Kastens80} presented an approach to infer a visit interface per nonterminal. Unfortunately, when using this approach, we often encounter cycles that are induced by Kastens' scheduling algorithm. These induced cycles hamper compositionality, because, as a remedy, we need to add artificial dependencies between attributes to the AG to control the scheduling. Also, the effect of scheduling is not visible in the original rules of the grammar, which makes such cycles very hard to understand and resolve.
The approach by \citet{DBLP:conf/popl/KennedyW76} is able to find a solution, but the solution may be exponentially large. In practice, the solution is not so large: in our experience, this approach works very well for small AGs. For large AGs, however, the exponential behavior may show up. To counter this behavior, we provide sufficient mechanisms to restrict the set of solutions.

\section{Conclusion}

This chapter demonstrates one of the great strengths of attribute grammars: the ability to statically analyse the grammar with abstract interpretations.
As one of the main contributions of this chapter, we reformulated the approach of \citet{DBLP:conf/popl/KennedyW76} so that it can be used to generate code for a strongly typed, purely functional host language. For absolutely non-circular AGs, this approach finds a way to order the rules statically. With such an approach, it is not necessary to explicitly schedule attributes and rules, unless these need to be given special properties; we introduced the notion of phases for this purpose. Since the ordering of attributes and rules can be inferred, we may omit such details from our specification. Consequently, the specification becomes more concise and easier to compose, which is beneficial from an engineering point of view.
The price that we pay is that such an analysis can only give an answer per context, such as a context-dependent phase or visit interface, and a nonterminal may occur in exponentially many contexts. A programmer typically has some evaluation order in mind, and is thus usually able to specify a single order that works in all contexts, which may reduce the size of the generated code, the compile time, and the execution time. On the other hand, if there is no single order that works in all contexts, then a programmer is not likely to be able to keep track of all the possibilities manually.
We presented phases as a means to specify knowledge about the order of evaluation, without going into the fine details, as in Chapter~\rref{chapt.side-effect}. Also, for some problems, the order imposed by attribute dependencies may be too restrictive. We presented commuting rules as a means to loosen some of these restrictions. Commuting rules can be used when a number of rules form a chain, and the individual ordering of the rules is not relevant for the result.
A typical example is a sequence of rules that provides unique numbers to the nodes in the tree. When the requirement is only that each number is unique, the actual order in which the numbers are handed out does not invalidate that requirement.
As future work, more quantitative insight is needed into the effects of dependency analysis on the performance of the generated code. In earlier experiments, the impact seemed negligible, although the results differed from one AG to another. Also, we need more insight into which heuristics have an impact on the size of the visits graph. For example, it appeared that our approach, in which we only compute what is needed, resulted in fewer paths in the visits graph than the approach where we compute as much as possible. However, more quantitative evidence is required to draw such conclusions.
Another direction for future work is an extension of the Kennedy-Warren approach with support for cyclic AGs. Given a collection of cyclic PDGs, we can determine which attributes of the NDGs are mutually dependent. By grouping these attributes in a separate phase that uses, for example, lazy evaluation as its evaluation algorithm, we can support a combination of cyclic and non-cyclic AGs.

%% Todo: Mention commutable rules