%format inh  = "\mathbf{inh}"
%format syn  = "\mathbf{syn}"
%format sem  = "\mathbf{sem}"
%format attr = "\mathbf{attr}"
%format when = "\mathbf{when}"
%format merge = "\mathbf{merge}"
%format as   = "\mathbf{as}"
%format child = "\mathbf{child}"
%format rec   = "\mathbf{rec}"
%format set   = "\mathbf{set}"
%format .    = "."
%format ##   = ||
%format alpha = "\alpha{}"
%format ^^^ = "\hspace{1em}"
%format <*> = "\star{}"
%format ~> = "\leadsto{}"
%format (sub (x) (y)) = "{" x "}_{" y "}"
%format (sup (x) (y)) = "{" x "}^{" y "}"
%format (many (x))    = "\overline{" x "}"
%format (semb (x))    = "{\llbracket{}" x "\rrbracket{}}"
%format choose_or     = choose "_{" or "}"
%format choose_app    = choose "_{" app "}"
%format emptyset = "\emptyset"

%format big1
%format big2
%format And1
%format And2
%format And3
%format Or1
%format Or2
%format Or3
%format f1
%format f2
%format m1
%format m2
%format p1
%format p2
%format p3
%format p4


%if fullVersion && showExtra
\newcommand\stepRefAppendix[2]{Appendix~\ref{#1}}
%else
\newcommand\stepRefAppendix[2]{Appendix~{#2}}
%endif


%if not thesis
  \begin{abstract}
%endif
  Attribute Grammars are a powerful formalism to specify and implement the
  semantics of programming languages (e.g. as in a compiler), in particular when
  the semantics are syntax directed.
  Advanced type systems, however, have declarative specifications that encode
  decisions that are independent of syntax.
  The implementation of such decisions is hard to express algorithmically
  using conventional attribute evaluation.

  \Thiswork{} presents Stepwise Attribute Grammars (SAGs). In a SAG,
  nondeterministic choices can be expressed in a natural way in conjunction with unambiguous
  resolution strategies based on attribute values. SAGs preserve the functional relationships between
  attributes and support on-demand evaluation.
  The exploration of alternatives are encoded as a choice between the semantic results
  of children. Evaluation of a child can be performed in a stepwise fashion: it is paused after
  each step and yields a progress report with intermediate results, until the
  child is reduced to its semantic value. This facilitates a breadth-first exploration
  of choices, until choices can be resolved based on the progress reports.
%if not thesis
  \end{abstract}

\category{D.3.4}{Processors}{Translator writing systems and compiler generators}
\terms{Algorithms, Languages}
%endif


\section{Introduction}
\label{step.sect.introduction}

  %% Paragraph 1: AGs are cool, even in practice
  %% Many of the benefits are actually the result of tool support.
  Attribute Grammars (AGs)~\citep{DBLP:journals/mst/Knuth68} are a formalism that
  is particularly
  suited for the concise implementations for semantics of programming languages
  (e.g. static, operational, denotational), in the form of a compiler or interpreter.
  Hereby, attributes play a crucial role: properties of Abstract Syntax Trees (ASTs)
  can easily be expressed as attributes, and combined to form more
  complex properties. These attributes can be shared and additional attributes can be added on demand.
  For example, attributes related to name analysis and type checking can be used in a later stage for
  code generation or the collection of error messages.
  We used AGs for many small, but also several large projects, including the Utrecht Haskell
  Compiler (UHC)~\citep{uhc}, the Helium compiler for learning Haskell, the Generic Haskell Compiler,
  and the editor Proxima. AGs, and corresponding tool support, proved to be essential for these
  projects.

  %% Paragraph 2: Dark clouds gather on the horizon: nondeterminism.
  %% I choose here to explain nondeterminism in specifications later, and first show why
  %% it causes trouble.
  %% Often not a problem. Solveable by unification. But in practice, there are some nasty problems.
  Modern programming languages allow a compiler to take some of the implementation effort
  away from the programmer. In C\#, local type inference is employed, such that an abundance of
  type signatures can be omitted. Many typed OO languages have an auto-boxing feature to automatically
  wrap primitive values into objects. Such features save programmers from tedious tasks.
  To specify this freedom, programming language specification are declarative, and impose
  sufficient restrictions such that a deterministic algorithm exists.

  Unfortunately, it is hard to deal with declarative type systems using conventional AGs.
  For example, it is not immediately obvious how to express Haskell's overloading for UHC using an
  AG. In fact, we currently rely on a constraint solver external to UHC's AGs.
  A lot of boilerplate code is required to interface with such a solver, and it
  introduces an artificial phase distinction (which hampers on-demand evaluation).
  Consequently, it increases code
  complexity severely. The goal of \thiswork{} is therefore to extend AGs such that
  inference algorithms for declarative type systems can be expressed in a natural way.

  %% Paragraph 3: explain that nondeterminism is a problem when backtracking is needed.
  Since an AG is both a functional specification and \emph{implementation}, our challenge is to
  describe algorithms that resolve declarative aspects.
  The declarative aspects in a semantics occurs in two forms:
  attributes with a non-functional definition
  (e.g. fresh types), and productions that are not syntax-directed, but where their applicability
  depends on values of attributes. These two aspects are mutually expressible (\stepRefAppendix{step.sect.mutuallyexpressible}{G}).
  As we prefer attributes to be functionally defined, we focus on the second notion.
  Typically, we can deal with declarative aspects using a unification-based approach, which integrates
  well with AGs~\citep{DBLP:conf/afp/DijkstraS04}. However, some applications require a more
  powerful approach, with an algorithm that \emph{actively} tries out alternatives at
  choice points.
  %% the term 'choice point' is defined somewhere. Anyway, should be clear from the context here as well.
  For example, in a
  Haskell compiler, to search for an equality proof for GADTs, and to resolve
  overloading in the presence of undecidable instances.
  Unfortunately, \emph{exploration of alternatives} does not fit AGs straightforwardly.

  %% Paragraph 4: explain why backtracking doesn't fit an AG
  Such explorations do not fit out of the box, because productions are selected based on
  the syntax (e.g. the parsed AST), and not on the values of attributes. To lift this
  restriction, there are approaches that generate Prolog~\citep{Walsteijn86, Arbab:1986:CCA:13831.13837}.
  However, we have several additional demands:
  \begin{enumerate}
  \item The approach needs to be compatible with any general purpose host language.
        From a theoretical perspective: to allow the extension we propose to be
        integrated in other AG systems as well;
        from a practical perspective: we have a large Haskell code base, thus want
        to use Haskell as a host language.

  \item We need a \emph{breadth-first evaluation} with \emph{online results} to deal with
        potential infinite ASTs. Online results are also needed for integration with
        conventional on-demand evaluation.

  \item We need complex disambiguation strategies. In general, for deterministic results
        and error reporting; in particular, to deal with some extensions to type classes
        that UHC offers~\citep{chrclass}.
  \end{enumerate}
  In \thiswork{}, we present an approach that adds exploration of alternatives to AGs and
  meets the above demands.

  %% Paragraph 5: finally, the approach!
  Our approach consists of following three key ideas:
  \begin{enumerate}
  \item
    We encode declarative aspects and their resolution as a \emph{choice function}
    between children of a production.
    Overall, attributes are evaluated using conventional on-demand evaluation.
    Attributes of children participating in a \emph{choice}, however, are evaluated
    \emph{strictly}.
  \item
    We annotate productions such that they \emph{yield} user-defined progress reports that contain
    \emph{intermediate results} upon strict attribute evaluation.
    After yielding a progress report, evaluation for a child \emph{pauses}.
    This allows the choice function to evaluate its children in a \emph{step-by-step fashion}
    until it has observed sufficient progress reports to commit to one of the children.
    It may yield progress reports itself in the mean time.
%%    We add additional built-in nonterminals to a production to annotate it. Such a built-in nonterminal
%%    takes user-defined progress reports as inherited attribute, and yield it as side
%%    effect of greedy evaluation.
  \item
    As implementation strategy, we map each production to a special form of
    coroutine that can both be run greedily until it yields the next progress
    report, and be run to completion directly but with a lazy result.
    This implementation facilitates a \emph{hybrid evaluation model} between stepwise
    and on-demand evaluation, and is therefore compatible with other AG techniques
    that depend on the conventional on-demand evaluation.
  \end{enumerate}
  We motivate these ideas via an example in Section~\ref{step.sect.stepwiseexample}.

  %% Paragraph 6: a bit of history (Doaitse's paper) and a bit of future (forward references)
  These ideas are inspired by a parsing technique by \citet{DBLP:conf/lernet/Swierstra08, DBLP:conf/icfp/HughesS03}
  to explore alternatives based on progress information reported by parsers.
  %% It allows powerful disambiguation strategies and error correction. The mechanism meets the
  %% above demands, albeit in the context of parsing.
  The approach in \thiswork{} is not directly related to parsing, but ultimately has
  its roots in the above technique (for a detailed comparison, cf. Section~\ref{step.sect.relatedwork}).

  In the text, we refer to appendices published in an accompanying technical report~\citep{middelkoop10stepwise}
  where certain topics are explored in more detail.
  The main contributions of \thiswork{} are:
  \begin{itemize}
  \item We define SAGs, a language extension to Attribute Grammars
        that copes with declarative aspects in a semantics, while keeping
        the AG purely functional.
        We explain SAGs in Section~\ref{step.sect.stepwiseexample},
        and show how to translate them to Haskell as a host language in
        in Section~\ref{step.sect.compilestepwise}.
        We implemented this extension in the UUAG
        system~\citep{uuagc}.
  \item We introduce lazy coroutines, or \emph{stepwise computations}, for which we provide
        an reference implementation (Section~\ref{step.sect.stepwisemonad}).
        In \thiswork{}, we focus on the main ideas.
        The Haskell library\footnote{
        Stepwise monad: \url{https://svn.science.uu.nl/repos/project.ruler.papers/archive/stepwise-1.0.2.tar.gz}}
        shows and explains many additional features that are useful in practice
        (see also Section~\ref{step.sect.evaluation}).
        The library may also be used by Haskell programs that need
        powerful exploration capabilities, but are not related to AGs.
  \item As a proof of concept that these techniques are portable to other languages
        as well, we also provide an implementation of the example in
        Section~\ref{step.sect.stepwiseexample} in Java (Section~\ref{step.sect.imperative}).
  \end{itemize}


%if fullVersion
\section{Example of a Stepwise AG for a Predicate Language}
%else
\section{Example of a Stepwise AG}
%endif
\label{step.sect.stepwiseexample}

  In this section, we take as running example an operational semantics\footnote{
  For an example related to type systems, see \stepRefAppendix{step.sect.inferencerulesandags}{F}.
  }
  for a Boolean predicate language |Pred|.
  We show how to write |Pred|'s semantics as an AG using the notation as supported by the UUAG system.
  The semantics is executable: its implementation yields an interpreter in Haskell for |Pred|.
  Initially, the example is a functional specification, such that we can resort to a conventional
  AG and explain the notation involved~\citep{uuagc}.
  Next we show that the semantics has an efficiency problem which we can solve by making the
  specification more declarative. With a Stepwise AG we then deal with it.

  \subsection{Syntax of the Predicate Language}
  \label{step.sect.predsyntax}

  Consider the following grammar for the abstract syntax of |Pred|:

%if fullVersion
  \begin{code}
  data Pred  -- Grammar for nonterm |Pred| as algebraic data type.
    | Var  nm     :: String          -- Variable with value in |Env|.
    | Let  nm     :: String          -- Binds (non-recursively) a
           expr   :: Pred            -- name to the value of |expr|,
           body   :: Pred            -- in scope of |body|.
    | And  left   :: Pred            -- The logical |&&| of two preds.
           right  :: Pred
    | Or   left   :: Pred            -- The logical |##| of two preds.
           right  :: Pred

  type Env = Map String Bool         -- Environment that maps names to Booleans.
  \end{code}
%else
  \begin{code}
  data Pred  -- Grammar as algebraic data type.
    | Var  nm     :: String          -- Variable in |Env|.
    | Let  nm     :: String          -- Binds (non-recursively)
           expr   :: Pred            -- name to value of |expr|,
           body   :: Pred            -- in scope of |body|.
    | And  left   :: Pred            -- The logical |&&|.
           right  :: Pred
    | Or   left   :: Pred            -- The logical |##|.
           right  :: Pred

  type Env = Map String Bool         -- Maps names to Bools.
  \end{code}
%endif

  The data declaration
  introduces a nonterminal |Pred|, and a number of productions |Var|, |Let|, etc. The fields of the
  production comprise the symbols of the RHS of the production, consisting of a name |nm|, |expr|, etc.
  and a type. Some built-in types such as |Bool| and |String| specify that the symbol is a terminal,
  otherwise the symbol is a nonterminal.

  From the grammar, we generate constructor functions |sem_Var|, |sem_Let|, etc. which are used
  to build attributed ASTs. Given an initial environment |{"f" -> False, "t" -> True}|, which binds a truth value
  to the free variables in the predicates, we turn a predicate into a proposition.
  For example:
%if fullVersion
  \begin{code}
  taut   =  sem_Or (sem_Var "t") (sem_Var "f")        -- |True|
  contr  =  sem_And (sem_Var "t") (sem_Var "f")       -- |False|
  alias  =  sem_Let "a" (sem_Var "t") (sem_Var "a")   -- |True|
  big1   =  sem_And (sem_Var "f") big1                -- |False|
  big2   =  sem_And big2 (sem_Var "f")                -- |False|
  \end{code}
%else
  \begin{code}
  taut   =  sem_Or (sem_Var "t") (sem_Var "f")        -- |T|
  contr  =  sem_And (sem_Var "t") (sem_Var "f")       -- |F|
  alias  =  sem_Let "a" (sem_Var "t") (sem_Var "a")   -- |T|
  big1   =  sem_And (sem_Var "f") big1                -- |F|
  big2   =  sem_And big2 (sem_Var "f")                -- |F|
  \end{code}
%endif

  Informally speaking, the Boolean value of |taut| is |True| and of |contr| is |False|. The
  |big1| and |big2| predicates are large sequences of |False|
  that are combined with |and|s. In fact, these sequences are infinitely long, although
  we only use that for emphasis.
  Their truth value is |False|.
  Formally, however, we have to define an operational semantics to make such statements.

  \subsection{Deterministic Operational Semantics}

  An operational semantics for a predicate takes an environment and provides a
  truth value. We model these two aspects as attributes, which correspond to values
  attached to the nodes of an AST:
  an \emph{inherited} attribute |env| that represents the environment for the subtree,
  and a \emph{synthesized} attribute |val| that denotes the value of the subtree (for the given environment):
%if fullVersion
  \begin{code}
  attr Pred  inh env :: Env  syn val :: Bool  -- Typed attributes for nonterm |Pred|.
  \end{code}
%else
  \begin{code}
  attr Pred  inh env :: Env  syn val :: Bool
  \end{code}
%endif

  The obligation to define an inherited attribute for a node in the AST
  lies with the parent, for
  a synthesized attribute it lies with a child.
  Via a |sem|-block, we define for each production,
  the production's synthesized
  attributes, and the inherited attributes of its children using rules written in the AG's
  host language (in our case: Haskell expressions). These rules may refer to the
  inherited attributes of the production and the synthesized attributes of the children.
  We refer to an attribute using |chld.atname| notation. To refer to the attributes of the
  production itself, we use the special name |lhs|. We refer to terminals by
  their name. Thus, we define the semantics for predicates as:
%if fullVersion
  \begin{code}
  sem Pred  -- Specifies rules for productions of nonterm |Pred|.
    | Var  lhs.val    = lookup nm lhs.env

    | Let  lhs.val    = body.val  -- Takes |val| from child |body|.
           body.env   = insert nm expr.val lhs.env

    | And  lhs.val    = left.val && right.val
           left.env   = lhs.env   -- Copies down |env| to the left.
           right.env  = lhs.env   -- Copies down |env| to the right.

    | Or   lhs.val    = left.val || right.val
           left.env   = lhs.env
           right.env  = lhs.env
  \end{code}
%else
  \begin{code}
  sem Pred  -- Specifies rules for productions.
    | Var  lhs.val    = lookup nm lhs.env

    | Let  lhs.val    = body.val  -- Takes |val| from |body|.
           body.env   = insert nm expr.val lhs.env

    | And  lhs.val    = left.val && right.val
           left.env   = lhs.env   -- Copies down |env|.
           right.env  = lhs.env   -- Copies down |env|.

    | Or   lhs.val    = left.val || right.val
           left.env   = lhs.env
           right.env  = lhs.env
  \end{code}
%endif
  Thus, we simply pass down the environment from top to bottom. For a variable, we lookup the associated
  value in the environment. For a let-binding, we insert the value in the environment. Finally, for
  the |And| and |Or|, we take the Haskell (short-circuiting) |(&&)| and |(##)| on Boolean values of the children.

  From this AG, the UUAG compiler generates an interpreter that takes a predicate, defines the root's |env| attribute,
  and demands a value for root's |val| attribute. On-demand evaluation proceeds to compute those attributes
  when their values are needed during the computation of the |val| attribute.
  In a purely functional language such as Haskell, we can represent a decorated tree as a function from inherited
  to synthesized attributes~\citep{Saraiva99}. More precisely, for each production
  (e.g. |Var|, |And|), we generate a function (|sem_Var|, |sem_And|, respectively) that, given
  the functions corresponding to its children, represents a Haskell function takes values for inherited
  attributes as parameter, and returns a product with values for the synthesized attributes
  (Section~\ref{step.sect.compilestepwise}).

  When we run the interpreter on the examples given earlier, it gives
  the expected outcomes, with a single exception: the computation for |big2| diverges.
  The reason is that both |(##)| and |(&&)| start with the left argument first. This may involve a lot
  of computation (e.g. in case of |big2|) that could be avoided by looking at the second argument first.
  Then, however, |big1|'s evaluation would diverge.
  If |big2| would be a long but finite sequence, then its evaluation would not diverge
  but take a long time to produce an answer.

  \subsection{Declarative Operational Semantics}

  The |&&| and |##| operators do not distribute evaluation over their operands well.
  To give more freedom with respect to the evaluation order, we could add non-determinism to the
  specification, for example via multiple (conditional) interpretations of a production, using
  the following (fictional) notation:
  \begin{code}
  sem Pred  -- Productions with conditional alternatives.
    | And1  lhs.val  = left.val   ^^^  when left.val   = False
    | And2  lhs.val  = right.val       when right.val  = False
    | And3  lhs.val  = left.val        when otherwise

    | Or1   lhs.val  = left.val        when left.val   = True
    | Or2   lhs.val  = right.val  ^^^  when right.val  = True
    | Or3   lhs.val  = left.val        when otherwise
  \end{code}
  In this specification, there is freedom in the alternative to apply.
  A clever AG evaluator could use
  a breadth-first exploration of alternatives combined with prioritizing those attributes that are
  closer to the root and thus provide a more balanced evaluation strategy.

  However, we want to be precise in this strategy. When both |left.val| and
  |right.val| are |True|, the choice between |Or1| and |Or2| is ambiguous.
  For predictability reasons (and \emph{referential transparency}), it must be clear which one should be taken; also,
  we may prefer one over the other based on some other available information.
  In some cases we may want to be biased towards a particular subtree, potentially based on results
  computed at runtime. Hence, as mentioned in Section~\ref{step.sect.introduction}, we want to be able to define this
  strategy ourselves.

  \subsection{Stepwise Operational Semantics}

  To define such a strategy, we take a less ad-hoc solution. In particular, we
  encode alternatives as a choice between children instead of productions. We do not loose expressiveness
  (\stepRefAppendix{step.sect.additionaltreestructure}{E}),
  and gain the ability to define strategies in terms of the evaluation of children.
  In the remainder of \thiswork{}, we focus on |Or|-predicates only, and leave the strategies
  for |And|-predicates to the reader.
  For the |Or|-predicate, if we know that one of the children's |val| attribute evaluates to |True|,
  we can commit the choice to that child. We express this as follows, using a function
  |choose_or| (to be defined later):
%if fullVersion
  \begin{code}
  sem Pred | Or  -- Rules for production |Or| of nonterm |Pred|.
    left.env   = lhs.env                  -- Copies down |env| to the left.
    right.env  = lhs.env                  -- Copies down |env| to the right.

    merge left right as res = choose_or   -- Creates child |res| using |choose_or| to merge.

    lhs.val    = res.val                  -- Pass |val| up from child |res|.
  \end{code}
%else
\begin{code}
sem Pred | Or  -- Rules for production |Or|.
  left.env   = lhs.env                  -- Copies down |env|.
  right.env  = lhs.env                  -- Copies down |env|.

  merge left right as res = choose_or   -- Creates child |res|.

  lhs.val    = res.val                  -- Pass |val| up.
\end{code}
%endif
  The merge-rule transforms children |left| and |right| into a single virtual child |res|.
  We may refer to the synthesized attributes of |res|, but not to those of |left| and |right|.
  Similarly, we need to define the inherited attributes of |left| and |right|, but may not
  define those of |res|.

  The function |choose_or| takes the synthesized attributes of |left| and |right| as arguments, and
  is required to provide values for the synthesized results for |res|\footnote{Due to
  Haskell's laziness, on-demand evaluation starts as soon as we scrutinize a value of an attribute.}.
  As initial attempt, we define |choose_or| as:
%if fullVersion
  \begin{code}
  choose_or :: Bool -> Bool -> Bool
  choose_or left_val right_val   -- Synthesized attributes of |left| and |right| as parameter.
    | left_val   = left_val      -- Takes the left value, or
    | right_val  = right_val     -- takes right value, or
    | otherwise  = left_val      -- otherwise takes the left value.
  \end{code}
%else
  \begin{code}
  choose_or :: Bool -> Bool -> Bool
  choose_or left_val right_val   -- Syns children as param.
    | left_val   = left_val      -- Takes left , or
    | right_val  = right_val     -- takes right, or
    | otherwise  = left_val      -- otherwise takes left.
  \end{code}
%endif
  With this function, we unambiguously specify what kind of solution we want. However, in terms of
  evaluation we are back where we started: we evaluate the entire
  left child first, thus evaluation is still left biased.

  Scrutinizing the value of a synthesized attribute leaves us little control over the
  scheduling of the evaluation\footnote{
    We can use Haskell's lazy evaluation to return lazy approximations of the final
    values as result. This, however, complicates the AG severely, as rules must be manually
    lifted to operate on approximations.
  }. In the example, we cannot return a result until we make
  the choice, but in order to do so, we need to inspect the result, which causes one of
  the children to evaluate fully before we have a chance to inspect the other one.
  Instead, we want the scheduling decisions to be based
  on explicitly indicated intermediate results. On-demand evaluation does not help us here,
  and therefore we propose a different evaluation scheme to choose between alternatives.

  This leads us to the second idea of \thiswork{}: we evaluate a child under a choice
  strictly (instead of on-demand), thus computing
  attributes of children in a fixed order\footnote{In \thiswork{}, we take the order of
  appearance of children in a production as strict evaluation order. UUAGC actually
  supports Ordered Attribute Grammars (OAGs)~\citep{DBLP:journals/acta/Kastens80} that
  takes attribute dependencies into account, and ensures that all attributes are
  well-defined (non-cyclic). Using OAGs, the order of appearance is not important.
  }.
  Instead of completely evaluating all synthesized attributes of such child, however, we evaluate
  it just far enough to yield a progress report. It then suspends to be resumed later.
  We explain later how to emit such progress reports during evaluation.
  To explore or merge two children both gradually and simultaneously, as well as report intermediate
  results to the parent, we take their progress reports alternatively, and intertwine them
  (explained below).

  In our example, we wish to prioritize evaluation to the child that we estimate has
  performed the least amount of work, to balance out evaluation. For that, the progress reports
  need to give an indication that some work has been done. Hence, we define a type
  for progress reports (e.g. |Info|), with a plain constructor |Work| as possible value:
%if fullVersion
  \begin{code}
  data Info = Work  -- Application-specific type defined by programmer.
  \end{code}
%else
\begin{code}
data Info = Work  -- Application-specific type.
\end{code}
%endif

  The value for a child that is passed to the |choose_or| function is not just the value of the
  synthesized attribute, but now a \emph{stepwise computation} of the type |Stepwise Info Bool|,
  where |Info| is the type of the progress reports, |Bool| is the type of the
  synthesized attribute. The type of |choose_or| changes to:
  \begin{code}
  choose_or ::  Stepwise Info Bool  -> Stepwise Info Bool
                  ^^^  -> Stepwise Info Bool
  \end{code}
  A computation of the type |Stepwise Info Bool| can be manipulated with a simple monadic API (Section~\ref{step.sect.stepwisemonad}):
  %if fullVersion
  \begin{code}
  emit       :: i -> Stepwise i ()                               -- Type |i| equals |Info| here.
  smallStep  :: Stepwise i alpha -> Report i alpha               -- Evaluates to next report.
  data Report i alpha = Done alpha | Step i (Stepwise i alpha)   -- Result of |smallStep|.

  return  :: alpha -> Stepwise i alpha
  (>>)   :: Stepwise i alpha ->  Stepwise i beta              -> Stepwise i alpha
  (>>=)  :: Stepwise i alpha ->  (alpha -> Stepwise i alpha)  -> Stepwise i alpha
  \end{code}
  %else
  \begin{code}
  emit       :: i -> Stepwise i ()
  smallStep  :: Stepwise i alpha -> Report i alpha
  data Report i alpha = Done alpha | Step i (Stepwise i alpha)

  return  :: alpha -> Stepwise i alpha
  (>>)   :: Stepwise i alpha ->  Stepwise i beta              -> Stepwise i alpha
  (>>=)  :: Stepwise i alpha ->  (alpha -> Stepwise i alpha)  -> Stepwise i alpha
  \end{code}
  %endif
  Function |emit| yields a progress report, and |smallStep| evaluates the computation
  until it is |Done| (with attribute values) or yields a |Step| (with continuation).
  Computations are composable via monadic operators. The monadic sequence (|m1 >> m2|) performs
  |m1|, throws away its result, then performs |m2| and delivers |m2|'s result. The monadic
  bind |m >>= f| performs |f| parameterized with the result of |m|.

  Using the above API, we now redefine |choose_or| as:
  %if fullVersion
  \begin{code}
  choose_or left right = choose' (smallStep left) (smallStep right) where
    choose' (Done v) _  = if v then left   else right  -- Choose.
    choose' _ (Done v)  = if v then right  else left   -- Choose.
    choose' (Step Work left') (Step Work right')       -- Both yielded a |Work|.
      = emit Work >> choose_or left' right'            -- Pass on the report.
  \end{code}
  %else
  \begin{code}
  choose_or left right =
    choose' (smallStep left) (smallStep right)
    where
      choose' (Done v) _  = if v then left   else right
      choose' _ (Done v)  = if v then right  else left
      choose' (Step Work left') (Step Work right')
        = emit Work >> choose_or left' right'
  \end{code}
  %endif
  Both children perform a step. When one of them evaluates to |Done|,
  we inspect its attribute and make a choice. By replacing the expression with
  either |left| or |right|, we eliminate the other choice.
  Otherwise, we emit a step
  that a bit of work has been done (for the current node), and retain
  the choice between the continuations of the children.
  This strategy effectively encodes a breadth-first exploration of the children.

  We emit a |Work| progress report for each |Var| node.
  To inject such reports, we extend the |Var| production with a special built-in nonterminal |Progress|\footnote{
  Alternatively, the additional child can be specified as a higher-order
  attribute using Higher-Order AGs~\citep{DBLP:conf/pldi/VogtSK89} (supported by UUAG),
  which does not require us to change the production.
  }\footnote{
    Actually, the |Progress| nonterminal can be implemented using a merge-rule (\stepRefAppendix{step.sect.progressordered}{A}).
  }. This nonterminal has a single inherited attribute named |info|, and no
  synthesized attributes. Since it has no synthesized attributes,
  a child of this nonterminal is never evaluated during on-demand evaluation.
  During strict evaluation, however, each child is evaluated. In that case,
  the implementation of |Progress| yields the progress report that it took as parameter\footnote{
    Visits to children may be omitted if the synthesized
    attributes of that child are not used. This is undesirable when the
    child may emit a progress report. In \stepRefAppendix{step.sect.progressordered}{A}, we explain that progress
    reports themselves can be considered to be a hidden attribute. Hence, visits to children that emit
    progress reports can never be omitted during stepwise evaluation. Also, referential transparency is
    preserved.
  }. This is exactly the behavior that we want. We are not interested in progress reports during on-demand evaluation,
  but the more we are interested in them during strict evaluation.
  %if fullVersion
  \begin{code}
  data  Pred  | Var   ^^^ report : Progress    -- Additional child of production |Var|.
  sem   Pred  | Var   ^^^ report.info = Work   -- Defines its inherited attribute.
  \end{code}
  %else
  \begin{code}
  data  Pred  | Var   ^^^ report : Progress    -- Additional child.
  sem   Pred  | Var   ^^^ report.info = Work   -- Defines its attr.
  \end{code}
  %endif
  The |info| attribute is defined by a conventional rule, thus as right-hand side, we have
  access to any intermediate result (not needed for this example).

  \subsection{Hybrid On-demand and Stepwise Evaluation}
  \label{step.sect.hybrid}

  The attribute rules are oblivious towards stepwise evaluation. They are still pure
  functions between attribute values, which is important to reason with AGs.
  Access to progress reports is only possible in merge functions.
  Additionally, this allows stepwise and on-demand evaluation to coexist.

  Stepwise evaluation is strict, therefore it cannot deal with
  attributes defined in a cyclic way. However, we only need stepwise evaluation while making
  a choice: once only one alternative is left, we can continue with on-demand evaluation.
  For that purpose, we provide a function:
  %if fullVersion
  \begin{code}
   lazyEval :: Stepwise i alpha -> alpha  -- Runs computation lazily, returns syn attributes.
  \end{code}
  %else
  \begin{code}
   lazyEval :: Stepwise i alpha -> alpha
  \end{code}
  %endif
  It takes a partially reduced child as parameter, ignores progress reports,
  and returns the attributes (|alpha| is instantiated to a product of the attribute types) on which we may perform conventional on-demand
  evaluation. Further details for |lazyEval| can be found in Section~\ref{step.sect.stepwisemonad}.

  The global picture is that we start with |lazyEval| at the root of the
  AST. When |lazyEval| needs an attribute value of a merged
  child, it asks for the result of the merge function (e.g. |choose_or|).
  Consequently, |choose_or| invokes |smallStep| on its children,
  gradually reducing the candidate children, and ultimately returns one of
  the residual children (e.g. |left|). Then |lazyEval| proceeds with
  this child.

  To implement these ideas, we arrive at the third idea of \thiswork{}: The AG is compiled
  to a special form of coroutines (Section~\ref{step.sect.compilestepwise}).
  A coroutine is a function that may yield an
  intermediate result and then suspends. Its caller receives that result, and can
  reinvoke the coroutine again to resume its execution. This gives us the behavior
  of |smallStep|. For |lazyEval|, we need some special behavior: in that case, a
  coroutine resumes and runs immediately towards its end. It constructs only the
  administration needed to perform the remaining computations in an on-demand fashion.
  We give an implementation for these special coroutines in
  Section~\ref{step.sect.stepwisemonad}.

\section{SAG Translation}
\label{step.sect.compilestepwise}

  A SAG is a conservative extension of an AG that adds the |merge| rule. We sketched
  its static semantics in the previous section. In this section, we sketch its
  denotational semantics: we describe how to map a SAG to a monadic Haskell program
  (Translation scheme in \stepRefAppendix{step.sect.translationscheme}{B}).
  The monad is defined in Section~\ref{step.sect.stepwisemonad}.
  The SAG translation is largely based on a conventional translation to Haskell, as implemented
  in UUAG~\citep{Saraiva99}.

  To translate a SAG, we translate each production to a coroutine (e.g. |sem_And|), implemented as a monad.
  With the coroutines we build an attributed tree (e.g. |taut| and |big1| in Section~\ref{step.sect.predsyntax}).
  This tree is represented as a \emph{function} from the inherited attributes to a product of the synthesized
  attributes (and, as mentioned in the previous section, lifted in the stepwise monad). We call this tree
  the \emph{semantic value} or simply the \emph{semantics} (of the associated nonterminal/production).
  Thus, a coroutine is a function that takes the \emph{semantic values} (or, simply: \emph{semantics}) of
  its children as parameter, and returns its own semantics.
  For example, in case of nonterminal |Pred|, the type of its semantics (named |I_Pred|) and the signature
  of |sem_And| are:
  %if fullVersion
  \begin{code}
  type I_Pred = Env -> Stepwise Info Bool  -- Function type for attributed tree.
  sem_And :: I_Pred -> I_Pred -> I_Pred    -- Coroutine for |And| production.
  \end{code}
  %else
  \begin{code}
  type I_Pred = Env -> Stepwise Info Bool  -- Func type.
  sem_And :: I_Pred -> I_Pred -> I_Pred    -- Coroutine.
  \end{code}
  %endif

  We first discuss a translation of conventional AGs (|sem_And| does not use |merge|).
  Coroutine |sem_And| takes the semantics of its children as parameter. It must return its own
  semantics, which is a function from its single inherited attribute |env| to its single synthesized
  attribute |val| lifted in the stepwise monad. We use the encoding |childXattr| to unambiguously
  refer to an attribute, where |X| equals |I| for an inherited attribute and |S| for a synthesized attribute.
  The function body---the monad---comprises the \emph{plan} for a
  production: it consists of the calls to the children, and definitions of the attributes.
  At a function call to a child, we pass its inherited attributes, then we obtain a monadic value with the
  synthesized attributes, which we can match against.
  Recursive do-notation~\citep{DBLP:conf/icfp/ErkokL00} allows us to write such plans concisely.
  \newcommand\figmonadictree[0]{
    %if fullVersion
    \raisebox{0em}[0pt][0pt]{
    %endif
    \begin{tikzpicture}
      [ nd/.style={circle, minimum size=3mm, thick, draw=black,font=\footnotesize}
      , nddash/.style={nd,dashed}
      , ndlab/.style={font=\footnotesize}
      , arr/.style={->,dashed, very thick}
      ]

      \node[nd](n1){}
        [sibling distance=10mm, level distance=6mm]
        child {
          node[nd](n2){}
            child {
              node[nd](n3){}
            }
            child {
              node[nddash](n4){}
            }
        }
        child {
          node[nddash](n5){}
        };

    \node[ndlab,below of=n1,yshift=7mm]{|>>=|};
    \node[ndlab,below of=n2,yshift=7mm]{|>>=|};
    \node[ndlab,left of=n1,xshift=3mm]{parent |f1|};
    \node[ndlab,left of=n2,xshift=3mm]{parent |f2|};
    \node[ndlab,left of=n3,xshift=3mm]{active |m|};
    \node[ndlab,right of=n4,xshift=-3mm]{pending};
    \node[ndlab,right of=n5,xshift=-3mm]{pending};
    %% \node[ndlab,right of=n1,xshift=-4mm]{|return|};
    %% \node[ndlab,right of=n2,xshift=-4mm]{|return|};
    %% \node[ndlab,right of=n3,xshift=-4mm]{|return|};
    \end{tikzpicture}
  %if fullVersion
  }
  %endif
  }
  %if fullVersion
  \begin{code}
  sem_And left right = \lhsIenv -> do rec   -- Takes inh attr |env|.
    leftSval   <- left leftIenv             -- Calls |left| child.
    rightSval  <- right rightIenv           -- Calls |right| child.

    let leftIenv   = lhsIenv
    let rightIenv  = lhsIenv
    let lhsSval    = leftSval && rightSval
    return lhsSval                          {-" \figmonadictree{} "-}
  \end{code}
  %else
  \begin{code}
  sem_And left right = \lhsIenv -> do rec   -- Takes inh attr.
    leftSval   <- left leftIenv             -- Calls |left|.
    rightSval  <- right rightIenv           -- Calls |right|.

    let leftIenv   = lhsIenv
    let rightIenv  = lhsIenv
    let lhsSval    = leftSval && rightSval
    return lhsSval
  \end{code}
  %endif
  Under the hood, the do-notation reorders the statements to produce a linearized plan as a sequence of monadic binds\footnote{
    In general, also calls to |mfix| are inserted to deal with cyclic
    definitions. We provide a definition for |mfix| in the library associated to \thiswork{}. In practice, with UUAG, we use
    Ordered Attribute Grammars, that result in a more sophisticated translation that only
    needs non-recursive do-notation, without |mfix|.
  }, e.g. (let-bindings omitted, replaced by dots):
  %if fullVersion
  \begin{code}
  left leftIenv >>= (\ ... -> right rightIenv >>= (\ ... -> return lhsSval))
  \end{code}
  %else
  \begin{code}
  left leftIenv >>= (\ ... ->
    right rightIenv >>= (\ ... ->
      return lhsSval))
  \end{code}
  %endif
  A monadic bind |m >>= f|, expresses that the remainder of the plan |f|
  is parametrized with the results of plan |m|
  %if fullVersion
   (see the figure above).
  %else
  (see the figure below).
  \begin{center}
  \figmonadictree
  \end{center}
  %endif
  Strict evaluation goes from left to right, thus reducing a plan gradually.

  To translate the merge rule, consider the translation for |sem_Or|.
  The calls to the involved children are not made part of the plan:
  we do not match against their results.
  Instead, |choose_or| takes the plans (stepwise computations) of its input children, and
  as result, must provide a plan (computation) for its output child. Subsequently,
  we match against the output child to obtain the synthesized attribute |res.val| of merged child |res|:
%if fullVersion
  \begin{code}
  sem_Or left right = \lhsIenv -> do rec  -- Function from |lhs.env| to |lhs.val|.
    resSval  <- choose_or (left leftIenv) (right rightIenv)  -- Translation for merge.

    let leftIenv   = lhsIenv  -- Defines |env| for |left|.
    let rightIenv  = lhsIenv  -- Defines |env| for |right|.
    let lhsSval    = resSval  -- Takes |val| attribute from |res| child.
    return lhsSval            -- Returns result for |Or| production.
  \end{code}
%else
  \begin{code}
  sem_Or left right = \lhsIenv -> do rec
    resSval  <- choose_or (left leftIenv) (right rightIenv)

    let leftIenv   = lhsIenv  -- Defines |env| for |left|.
    let rightIenv  = lhsIenv  -- Defines |env| for |right|.
    let lhsSval    = resSval  -- Takes |val| attribute from |res|.
    return lhsSval            -- Returns result for |Or|.
  \end{code}
%endif

  SAGs are thus only a modest extension to the conventional translation.
  Most of the additional complexity is hidden in the implementation of the
  coroutine (Section~\ref{step.sect.stepwisemonad}).

%if fullVersion
\section{Lazy Coroutines and the Stepwise Monad}
%else
\section{Lazy Coroutines}
%endif
\label{step.sect.stepwisemonad}

  We use a coroutine to represent the residual attributed tree after strict evaluations.
  We either transform this coroutine to its lazy result via |lazyEval|, or run it greedily
  using |smallStep| to to the point where it yields the next progress report.
  The data type |Stepwise i a| represents such a coroutine. It exposes the following structure:

  %if fullVersion
  \begin{code}
  data Stepwise i a where   ^^^  -- |Stepwise| is a `defunctionalized' monad.
    Yield    :: i -> Stepwise i a  -> Stepwise i a  -- Paused computation
    Fail     :: String             -> Stepwise i a  -- Aborted computation
    Return   :: a                  -> Stepwise i a  -- Finished computation
    Pending  :: Stepwise i b -> Parents i b a  -> Stepwise i a

  data Parents i a b where       -- Parent stack (root of type |b|, active child |a|).
    None  :: Parents i a a       -- Bottom of the stack.
    Bind  :: (a -> Stepwise i z) -> Parents i z b -> Parents i a b
  \end{code}
  %else
  \begin{code}
  data Stepwise i a where   ^^^  -- `defunctionalized' monad.
    Yield    :: i -> Stepwise i a  -> Stepwise i a  -- Paused
    Fail     :: String             -> Stepwise i a  -- Aborted
    Return   :: a                  -> Stepwise i a  -- Finished
    Pending  :: Stepwise i b -> Parents i b a  -> Stepwise i a

  data Parents i a b where       -- Parent stack.
    None  :: Parents i a a       -- Bottom of the stack.
    Bind  :: (a -> Stepwise i z)  -> Parents i z b
                                  -> Parents i a b
  \end{code}
  %endif
  |Yield| means that the coroutine paused to yield a progress report of the type |i|. The
  second component represents the continuation. |Fail| represents an aborted computation, and
  |Return| means it succeeded, providing a value of type |a|.

  During strict evaluation, reduction of a child starts only when its preceding sibling is
  fully reduced (linear execution of the plans in Section~\ref{step.sect.compilestepwise}).
  Hence, every node has at most one child active, and a continuation of what
  to do after that child is finished. A |Pending| value encodes this: the first
  component is the deepest child awaiting further evaluation. The second component is the
  stack of parent-continuations. The GADT |Parents i a b| represents the parent nodes that
  await a value of type |a|, to ultimately compute a value of type |b|. When we match
  |None|, we reached the bottom of the stack, where we ensure that |b| equals |a|.

  We can now give a |Monad| instance for |Stepwise|. The bind |m >>= f| is encoded as
  active child |m| with single pending parent |f|:
  %if fullVersion
  \begin{code}
  instance Monad (Stepwise i) where     ^^^  -- Via the monad combinators, we
    return   = Return                        -- thus build a Stepwise-value, and
    fail     = Fail                          -- reduce this value via
    m >>= f  = Pending m (Bind f None)  ^^^  -- |smallStep| or |lazyEval|.

  emit i = Yield i (return ())
  \end{code}
  %else
  \begin{code}
  instance Monad (Stepwise i) where
    return   = Return
    fail     = Fail
    m >>= f  = Pending m (Bind f None)

  emit i = Yield i (return ())
  \end{code}
  %endif

  Given a coroutine, we can run it immediately to its lazy result value.
  This process describes how we transform a residual tree into a tree on which
  we can perform on-demand evaluation.
  We step over any progress reports it may yield in the process.
  When we encounter a |Pending|, we apply |lazyEval| to get a lazy result of
  the child, and use |evalPending| to provide it to its parent, which in
  turn passes it to its parent, until we reach the root:
  %if fullVersion
  \begin{code}
  lazyEval :: Stepwise i a -> a           -- Interprets computation lazily.
  lazyEval  (Yield _ m)    = lazyEval m   -- Skips progress report.
  lazyEval  (Fail s)       = error s      -- Interpreted as |undefined|.
  lazyEval  (Return v)     = v
  lazyEval  (Pending m p)  = evalPending p (lazyEval m)

  evalPending :: Parents i a b -> a -> b    -- |lazyEval| on chain of parents.
  evalPending None        a  = a            -- Reached the root.
  evalPending (Bind f r)  a  = evalPending r (lazyEval (f a))
  \end{code}
  %else
  \begin{code}
  lazyEval :: Stepwise i a -> a
  lazyEval  (Yield _ m)    = lazyEval m   -- Skips report.
  lazyEval  (Fail s)       = error s      -- Interpreted as |undefined|.
  lazyEval  (Return v)     = v
  lazyEval  (Pending m p)  = evalPending p (lazyEval m)

  evalPending :: Parents i a b -> a -> b    -- eval stack.
  evalPending None        a  = a            -- Reached root.
  evalPending (Bind f r)  a  = evalPending r (lazyEval (f a))
  \end{code}
  %endif

  Given a coroutine, we can also run it greedily until it yields the next
  progress report. Either it fails, is finished, or is paused with the
  yielded information |i| and the continuation to resume it with:
  %if fullVersion
  \begin{code}
  data Report i a where                           -- Outcome of |smallStep|.
    Failed  :: String             -> Report i a   -- Aborted with message.
    Done    :: a                  -> Report i a   -- Finished with value.
    Step    :: i -> Stepwise i a  -> Report i a   -- Paused with user report.
  \end{code}
  %else
  \begin{code}
  data Report i a where  -- Outcome of |smallStep|.
    Failed  :: String             -> Report i a   -- Aborted.
    Done    :: a                  -> Report i a   -- Finished.
    Step    :: i -> Stepwise i a  -> Report i a   -- Paused.
  \end{code}
  %endif

  The function |smallStep| performs a strict reduction. Once it encounters
  a |Yield| it can stop and return a |Step|:
  %if fullVersion
  \begin{code}
  smallStep :: Stepwise i a -> Report i a   -- Evaluate until next report.
  smallStep (Yield i m)    = Step i m       -- Pause after a yield.
  smallStep (Fail s)       = Failed s
  smallStep (Return v)     = Done v
  smallStep (Pending m p)  = reduce m p     -- Continues with |m|, and possibly |p|.
  \end{code}
  %else
  \begin{code}
  smallStep :: Stepwise i a -> Report i a   -- Next report.
  smallStep (Yield i m)    = Step i m       -- Yield step.
  smallStep (Fail s)       = Failed s
  smallStep (Return v)     = Done v
  smallStep (Pending m p)  = reduce m p     -- Reduce stack.
  \end{code}
  %endif
  For a |Pending|, its result depends on the reduction of the active child. If it
  finishes without yielding a progress report, we pass the result to its
  parent and continue reducing the parent. If the active child itself
  turns out to be a |Pending|, we push its stack on the stack we already
  have, and continue reduction:
  %if fullVersion
  \begin{code}
  reduce :: Stepwise i a -> Parents i a b -> Report i b
  reduce (Yield i m)  r         = Step i (Pending m r)   -- Keeps residual parents.
  reduce (Fail s)    _          = Failed s
  reduce m  None                = smallStep m            -- No parents left.
  reduce (Return v) (Bind f r)  = reduce (f v) r         -- Proceeds with parent |f|.
  reduce (Pending m r') r       = reduce m $ push r' r   -- Concatenates stacks.

  push :: Parents i a b -> Parents i b c -> Parents i a c
  push None r         = r                    -- Appends to the bottom.
  push (Bind f r') r  = Bind f (push r' r)   -- Walk towards the bottom.
  \end{code}
  %else
  \begin{code}
  reduce :: Stepwise i a -> Parents i a b -> Report i b
  reduce (Yield i m)  r         = Step i (Pending m r)
  reduce (Fail s)    _          = Failed s
  reduce m  None                = smallStep m
  reduce (Return v) (Bind f r)  = reduce (f v) r
  reduce (Pending m r') r       = reduce m $ push r' r

  push :: Parents i a b -> Parents i b c -> Parents i a c
  push None r         = r
  push (Bind f r') r  = Bind f (push r' r)
  \end{code}
  %endif

  The merging of the parent stacks is important. The stack represents
  a whole subtree, with the active child on top, and the root of the
  subtree on the bottom. When we want to reduce this subtree one step,
  we can thus immediately reduce the active child without having to
  traverse through the parents.

  We provide the API that we discussed in this section as a Haskell library.
  Its implementation satisfies the monad laws (\stepRefAppendix{step.sect.proofs}{H})
  for both |lazyEval| and |smallStep| evaluation.
  Furthermore, if it holds that |(sup smallStep (*)) m = Done v| then
  also |lazyEval| yields this value |lazyEval m = v|.
  The converse is not necessarily true: |lazyEval (undefined >> return ()) = ()|,
  whereas |(sup smallStep (*)) (undefined >>= return ()) = undefined|.
  However, when the AG is ordered, and each rule well-founded, then also the converse holds.

\section{Imperative Implementation}
\label{step.sect.imperative}

  The reference implementation that we presented in the previous section relies on several Haskell features, such as
  lazy evaluation, monads and GADTs. Despite that, the approach is portable to imperative languages and
  AG systems for these languages.
  As a proof of concept, we ported the example of Section~\ref{step.sect.stepwiseexample} to Java (\stepRefAppendix{step.sect.java}{J}),
  and implemented a small Stepwise support library\footnote{
  Stepwise Java example: \url{https://svn.science.uu.nl/repos/project.ruler.papers/archive/jstepwise.jar}}.

  We encode demand-driven AGs as conventional for object oriented languages. AST nodes are represented by
  objects, using subclasses for productions. Attributes are encoded as fields on such nodes using getters and setters together
  with lazy initialization. We map each rule to a |Runnable| object. A rule is associated with one or more
  attributes. When the value of an attribute is not yet defined, the associated rule is executed, and
  the rule defines these attributes using side effect. A rule may refer to the values of other attributes, thus
  driving on-demand evaluation. Executing a rule twice has no effect.

  Additionally, a node exposes a visit method that encodes the coroutine for strict evaluation. The visit method
  may be invoked multiple times, and returns either with progress information, or a pointer to a child node
  to evaluate first, or indicates that evaluation is done for the subtree. With each execution, the visit method
  executes some of the rules. Nodes can thus be evaluated strictly via the visit method, and on-demand by
  accessing the attributes directly.

  As discussed in Section~\ref{step.sect.compilestepwise}, children of a production come in two fashions:
  conventional children and merged children. Conventional children are conventional AST nodes, which expose
  inherited attributes. Merged children do not expose inherited attributes. To start evaluation and access
  the synthesized attributes from both types of children, a stepwise computation can be requested from a
  child. A stepwise computation is a coroutine-object that supports the |lazyEval| and |smallStep| operations of
  Section~\ref{step.sect.stepwisemonad}. It represents the evaluation of the subtree rooted by the child.

  The stepwise computation obtained from a conventional child |x| represents the stack of nodes under strict
  evaluation. A node on the stack has been partially evaluated strictly and is waiting for strict evaluation
  of nodes above it to complete before proceeding
  with strict evaluation. The child |x| is on the bottom of the stack. The active child is the top of the
  stack. For the |lazyEval| operation, the stepwise computation obtained from a conventional child directly
  returns |x|. This corresponds to the |evalPending| operation, except that in contrast to the functional
  implementation, we do not have to thread the lazy outcomes around because these are represented by
  pointers and side effect in the imperative implementation.
  For |smallStep|, strict evaluation proceeds with the active node. If it yields progress info, then
  these are returned as the report for |smallStep|, and the evaluation is effectively paused again.
  If evaluation of the active node is
  finished, the node is popped from the stack and evaluation continues with its parent. If an active
  node requests evaluation of a child first, these are pushed on the stack.

  The stepwise computation obtained from a merged child is an object with a visit method that
  has access to the stepwise computation of the children that are being merged. With each visit,
  it must return a progress report. In particular, it can report that the computation is to be
  replaced with a computation of the children, thereby resolving the choice. Essentially, such
  a computation is an imperative version of the monadic |choose_or|.

  The imperative implementation closely resembles the functional implementation, although more
  verbosely. A language with native support for coroutines would simplify the implementation
  slightly.

%if False
\begin{small}
\begin{small}\begin{verbatim}
protected void visit() {
  Report r1 = _left.nextStep();
  Report r2 = _right.nextStep();

  if (r1 instanceof ReportDone)
    commit(_left.lazyEval().syns().value().get() ? _left : _right);
  else if (r2 instanceof ReportDone)
    commit(_right.lazyEval().syns().value().get() ? _right : _left);
  else if (r1 instanceof ReportFail)
    commit(_right);
  else if (r2 instanceof ReportFail)
    commit(_left);
  else if (r1 instanceof ReportInfo && r2 instanceof ReportInfo)
    emit(new InfoWork());
}
\end{verbatim}\end{small}
\end{small}
 %endif

\section{Remarks}
\label{step.sect.evaluation}

  \subsection{Extensions}

  We used SAGs to rapidly prototype a type-directed transformation.
  %if False
  \footnote{
    Leather, et al. \emph{Towards Type-Safe, Type-Changing Program Improvement}, submitted to LDTA '11.
  }.
  %endif
  It required
  a small extension of the presented ideas: \emph{semantic lookahead} (\stepRefAppendix{step.sect.lookahead}{C}).
  Often, a choice for a subtree has
  consequences at another location in tree. We thus (may) need to
  investigate potential alternatives beyond the evaluation of the subtree. To this end, we added a mechanism
  to obtain the continuation, and investigate the steps coming out of the continuation.

  To combine progress reports of different types, and allow them to depend on the result type of the
  computation itself, we implemented \emph{transcoding} (\stepRefAppendix{step.sect.watchers}{D}). It can also
  be used to replace multiple reports by a single report (compression) to trade interleaving granularity
  with fewer reports to examine.

  Finally, we offer \emph{explicit sharing} via references. We use this mechanism to deal with the
  |MonadFix| instance required for recursive do-notation, and also to offer \emph{memoization}
  (to turn the AG under user-defineable conditions into a graph).

  \subsection{Benchmarks}

  We benchmarked our approach (\stepRefAppendix{step.sect.benchmarks}{I}) on a standard MacBook~2.1
  with a 2 GHz dual-core processor, 2 GB of main memory, and GHC~6.12.1.
  We compared the execution time of code generated the conventional way by UUAG against code that uses
  stepwise evaluation, and meassured the runtime overhead.
  The throughput of stepwise binds is about thirty times slower than the bind of the identity monad.
  The overhead is constant for |nextStep|, and is a bit more erratic for |lazyEval| (but comparable).
  On the other hand, the overhead is negligible in practice. We compared the compilation speeds of UHC, which
  makes heavy use of AGs (for both large and small tasks). The compilation time only marginally increased,
  and stays under random noise induced by garbage collection.

\section{Related Work}
\label{step.sect.relatedwork}

  \Thiswork{} is heavily inspired by uu-parsinglib~\citep{DBLP:conf/lernet/Swierstra08, DBLP:conf/icfp/HughesS03}.
  The parsing library supports both context-free and monadic grammars,
  and offers online results as well as error correction. It offers a data-type
  similar to our |Stepwise|. The essential difference is that uuparsing-lib's bind cannot
  yield a result until the LHS of the bind is fully step-wise evaluated\footnote{
    Formally: we can write a conventional AG as a lazy applicative functor. Monads are more expressive,
    hence we require the following equality to hold, which is not the case for uu-parsinglib:
    \begin{code}
    p <*> q ^^^ == ^^^ p >>= \f -> q >>= a -> return (f a)
    \end{code}
  }.
  Instead, required for the hybrid evaluation model, our implementation can yield a result when the
  RHS can do so (when using |lazyEval|).
  Also, uuparsing-lib's implementation manually manages stacks with semantic values computed so far
  (the outcome of the history parser), or semantic values still to come (the outcome of the future parser).
  Instead, in our implementation, these values are implicitly represented as local variables in closures.

  Our work is related to various disambiguation strategies.
  Some approaches allow ambiguous ASTs and impose syntactic restrictions to resolve these, e.g. by
  conditionally rejecting certain productions based on the AST structure, or prioritizing some
  productions over others~\citep{DBLP:conf/cc/BrandSVV02}.
  Other approaches generate a parse forest and filter later, potentially using semantic information~\citep{DBLP:conf/gpce/BravenboerVVV05}.
  In contrast, our approach does not require all alternatives to be available a priori, and works for also
  in case of infinite trees and in combination with nonterminal attributes.

  AGs are traditionally considered to select productions deterministically based on the syntax of a language.
  Conditional Attribute Grammars~\citep{DBLP:journals/toplas/Boyland96}, as supported by the
  FNC-2 system~\citep{DBLP:conf/saga/JourdanP91} and our experimental Ruler system~\citep{Middelkoop10gpce},
  allow multiple definitions for a productions guarded by conditions. These conditions need to be evaluated
  first, thus offer limited control over the exploration of alternatives.

  There are AG evaluators that generate Prolog code~\citep{Walsteijn86, DBLP:conf/plilp/Paakki91}. Such an
  approach that depends on a logic language is unacceptable for us, as it does not mix well with our
  existing Haskell code. In contrast, our approach can be implemented in an arbitrary general purpose
  host language. The lazy evaluation our Haskell implementation relies on, actually just represents
  on-demand evaluation that other AG approaches provide. Moreover, as sketched by Section~\ref{step.sect.hybrid},
  our approach is compatible with circular and remote AGs~\citep{DBLP:journals/scp/MagnussonH07}.

  There are several different techniques to deal with declarative aspects in the specifications of
  programming languages.
  We classify declarative sources in increasing complexity:
  \begin{itemize}
  \item \emph{Deterministic}.
        Both the production selection and the value of attributes are purely
        functional.
        A problem in this class trivially fits an AG.
        For example, auto-boxing and implicit coercions fall in this category.
  \item \emph{$k$ pass}.
        A priori unknown values (type variables) and derivations (deferred judgments)
        are replaced with place holders.
        Nodes in the AST are traversed at most $k$ times. A decision about a
        place holder must be taken during one of these traversals.
        Typically, information about place holders is maintained in a
        substitution (concrete assignments to variables) or constraints
        (symbolic representation of a deferred judgment).
        For example, Hindley-Damas-Milner type inference (algorithm W)
        has $k = 1$. After one
        traversal, a type variable either got assigned a concrete type, or it
        is fixed by a skolem constant (and generalized later).
        Haskell 98 overloading resolving is an example of $k = 2$. In the first pass,
        type equalities are resolved and class-constraints are collected. In the
        second pass, the class-constraints are resolved.
        Such problems can be dealt with in AGs using additional attributes for
        substitutions and constraints.
  \item \emph{$\omega{}$ pass}.
        Declarative aspects that are resolved by fixpoint iteration falls in this class.
        This includes the class of type and effect systems.
        Also, resolution of Haskell's overloading in combination with functional
        dependencies falls in this class.
        AGs with circular references can be used to encode
        such problems~\citep{DBLP:journals/scp/MagnussonH07,DBLP:journals/toplas/Jones90},
        or AGs that can express iteration~\citep{Middelkoop10gpce}.
  \item \emph{Exploration of alternatives}.
        In the previous classes, declarative aspects are resolved by sufficiently
        constraining a value, where the constraints are a pure function of the
        (attributed) AST. In the exploration class, declarative aspects are resolved
        by exploring assignments to place holders.
        This requires instantiations for place holders to be enumerable.
        Haskell's overloading combined with overlapping instances,
        and the construction of equality coercions for GADTs fall in this class.
        \Thiswork{} positions itself in this class.
  \item \emph{Undecidable}.
        Inference for some declarative aspects is undecidable.
        For example, a polymorphic type can in general not be inferred for
        an argument of a recursive function.
  \end{itemize}

  Haskell offers several library approaches for backtracking, via folklore |Maybe| and list monads to
  more advanced monads~\citep{Hinze:2000:DBM:351240.351258, DBLP:conf/icfp/KiselyovSFS05, DBLP:conf/icfp/FischerKS09} that
  deal with nondeterminism and lazyness. Alternatives are only explored for a value when this value is required.
  However, the order of appearance of alternatives affect memory retainment and how online the results are.
  See the discussion in
  %if thesis
  \citet[Section 4.1]{DBLP:conf/lernet/Swierstra08}.
  %else
  Section 4.1 of \citet{DBLP:conf/lernet/Swierstra08}.
  %endif

  Coroutines were considered for many compilation tasks~\citep{DBLP:books/sp/Marlin80}. Nowadays,
  they are mostly used to implement producer-consumer patterns.
  \citet{DBLP:journals/acta/Kastens80} showed
  how to compile multi-visit AGs to coroutines. In \thiswork{} we apply coroutines in a different fashion.
  We use coroutines to expose explicitly indicated parts of the internal state of an AG evaluation, in order
  to describe exploration strategies.

  There are several approaches for monadic coroutines in Haskell. These implementations have their roots
  in the folklore CPS monad to pause, abort and merge computations.
  Kiselyov's Iteratees~\citep{iterattees} come close to our implementation. However, technical differences
  aside, there is a conceptual difference. Typical coroutine implementations allow invocations to take
  additional arguments. Since the result may depend on the value of such a parameter, a lazy result cannot
  be given until the last invocation. Therefore, the hybrid evaluation model that we presented cannot be
  implemented with such coroutines.

\section{Conclusion}

  We presented SAGs, a powerful language extension to Attribute Grammars to cope with
  declarative aspects in the semantics of programming languages. We stated our requirements in
  Section~\ref{step.sect.introduction}, and showed how our approach meets these demands by
  example in Section~\ref{step.sect.stepwiseexample}, and sketched the implementation in
  Section~\ref{step.sect.compilestepwise} and Section~\ref{step.sect.stepwisemonad}.
  We implemented SAGs in the UUAG system.

  The idea central to our approach is to encode alternatives as a choice between children
  of a production, and resolve this based on stepwise inspection of intermediate results
  of these children in the form of progress reports.

  As future work, we intend to replace the overloading
  mechanism as currently implemented in UHC using the new AG features as presented in this
  paper.
  Also, a remaining question is if the
  stepwise monad is powerful enough to simplify the implementation of @uu-parsinglib@.


%if fullVersion

%if thesis
\begin{subappendices}
%else
\appendix
%endif

In \stepRefAppendix{step.sect.progressordered}{A}, \stepRefAppendix{step.sect.additionaltreestructure}{E}, and
\stepRefAppendix{step.sect.translationscheme}{B}
we go into more detail of the AG part of the story.
\stepRefAppendix{step.sect.lookahead}{C} and \stepRefAppendix{step.sect.watchers}{D} show improvements of the
stepwise monad.
\stepRefAppendix{step.sect.inferencerulesandags}{F} gives an example that makes use of the additional
improvements.
\stepRefAppendix{step.sect.mutuallyexpressible}{G} shows how various declarative aspects can
be expressed in terms of each other. Proofs for some of our claims related to the
monad laws can be found in \stepRefAppendix{step.sect.proofs}{H}, and benchmark results in
\stepRefAppendix{step.sect.benchmarks}{I}.
Finally, we show a translation to Java in \stepRefAppendix{step.sect.java}{J}.
The thesis contains appendices A-D. The extended edition contains the remaining appendices.

\section{Progress Reports and their Emission}
\label{step.sect.progressordered}

  As we mentioned in Section~\ref{step.sect.stepwiseexample}, we
  annotate productions with the built-in nonterminal |Progress| to yield progress reports.
  Actually, this built-in nonterminal is only a notational convenience: it is definable
  in terms of the merge-syntax that we presented before. In this section, we show the implementation.

  The data type |Progress|, as mentioned in \thiswork{}, can be implemented as follows:
  \begin{code}
  data Progress | Progress
  attr Progress inh info :: a
  sem Progress | Progress
    merge as res : Progress = emit lhs.info >> return ()
  \end{code}
  It has a single inherited attribute, and a single production. This production has a
  single child |res|, merged out of an empty set of children. To define
  the semantics of this child, we thus do not get any semantics of children as parameter.
  Since |Progress| does not have any synthesized attributes, defining a semantics for
  |res| is straightforward: we return the empty tuple |()|.

  It is not immediately clear why this implementation would work: nonterminals without
  any synthesized attributes never need to be visited. Referential transparency tells us
  that we may replace a child with a product of its synthesized attributes, and
  attribute references with the corresponding values. A child without synthesized attributes
  may never be evaluated, and the progress report never yielded.

  The essential realization here is that there is that a hidden attribute plays a role:
  the progress reports themselves are a purely functional attribute.
  Hence, during strict evaluation via |smallStep|, we still visit children without any explicitly declared
  synthesized attributes in order to get the progress reports. In contrast, during on-demand
  evaluation via |lazyEval|, we ignore progress reports, hence do not evaluate children without synthesized
  attributes.

\section{Translation Scheme}
\label{step.sect.translationscheme}

  In this section, we formalize SAGs.
  We first define a small core language |sagcore|, consisting of Haskell extended with
  embedded AG blocks, obtained by desugering AG descriptions.
  The following grammar lists the syntax of these embedded AG blocks:
  \begin{code}
    i  ::=  attr I inh (many a1) syn (many a2)              -- attribute delcs
    a  ::=  x :: hty                                        -- attribute decl, with Haskell type |hty|

    s  ::=  sem I (many r)                                  -- semantics expr, defines production for |I|

    r  ::=  p = e                                           -- binds to |p| to pure |e|
       |    child c :: I = e                                -- declares child
       |    merge (many c) as c :: I = e                    -- declares merged child

    o  ::=  x.x                                             -- expression, attribute occurrence

    x, I, p, e  -- identifiers, patterns, expressions respectively
  \end{code}
  Attribute declarations |i| declares all attributes (name and type) of a nonterminal |I|.
  A semantics block defines a single production for a nonterminal |I|, and gives its rules |many r|.
  Productions in |sagcore| are nameless: we use a Haskell declaration to give it a name. Furthermore,
  we declare its children through rules.
  Rules either define attributes, or declare children: we introduce all children as higher-order
  attributes (\stepRefAppendix{step.sect.additionaltreestructure}{E}). The expressions |e| are Haskell expressions,
  with possible attribute occurrences |o|. Patterns |p| are Haskell patterns, also with possible
  attribute occurrences |o|.

  For example, we show how the production |Or| as mentioned in Section~\ref{step.sect.stepwiseexample}
  is encoded in |sagcore|. In UUAG-notation, we declare a production |Or|, declare the attributes
  of the corresponding nonterminal, and give the rules for the attributes:
  \begin{code}
  data Pred | Or  left,right :: Pred
  attr Pred ^^^ inh e :: Env ^^^ syn b :: Bool
  sem Pred | Or
    left.e   = lhs.e
    right.e  = lhs.e
    merge left right as res :: Pred = choose_or
    lhs.b    = res.b
  \end{code}
  In |sagcore|, we declare the attributes of the nonterminal, then use a Haskell function to
  represent the production: it takes the semantics of the children as parameter, then uses
  an embedded semantics block to define the semantics for the production itself:
  \begin{code}
  attr Pred inh e :: Env syn b :: Bool
  sem_Pred_And l r =
    sem Pred
      child left  :: Pred  = l
      child right :: Pred  = r
      merge left right as res :: Pred = choose_or

      left.e   = lhs.e
      right.e  = lhs.e
      lhs.b    = res.b
  \end{code}
  We thus keep the rules, yet express the grammar directly as Haskell functions:
  \begin{code}
  data N | C  (many (c :: I))   ^^^ ~>  ^^^  sem_N_C (many c) = sem N
  sem N | C   (many r)          ^^^     ^^^  ^^^  (many (child c :: I = c)); (many r)
  \end{code}

  With such a function that represents a production, we can construct attributed trees.
  Each node in the tree has its own set of inherited and synthesized attributes:
  the associated nonterminal specifies their name and their types.
  The rules of the production define the attributes of the production, and declare what
  attributes the children have. The production must define its synthesized attributes,
  and the inherited attributes of its children exactly once (with a correct type).
  Attribute references in the expressions may refer to the inherited attributes of the
  production, or the synthesized attributes of the children.
  There is one exception: the inherited attributes of the merged child (e.g. |res|) may not
  be defined, and the synthesized attributes of the merging children (e.g. |left| and |right|)
  may not be referred to.

  We define a translation to Haskell (denotational semantics) that gives both a static and operational
  semantics to SAGs. If the generated Haskell program is type correct then so is the |sagcore| program.
  The execution of the generated Haskell functions shows how the rules are used to construct the
  attributed trees.

  As mentioned in Section~\ref{step.sect.compilestepwise}, we translate a semantics-block to
  an execution plan of the production.
  We use a naming conventional to translate AG names to
  Haskell names. Attributes are referred to by an identifier |cXa|. In this notation, |c| and |a| are the name of the child
  and the name of the attribute respectively. |X| is a subscript |I| for an inherited attribute, and |S| for a synthesized attribute.
  Merging children are prefixed with an underscore (we assume that names of children do not start with an underscore).

  Each rule corresponds to an instruction in the
  execution plan. In the end, we return a tuple with values for the
  synthesized attributes:
    %{
    %format rec = "\mathbf{rec}"
    %format (inp c a) = "{" c "}_{" I "}{" a "}"
    %format (outp c a) = "{" c "}_{" S "}{" a "}"
    %format ins1
    %format insn = ins "_" n
    %format c1
    %format ck = c "_" k
    %format out1
    %format outm = out "_" m
    %format _c1
    %format _ck = _c "_" k
    %format conventional = "\mbox{conventional}"
    %format merged = "\mbox{merged}"

    \begin{code}
    (semb (sem I (many r)))                           ~>  \(inp lhs ins1) ... (inp lhs insn) -> do rec
                                                            { semb(many r); return ((outp lhs out1), ..., (outp lhs outm)) }
    (semb (child c :: I = e)), ^^^  conventional      ~>  (outp c out1, ..., outp c outm) <- e (inp c ins1) ... (inp c insn)
    (semb (child c :: I = e)),      merged            ~>  let _c = e (inp c ins1) ... (inp c insn)

    (semb (p = e))                                    ~>  let (semb p) = (semb e)
    (semb (merge cs as c : N = e))                    ~>  (outp c out1, ..., outp c outm) <- (semb e) _c1 ... _ck
    \end{code}
    %}
    The indices |m| and |n| range over the inherited and synthesized attributes of nonterminal |I|.

    In previous work~\citep{Middelkoop10gpce}, we described different (more sophisticated) translations
    of (Ordered) Attribute Grammars to execution plans (in Haskell). The merge-syntax is fully compatible with
    those translations.

\section{Semantic Lookahead}
\label{step.sect.lookahead}

  In this section, we show how to deal with a choice that has a potential global effect
  on attributes of the tree. For example, suppose that we deal with a type inferencer
  that needs to choose between |int| and |double| for the type of a numerical constant.
  This choice may have a global effect: a wrong choice potentially causes a typing error
  in the remainder of the program to type. To explore such a choice, we want to look
  at the steps of a child \emph{and} the steps that the remaining computation gives if
  we would choose that child.
  In terms of the monad: if |k| is the continuation after the choice, i.e.
  |choose l r >>= k|, then we want to lift the choice to |choose (l >>= k) (r >>= k)|.

  We provide a monadic operation |Ahead| (explained below) to access the continuation |k|. This operations comes
  at a price: since the continuation is not known until runtime, so we wish our
  choose-function to work for arbitrary continuations. In particular, we refrain from
  making static assumptions on the type of the result the continuation computes.
  \begin{code}
  data Stepwise i a where
    Ahead  :: (forall b . (a -> Stepwise i b) -> Stepwise i b) -> Stepwise i a
  \end{code}
  |Ahead| takes as argument a function |f| that takes the continuation |k| as
  argument. To use |Ahead|, we provide this function |f|.
  The continuation takes the result that we are supposed to compute,
  and returns a computation of some existential type |b|. The computation
  passed to ahead thus wraps around the computation: it specifies an input
  for the continuation, and allows us to modify the result of the continuation.
  Thus, this lifted the choice to toplevel, as mentioned in the previous paragraph.
  Also, modifying the result of the continuation instead of building an input
  for the continuation based on trying out the continuation is what makes this
  approach different from Continuation Passing Style's \emph{call/cc}.

  As an example, a computation |m| is equivalent to |Ahead (\k -> m >>= k)|.
  As another example, a global choice can be made using:
  \begin{code}
  Ahead (\k -> choose (l >>= k) (r >>= k))
  \end{code}
  Although we cannot make an assumption about the type of the result of |k|,
  we can make an assumption on the type of progress reports, and thus use
  the contents of the progress reports to direct the exploration between
  the two choices (depending on the search strategy).

  When we encounter an |Ahead f| in |lazyEval|, the question is what continuation
  we pass in. The function |f| is required to get the computation that
  resembles the remaining continuation. However, since we are in |lazyEval|,
  the continuation cannot return any progress reports, and cannot observably
  fail. Thus we simply pass in |Return|, which succeeds immediately, and gives
  us the result that |f| passes to its continuation:
  \begin{code}
  lazyEval (Ahead f) = f Return
  \end{code}

  When we encounter an |Ahead f| in |smallStep| without parents on the
  parent stack, we pause the computation.
  The evaluation can only continue if the caller specifies how the computation
  proceeds (possibly by calling |Ahead| itself):
  \begin{code}
  data Report i a where
    Lookahead :: (forall b . (a -> Stepwise i b) -> Stepwise i b) -> Report i a

  smallStep (Ahead f) = Lookahead f
  \end{code}
  If there are parents on the parent stack, however, we continue to reduce
  |f|. The continuation to pass to |f| are the remaining parents on the
  parent stack:
  \begin{code}
  reduce (Ahead f) (Bind g r) = smallStep $ f $ \a -> Pending (g a) r
  \end{code}

  The use of |Ahead| has an interesting interplay with the hybrid evaluation
  model. On-demand evaluation skips progress reports, and passes a |Return|
  as continuation. If a parent of a child that uses lookahead is evaluated
  on-demand, then the lookahead of the child does not observe the skipped
  progress reports. So, lookahead does not see beyond on-demand evaluated
  AST nodes. So, if a progress report contains information essential to a
  choice using lookahead, we need to take sufficient |smallStep|s at a
  common ancestor node such that the lookahead observed the report.
  For example, we can emit a progress report e.g. |DoneGreedy| when a
  choice using lookahead inspected the progress reports it was interested in,
  take |smallStep|s at the root of the tree (or a common ancestor), until we
  encounter |DoneGreedy|, then switch to |lazyEval|.

  Also, |Ahead| has an interplay with multi-visit AGs (deriveable from
  Ordered Attribute Grammars). Without |Ahead|, stepwise evaluation yields
  a |Done| for a child when the \emph{first} visit is finished. When using
  |Ahead|, however, yields a |Done| when the continuation finished with
  its \emph{last} visit. Again, the progress report mechanism can be used
  to limit the exploration to certain visits, or yield the outcome of a visit
  as intermediate result.

  It is advisable to ensure that all choices can be made based on results
  of the first visit. If it requires a progress report that is emitted in
  a second visit, this requires the first visit to finish completely,
  which is likely already a full exploration of the tree (for that choice).
  It is possible to make the approach more flexible and
  offer just on-demand evaluation for first visits, and (hybrid) stepwise
  evaluation for later visits. That would allow exploration in a later
  visit based on (lazily) computed results in earlier visits.
  This requires visits to be made explicit
  in the AG specification (Chapter~\tref{chapt.side-effect} and Chapter~\tref{chapt.iterative}).

\section{Watchers}
\label{step.sect.watchers}

  With the approach described so far, the type |i| of a progress is fixed:
  given an |m >>= k|, both |m| and the computation returned by |k| must have
  the same |i| type. This limitation affects the compositionality of
  stepwise computations. Also, sometimes we wish to return progress reports
  of the same type as the result of the computation\footnote{
  See Example 7 of @Examples.hs@ in the cabal package at \url{https://svn.science.uu.nl/repos/project.ruler.papers/archive/stepwise-1.0.2.tar.gz}
  }. On the other hand, since the type |i| is fixed, we know that a continuation
  has this type, thus when using |Ahead| (\stepRefAppendix{step.sect.lookahead}{C}) we can
  inspect the progress reports of the continuation. This we cannot do without
  knowing |i|.

  To get the best of both ways, we parametrize the type |i| with a type |w| that
  functions as an index, the \emph{watcher}. A computation has the type
  |Stepwise i w a|, which returns progress reports with values of the type |i w|.
  When |w| is an existential type, e.g. when using |Ahead|,
  we can still scrutinize on all values not dependent on |w|. If |w| is a
  concrete type, we can scrutinize the depending values as well.

  To embed a computation with a different watcher type, we provide a transcoding
  operation:
  \begin{code}
  data Stepwise i w a where
    Transcode  :: (i v -> [i w]) -> Stepwise i v a -> Stepwise i w a
  \end{code}
  It takes a progress report of type |i v| and converts it to zero or more
  progress reports of type |i w|. For example, |smallStep (Transcode (const []) m)|
  does not return any progress reports.
  In the actual implementation, we maintain composed transcoders on the parent stack,
  such that we can immediately apply them without having to traverse the stack.

  In practice, we also allow the transcoding
  function to store a local state, such that it can remember a number of progress
  reports and combine them into a single progress report (compression). If paths
  in the tree are long, and many nodes are inspecting and passing on progress
  reports, then each node gets many reports to process (especially near the root).
  With the transcoding mechanism, we can trade evaluation granularity for the
  number of reports.

%if showExtra
\section{Additional Tree Structure}
\label{step.sect.additionaltreestructure}

  In the example in Section~\ref{step.sect.stepwiseexample}, the children to choose from
  are part of the original AST, and reduced my the merge-rule to a single virtual child.
  This is not always the case, as the example in this section shows.

  Suppose that for a simply-typed lambda calculus, we want to specify two alternative
  productions for an application-expression: one alternative represents strict
  application, the other lazy application. As an optimization, we may choose a strict application
  when all functions that can occur as left-hand side are strict in their respective argument.
  To encode this choice between productions, we need to encode these choices as additional
  children that are not present in the original AST. One option is to
  transform the tree on-demand, prior to attribute evaluation. As alternative and more elegant
  approach, we can use a Higher-order Attribute Grammar and use nonterminal attributes
  (or: \emph{higher-order attributes}). In this section, we demonstrate how.

  We sketch a design pattern for multiple alternative productions by example, using several
  UUAG features. The goal is to be able to define multiple semantics for a production, without
  affecting the original context-free grammar. We start with the context-free grammar of the example:
  \begin{code}
  data Expr
    | ...                                   -- Some lambda calculus.
    | App  ^^^ f :: Expr  ^^^ a :: Expr     -- Conventional application.
  \end{code}
  We concern ourselves only with the |App| production, which has two children, |f| and |a| respectively.

  We introduce additional nonterminals to represent the choices: a nonterminal |AppDispatch| with
  production |Dispatch| that has a child for each alternative production. In particular, the
  semantics of |Dispatch| deals with the choice---\emph{dispatch}---between the alternatives.
  Furthermore, we introduce a nonterminal |AppAlt|, containing the alternative productions. Their
  structure is a clone of the |App| production. Being separate productions, we can give each its own
  distict semantics:
  \begin{code}
  data AppDispatch | Dispatch      -- Introduces two children of nonterm |AppAlt|.
    strict :: AppAlt ^^^ lazy :: AppAlt

  data AppAlt                      -- Productions for the above children.
    | Strict  ^^^ f :: Expr  ^^^ a :: Expr  -- Intended for child |strict|.
    | Lazy    ^^^ f :: Expr  ^^^ a :: Expr  -- Intended for child |lazy|.

  sem AppDispatch | Dispatch       -- Merges the children.
    merge strict lazy as res :: AppAlt = choose_app

  sem AppAlt  ...                  -- give some semantics to |Strict| and |Lazy|.
  \end{code}

  With these additional nonterminals and productions, we encode the semantics of |App|, such
  that it literally encodes the choices as additional tree nodes. Below, we sketch the
  original AST structure for |App|, and the intended structure. The nodes are displayed as
  a circle. The nonterminal corresponding to a node is displayed above it, the production to the
  left, and the name to the right. The merge is displayed as a square. The dotted arrows represent
  a transfer of the semantics of a node to a different location in the tree.

  \begin{center}
  \begin{tikzpicture}
    [ nd/.style={circle, minimum size=3mm, thick, draw=black,font=\footnotesize}
    , ndmerge/.style={rectangle, minimum size=3mm, thick, draw=black,font=\footnotesize}
    , nddash/.style={nd,dashed}
    , ndlab/.style={font=\footnotesize}
    , arr/.style={->,dashed, very thick}
    ]

    \node[nd](app){}
      [sibling distance=24mm, level distance=8mm]
      child {
        node[nd](origf){}
      }
      child {
        node[nd](origa){}
      }
      child {
        node[nd](d){}
          child {
            node[ndmerge](res){}
              [sibling distance=40mm, level distance=8mm]
              child {
                node[nd](strict){}
                  [sibling distance=20mm, level distance=8mm]
                  child {
                    node[nd](f1){}
                  }
                  child {
                    node[nd](a1){}
                  }
              }
              child {
                node[nd](lazy){}
                  [sibling distance=20mm, level distance=8mm]
                  child {
                    node[nd](f2){}
                  }
                  child {
                    node[nd](a2){}
                  }
              }
          }
      };

  \node[nd, above of=app, yshift=10mm](app2){}
    [sibling distance=20mm, level distance=8mm]
    child {
      node[nd](f){}
    }
    child {
      node[nd](a){}
    };
  \node[ndlab, above of=app2, yshift=-7mm]{|Expr|};
  \node[ndlab, above of=f, yshift=-7mm]{|Expr|};
  \node[ndlab, above of=a, yshift=-7mm]{|Expr|};
  \node[ndlab, left of=app2, xshift=4mm]{|App|};
  \node[ndlab, right of=f, xshift=-6mm]{|f|};
  \node[ndlab, right of=a, xshift=-6mm]{|a|};

  %% Dotted arrow
  \draw[->,dotted](origa) to (a1);
  \draw[->,dotted](origa) to (a2);
  \draw[->,dotted](origf) to (f1);
  \draw[->,dotted](origf) to (f2);

  %% Nonterminal annotations (top)
  \node[ndlab, above of=app, yshift=-7mm]{|Expr|};
  \node[ndlab, above of=origf, yshift=-7mm]{|Remap|};
  \node[ndlab, above of=origa, yshift=-7mm]{|Remap|};
  \node[ndlab, above of=d, yshift=-7mm]{|AppDispatch|};
  \node[ndlab, above of=res, yshift=-7mm]{|AppAlt|};
  \node[ndlab, above of=strict, yshift=-7mm]{|AppAlt|};
  \node[ndlab, above of=lazy, yshift=-7mm]{|AppAlt|};
  \node[ndlab, above of=f1, yshift=-7mm]{|Expr|};
  \node[ndlab, above of=f2, yshift=-7mm]{|Expr|};
  \node[ndlab, above of=a1, yshift=-7mm]{|Expr|};
  \node[ndlab, above of=a2, yshift=-7mm]{|Expr|};

  %% Child names (right)
  \node[ndlab, right of=origf, xshift=-6mm]{|f|};
  \node[ndlab, right of=origa, xshift=-6mm]{|a|};
  \node[ndlab, right of=d, xshift=-6mm]{|d|};
  \node[ndlab, right of=res, xshift=-2mm]{merge |res|};
  \node[ndlab, right of=strict, xshift=-4mm]{|strict|};
  \node[ndlab, right of=lazy, xshift=-4mm]{|lazy|};
  \node[ndlab, right of=f1, xshift=-6mm]{|f|};
  \node[ndlab, right of=a1, xshift=-6mm]{|a|};
  \node[ndlab, right of=f2, xshift=-6mm]{|f|};
  \node[ndlab, right of=a2, xshift=-6mm]{|a|};

  %% Productions (left)
  \node[ndlab, left of=app, xshift=4mm]{|App|};
  \node[ndlab, left of=origf, xshift=3mm]{|Remap|};
  \node[ndlab, left of=origa, xshift=3mm]{|Remap|};
  \node[ndlab, left of=d, xshift=2mm]{|Dispatch|};
  \node[ndlab, left of=strict, xshift=4mm]{|Strict|};
  \node[ndlab, left of=lazy, xshift=4mm]{|Lazy|};

  \node[right of=app2, xshift=22mm](anchor1){};
  \node[right of=d, xshift=2mm](anchor2){};
  \node[font=\footnotesize, right of=app, xshift=55mm, yshift=8mm]{\begin{minipage}{12em}\noindent{}We encode the semantics of |App|, such that it resembles the structure below.\end{minipage}};
  \draw[->,very thick] (anchor1) to [out=-15,in=15] (anchor2);
  \end{tikzpicture}
  \end{center}

  \noindent{}To establish the the intended structure, we need to accomplish two tasks.
  Since we deal with the actual semantics of |App| in the productions |Strict| and |Lazy|, it
  is actually inconvenient to have |f| and |a| as children of |App|: we wish to take these
  children away, and attach them instead below the |strict| and |lazy| nodes.
  Furthermore, we wish to construct a child |d| that dispatches to |strict| or |lazy|.

  For the first task, we \emph{replace} the semantics of |f| and |a| (nonterminal |Expr|)
  with a nonterminal |Remap I_Expr| that provides the original semantics as a synthesized
  attribute, and is described by:
  \begin{code}
  data Remap a | Remap ^^^ s :: a    -- Terminal |s| stores semantics of type |a|.
  attr Remap ^^^ syn s :: a          -- Attribute |s| provides it to parent.
  sem Remap | Remap  ^^^  lhs.s = s  -- Rule for |s|
  \end{code}
  To replace |f| and |a|'s semantics, we declare two nonterminal attributes |f| and |a|
  for |App| that clash with the names of the original children |f| and |a|. The value
  for the nonterminal attribute must be a \emph{semantics transformer}: a function that
  takes the original semantics (|I_Expr|) and provides the transformed semantics
  (|Remap I_Expr|), denoted as follows\footnote{
  UUAG's actual syntax for higher-order attributes differs slightly:
  \begin{code}
  sem Expr |App
    inst.f :: Remap {T_Expr}                -- type signature
    inst.f = \origSem -> sem_Remap origSem  -- conventional rule
  \end{code}
  }:
  \begin{code}
  sem Expr | App
    child f :: Remap  = \origSem -> sem_Remap origSem  -- Transforms |f|.
    child a :: Remap  = \origSem -> sem_Remap origSem  -- Transforms |a|.
  \end{code}
  The original semantics of |f| and |a| is passed as argument, thus we use it to store it
  as terminal in the |Remap| node. Since |f| and |a| effectively now are of nonterminal
  |Remap|, it means we can use attributes |f.s| and |a.s| to obtain the original semantics
  of |f| and |a| and use it to construct child |d|.

  For the second task, we construct |d|, again using a nonterminal attribute\footnote{
    For nonterminal attribute |d|, we do not get the original semantics of |d| as
    parameter by absence of a conventional child |d|. The syntax for a nonterminal
    attribute thus adds or replaces depending on the existence of a conventional child.
    Also, there may not be duplicate nonterminal attributes to prevent ordering issues.
  }:
  \begin{code}
  sem Expr |App
    child d :: AppDispatch  = Dispatch (Strict f.s a.s) (Lazy f.s a.s)
  \end{code}
  The right-hand side is a proper semantics for |AppDispatch|, and describes the
  tree for |d| as we visualized above.

  Via these nonterminal attributes, we managed to tweak the tree structure, such that
  it conveniently encodes the choices that need to be made.
  The additional nodes still lack attributes: they need at least the same attributes of
  |Expr|, and perhaps more if we want to.
  To prevent having to write the attributes of |Expr| twice, we use UUAG's nonterminal sets
  to define for example an inherited environment attribute:
  \begin{code}
  set AllExpr = Expr AppDispatch AppAlt  -- Defines a set of nonterminals
  attr AllExpr  ^^^  inh env :: Env      -- Declares |env| for the three nonterms.
  \end{code}
  There is no need to define rules for the attributes of |AppDispatch| and |App|: these
  are automatically provided by the copy-rule mechanism.
  Inherited and synthesized are respectively passed topdown or bottom up when no
  explicit rules are specified.

  In summary, we can easily encode a choice between productions as a choice between children.
  In particular, using UUAG's higher-order extensions and copy rules,
  this transformation has a very concise and orthogonal implementation.
%endif

%if False
\section{Applicative Functors and Monads}
\label{step.sect.functorsmonads}

\begin{code}
data Tree
  | Leaf
  | Bin  left,right : Tree

attr Tree ^^^ inh depth :: Int ^^^ syn height :: Int
sem Tree | Leaf
  lhs.height   = 1
sem Tree | Bin
  left.depth   = 1 + lhs.depth
  right.depth  = 1 + lhs.depth
\end{code}

\begin{code}
pLeaf = pure sem_Leaf
pBin l r = sem_Bin <$> l <*> r
\end{code}
%endif

%if showExtra
\section{Inference Rules and AGs}
\label{step.sect.inferencerulesandags}

  %{
  %format e1
  %format e2
  %format e3
  %format e4
  %format o1
  %format o2
  %format o3
  %format prf1
  %format prf2
  %format Gamma = "\Gamma{}"
  %format :- = "\vdash{}"

  In this section we show an example that is closer related to type systems compared to
  Section~\ref{step.sect.stepwiseexample}. Consequently, it is also more complicated.
  The example is an equality inferencer, implemented with AGs.
  Given two objects (|o1| and |o2| respectively), and a set of equality assumptions |Gamma|,
  we wish to obtain a derivation |Gamma :- o1 == o2| using the following conventional
  reflection, symmetry and transitivity inference rules:
  \begin{mathpar}
    \inferrule*[right=refl]
      {}
      {| Gamma :- o == o |}

    \inferrule*[right=sym]
      {| Gamma :- o2 == o1|}
      {| Gamma :- o1 == o2|}

    \inferrule*[right=trans]
     {| Gamma :- o1 == o2|\\\\
      | Gamma :- o2 == o3|
     }
     {| Gamma :- o1 == o3|}

    \inferrule*[right=assum]
      {| (o1,o2) `elem` Gamma|}
      {| Gamma :- o1 == o2|}
  \end{mathpar}
  Clearly, these rules are not syntax directed: a derivation with these rules depends on both
  objects. The rules are ambiguous. For example, we can apply \TirName{sym} twice and end up
  where we started. To disambiguate, we search for a derivation with minimal depth, in a fixed
  but unimportant order. Also, in rule \TirName{trans}, |o2| is not
  known a priori and needs to be guessed. To simplify the example, we assume
  that the objects are finitely orderable, such that simply try values for |o2|, instead of having
  to resort to variables and unification.

  We start with a grammar |E| for equality derivations, and attributes that represent the parameters
  of the equality relation. As output, we define an attribute |pp| that is a string representation
  of the derivation:
  \begin{code}
  data E | Any | Refl | Sym | Trans | Assum
  attr E ^^^ inh l, r :: Obj env :: Env ^^^ syn pp :: String

  type Env = Set (Obj,Obj)
  \end{code}
  The productions are all empty. We define per production nonterminals for each premise, using
  higher-order attributes. For example, the production |Any| makes a choice between any of the
  other productions:
  \begin{code}
  sem E | Any
    child refl   : E = sem_Refl
    child sym    : E = sem_Sym
    child trans  : E = sem_Trans
    child assum  : E = sem_Assum
    merge refl sym trans assum as prf : E = \e1 e2 e3 e4 ->
      memo (lhs.l,lhs.r) $ info Work >> one localChoice [e1,e2,e3,e4]
    lhs.pp = prf.pp

  one = foldl f (fail "")
  \end{code}
  A child-rule introduces a nonterminal where the semantics is defined by the RHS of the child rule
  (\stepRefAppendix{step.sect.additionaltreestructure}{E}).
  We assume that if a rule for an inherited attribute of a child is omitted, the value is taken from an
  equally named attribute of |lhs| instead. Note that |prf| does not have any inherited attributes, because
  it's a merged child.

  One of the four alternative children is selected by |one|. It takes the first child to succeed, or
  the last one to fail (left biased) and calls it |prf|. By adding a progress report in front of the
  selected choice, the number of progress reports stands for the depth of the derivation.
  Later we see a variant of |localChoice|, which can look
  ahead beyond the child to see if evaluation after we pick a certain candidate succeeds.

  Instead of a derivation tree, we build a derivation graph via memorization. It remembers the outcome
  using as key |(lhs.l,lhs.r)|. If it encounters the same key while actually defining itself, it causes
  a backtrack (prevents cycles).
  Also, lookaheads cannot see beyond |memo|.

  To implement \TirName{sym}, we search for a derivation with the inputs swapped:
  \begin{code}
  sem E | Sym
    child prf : E = sem_Any
    prf.l   = lhs.r
    prf.r   = lhs.l
    lhs.pp  = "sym" ++ prf.pp
  \end{code}

  To implement \TirName{refl}, we need to check that the left and right sides are equal. Similarly,
  for \TirName{assum}, we check that the equality occurs as assumption in the environment.
  For that,
  we introduce an additional nonterminal, |C|. It takes a Boolean value as inherited attribute. If this
  value is |True|, if defines the |pp| attribute, otherwise it fails the evaluation. We encode this
  behavior using a merge with an empty set of children:
  \begin{code}
  data C | Check
  attr C ^^^ inh guard :: Bool ^^^ syn pp :: Doc

  sem C | Check
    merge as res : C = if lhs.guard then pp "check" else fail "guard"
    lhs.pp = res.pp

  sem E | Refl
    child c : C = sem_Check
    c.guard  = lhs.l == lhs.r
    lhs.pp   = "refl" ++ c.pp

  sem E | Assum
    child c : C = sem_Check
    c.guard  = (lhs.l,lhs.r) `elem` lhs.env
    lhs.pp   = "assum" ++ c.pp
  \end{code}

  Finally, \TirName{trans}. We use a nonterminal |G| to guess the intermediate term.
  For that, we enumerate objects, and select one of them, using \emph{lookahead} via
  |globalChoice|. This lookahead is essential, because we need to pick an intermediate
  term that works with both premises of \TirName{trans}:
  \begin{code}
  data G | Guess
  attr G ^^^ syn obj :: Obj

  sem G | Guess
    merge as res : G = one globalChoice $ map return enumerate
    lhs.obj = res.obj

  sem E | Trans
    child g     : G  = sem_Guess
    child prf1  : E  = sem_Any
    child prf2  : E  = sem_Any

    prf1.l  = lhs.l
    prf1.r  = g.obj
    prf2.l  = g.obj
    prf2.r  = lhs.r

    lhs.pp  = "trans" ++ prf1.pp ++ prf2.pp
  \end{code}

  The inference rules look rather innocent, but an inferencer for them is far from trivial.
  With our AG extension, we can now relatively easily deal with such inference rules.
  %}
%endif

%if showExtra
\section{Mutual Expressibility of Declarative Aspects in Rules and Productions}
\label{step.sect.mutuallyexpressible}

  We hinted in Section~\ref{step.sect.introduction} that declarative aspects in
  rules and productions are mutually expressible, and can both be encoded as
  a choice between children. We showed the latter for productions in
  Appendix~\ref{step.sect.additionaltreestructure}. For declarative aspects in rules
  we actually need to qualify our statement: it works for finite and
  inductively defined data types\footnote{
    Technically speaking, the statement holds for any value represented by
    a finite number of bits (which is the case for any value in practice).
  }. We give an example in this section.

  Suppose that types are inductively defined as follows:
  \begin{code}
  data Ty = Int | Arrow Ty Ty
  \end{code}
  In an AG for a type inferencer, we wish to define a semantics for the
  lambda production that infers the type of the argument.
  When we assume that we can obtain somehow a nondeterministic value |fresh|, we could write
  the semantics for a lambda expression as follows:
  \begin{code}
  attr Expr  ^^^ inh env :: [(String, Ty)]  ^^^ syn ty :: Ty
  sem Expr | Lam
    loc.argTy  = fresh
    body.env   = x : loc.argTy, lhs.env
    lhs.ty     = Arrow loc.argTy body.ty
  \end{code}
  We nondeterministically get a type for the argument, and use this in the environment for the
  body of the lambda, and for its type. Note that the definition for |loc.argTy| clashes with
  referential transparency: |fresh| is not associated with a single unique value. Also note that
  there may be infinitely many values possible for |argTy|: for predictability reasons, we want
  to make explicit what value is chosen here.

  We can encode the above declarative aspect in a functional way: we define a tree that represents
  all possible values of type |Ty|, and use the choice function |globalChoice| (Appendix~\ref{step.sect.inferencerulesandags})
  to choose to smallest |Ty| that results in a successful inference.

  First, we introduce a nonterminal attribute of nonterminal |FreshDispatch| (defined later)
  that provides us with a synthesized attribute |ty| that holds the fresh type:
  \begin{code}
  sem Expr | Lam
    child fr :: FreshDispatch  = mkFresh
    loc.argTy  = fr.ty
  \end{code}

  Then, |mkFresh| builds such an infinite tree of fresh possibilities for a type.
  We follow the pattern described in \stepRefAppendix{step.sect.additionaltreestructure}{E}:
  |FreshDispatch| corresponds to a dispatcher node that has a child
  (respectively, |int| and |arrow|) for each alternative type:
  \begin{code}
  mkFresh = FreshDispatch FreshInt (FreshArrow mkFresh mkFresh)

  data FreshDispatch | Dispatch  ^^^ int :: FreshAlt  ^^^ arrow :: FreshAlt

  data FreshAlt
    | FreshInt
    | FreshArrow  ^^^  arg :: FreshDispatch  ^^^  res :: FreshDispatch

  attr FreshDispatch FreshAlt ^^^ syn ty :: Ty
  \end{code}

  The semantics of |Dispatch| defines that (with a preference for |int|) the child
  that makes the evaluation succeed (for a notion of success defined by the
  other code of the inferencer) is chosen:
  \begin{code}
  sem FreshDispatch | Dispatch
    merge int arrow as res = globalChoice

  sem FreshAlt
    | FreshInt    lhs.ty = Int
    | FreshArrow  lhs.ty = Arrow arg.ty res.ty
  \end{code}

  These declarative forms are thus indeed mutually expressible.
  In practice, however, we typically deal with declaratively defined attributes in
  a different way: we use a unification-based approach together with an additional
  substitution attribute, which does not need an exploration of alternatives. While we can encode
  the declarative aspect via such an exploration, there may be more efficient ways to resolve it.
%endif

%if showExtra
\section{Proofs}
\label{step.sect.proofs}

  In this section, we present proofs of some important properties of the stepwise monad.
  For simplicity, we take the implementation of Section~\ref{step.sect.stepwisemonad}, without
  the extensions mentioned in Section~\ref{step.sect.evaluation}.
  We prove that the monad laws hold. This is important, because the desugaring of monad notation
  makes use of them. Also, it is a sanity check that gives additional confidence in the correctness of
  the implementation. We need to prove that:

  \begin{tabular}{ll}
    Left identity:  &  |return a >>= f  ==  f a| \\
    Right identity: &  |m >>= return  ==  m| \\
    Associativity:  &  |(m >>= f) >>= g  ==  m >>= (\x -> f x >>= g)| \\
  \end{tabular}

  \noindent{}In our case, we have to prove the monad laws twice, because we have two
  interpretations of the monad: |lazyEval| and |smallStep|.

  \subsection{Monad Laws for Lazy Evaluation}

  These proofs are straightforward equivalence proofs that exploit Haskell's referential transparency
  to reduce the LHS and RHS to a common term:

  \begin{code}
  lazyEval (return a >>= k)
       ==  lazyEval (Pending (Return a) (Bind k None))
       ==  evalPending (Bind k None) (lazyEval (Return a))
       ==  evalPending None (lazyEval (k a))
       ==  lazyEval (k a)

  lazyEval (m >>= return)
       ==  lazyEval (Pending m (Bind Return None))
       ==  evalPending (Bind Return None) (lazyEval m)
       ==  evalPending None $ lazyEval $ Return $ lazyEval m
       ==  lazyEval m
  \end{code}

  In case of the associativity law, we reduce both sides to
  a chain of |lazyEval| calls. This is the intuition behind
  |evalPending|: for each parent on the pending stack, it
  passes to the parent the |lazyEval| of its child:
  \begin{code}
  lazyEval (m >>= (\x -> k x >>= h))
       ==  lazyEval (Pending m (Bind (\x -> Pending (k x) (Bind h None)) None))
       ==  evalPending (Bind (\x -> Pending (k x) (Bind h None)) None) (lazyEval m)
       ==  lazyEval $ (\x -> Pending (k x) (Bind h None)) $ lazyEval m
       ==  lazyEval $ Pending (k (lazyEval m)) (Bind h None)

       ==  lazyEval $ h $ lazyEval $ k $ lazyEval m

       ==  lazyEval $ h $ evalPending (Bind k None) $ lazyEval m
       ==  lazyEval $ h $ lazyEval (Pending m (Bind k None))
       ==  evalPending (Bind h None) (lazyEval (Pending m (Bind k None)))
       ==  lazyEval (Pending (Pending m (Bind k None)) (Bind h None))
       ==  lazyEval ((m >>= k) >>= h)
  \end{code}

  \subsection{Monad Laws for Stepwise Evaluation}

  The proof for left-identity is still a straightforward equivalence proof:
  \begin{code}
  smallStep (return a >>= k)
       ==  smallStep (Pending (Return a) (Bind k None))
       ==  reduce (Return a) (Bind k None)
       ==  reduce (k a) None
       ==  smallStep (k a)
  \end{code}

  The remaining proofs are more complicated. We quickly run into the problem that
  we cannot rewrite any further because we do not know some of the variables, e.g. |m|:
  \begin{code}
  smallStep (m >>= return)
       ==  reduce m (Bind Return None)
       ...
       ==  smallStep m
  \end{code}

  To proceed, we use structural induction and case distinction on all possible values of |m|,
  which means that we use induction over the sequence of progress reports yielded by |m|.
  First the inductive case:
  \begin{code}
    case m of
      Yield i n  ->     reduce (Yield i n) (Bind Return None)
                            ==  Step i (Pending n (Bind Return None))
                            ==  Step i (n >>= return)   -- induction hypothesis
                            ==  Step i n
                            ==  smallStep (Yield i n)
  \end{code}

  In contrast to many inductive proofs, actually the base case is the interesting part.
  We finish off most cases via trivial equality rewrites. Unfortunately, when |m| is
  a |Pending|-value, we are again stuck:
  \begin{code}
    case m of
      Yield i n    ->    undefined   -- Falsum: no progress report
      Fail s       ->    reduce (Fail s) (Bind Return None)
                             ==  Failed s
                             ==  smallStep (Fail s)
      Return v     ->    reduce (Return v) (Bind Return None)
                             == reduce (Return v) None
                             == smallStep (Return v)
      Pending n p  ->    reduce (Pending n p) (Bind Return None)
                             == reduce n (push p (Bind Return None))
                             ...
                             == reduce n p
                             == smallStep (Pending n p)
  \end{code}

  We proceed with structural induction on |n| and |p|, such that we may continue to unfold
  |reduce|. The cases for |Yield|, |Fail| and |Pending| for |n| are similar to cases above.
  Of interest are the two cases with |None| and |Bind| for |p|.
  We may apply this induction hypothesis, as well as the earlier one above:
  \begin{code}
    case p of
      None        ->  reduce n (push None (Bind Return None))
                          ==  reduce n (Bind Return None)  -- I.H (above)
                          ==  smallStep n
                          ==  reduce n None
      Bind f q   ->   reduce n (push (Bind f q) (Bind Return None))
                          ==  reduce n (Bind f (push q (Bind Return None)))  -- I.H.
                          ==  reduce n (Bind f q)
  \end{code}

  The proof for the associativity law proceeds in a similar way.

\section{Benchmarks}
\label{step.sect.benchmarks}

  Regarding asymptotic complexity, it is easy to show that in the worst case the performance
  is exponential in the number of choice points. For small inputs, this may not be
  a problem, and in general it is up to the programmer to keep alternatives to a limit.
  A remaining question, however, is how much overhead the stepwise evaluation induces.
  This overhead should be constant, and we experimentally validated that this is indeed the case.

  For this purpose, we compared the evaluation of a sequences of binds, using both the stepwise monad
  and the identity monad\footnote{
    See \url{https://svn.science.uu.nl/repos/project.ruler.papers/archive/StepBenchmarks.hs}.
  }. Benchmark @id@ uses the identity monad, @lazy@ uses |lazyEval|, and @step@ uses full stepwise
  evaluation. We tested both nested binds, and long
  sequences of binds, via the following benchmark:
  \begin{code}
  bindbench length depth = tree depth where
    runner 0 n  = n `seq` return ()
    runner m n  = tree n >>= \() -> runner (m-1) n

    tree 0  = return ()
    tree n  = runner length (n-1)
  \end{code}
  We ran the benchmarks on a standard MacBook 2.1 (2 GB RAM, 2 GHz Core 2 Duo) with GHC 6.12.1 and
  GCC 4.2.1. The outcome was measured via the @criterion@ package (version 0.5.05). It executed
  the benchmarks hundreds of times. We used @+RTS -H512M@ as commandline parameter to minimize the
  effect of garbage collection. The variance of the measured results was less than a percent.

  \vspace{0.3em}
  \begin{tabular}{rrrrr}
    depth & length     & id (ms) & lazy (ms) & step (ms)  \\
    1     & 10,000     & 0.0555 & 1.40 & 1.42   \\
    1     & 100,000    & 0.554  & 20.2 & 14.4 \\
    1     & 1,000,000  & 5.56   & 259  & 144  \\
    5     & 2          & 0.000540 & 0.0102 & 0.0137 \\
    7     & 2          & 0.00230 & 0.0415 & 0.0564 \\
    9     & 2          & 0.00905 & 0.166 & 0.231 \\
    5     & 3          & 0.00214 & 0.0585 & 0.0809 \\
    3     & 50         & 0.712 & 18.2 & 26.0 \\
    4     & 10         & 0.0680 & 1.63 & 2.35 \\
  \end{tabular}
  \vspace{0.3em}

  \noindent{}Indeed, both the evaluation via the identity monad and full stepwise evaluation
  show a constant amount of overhead per bind, with about a factor 30 throughput difference.
  Lazy evaluation shows a more erratic behavior.
  With an earlier and simpler version of our library -- essentially the implementation of
  Section~\ref{step.sect.stepwisemonad} -- we measured a factor 10 throughput
  difference. The additional complexity discussed in Section~\ref{step.sect.evaluation} thus
  has its cost. However, this benchmark does not show us what to expect in a real
  application.

  To measure actual performance in practice, we compiled UHC (repository version 2226, 04 dec 2010) with both the conventional
  AG translation (using the @--optimize@ flag) and the stepwise AG translation. For the
  latter, we added the @Control.Monad.Stepwise.AG@ module to the export list of UHC's @Common.chs@.
  Furthermore, we added the commandline option @--with-uuagc-options="--breadthfirst"@ to the call
  to UHC's @configure@.
  Since the AGs of UHC are orderable, we
  can fully evaluate them stepwise, without requiring |lazyEval|. Three insignificant
  AGs were not orderable: for these we used the conventional translation.
  We believe that the UHC is a good benchmark to validate AG performance in practice:
  it uses hundreds of AGs for its implementation, such as AGs that work on large data structures (the ASTs of
  various Haskell programs), but also on thousands on small data structures (for example,
  to compute the free variables of a type). The AGs also vary on the number of productions,
  and the number of rules.

  We measured the time that it took for UHC to compile programs from the @nofib@ benchmark
  suite. Additionally, we verified that the outcome and performance of the
  compiled Haskell programs are not affected by the difference in implementation of UHC.
  The outcome of these benchmarks, combined with the proofs of Section~\ref{step.sect.proofs}
  and the strong typing discipline imposed on the monad, gives us confidence in the correctness
  of the implementation.

  \vspace{0.3em}
  \begin{tabular}{lrrrr}
    benchmark (name)  &  conventional (sec.) & stepwise (sec.) \\
    \hline
    @imag/bernouilli@   & 4.51 & 4.95 \\
    @imag/binarytrees@  & 4.35 & 4.40 \\
    @imag/digits-of-e1@ & 4.14 & 4.16 \\
    @imag/digits-of-e2@ & 3.85 & 3.88 \\
    @imag/exp3_8@       & 4.33 & 4.29 \\
    @imag/gen-regexps@  & 4.26 & 4.38 \\
    @imag/integrate@    & 4.29 & 4.34 \\
    @imag/loop@         & 3.76 & 3.81 \\
    @imag/nsieve@       & 1.11 & 1.10 \\
    @imag/paraffins@    & 5.55 & 5.62 \\
    @imag/partial-sums@ & 4.40 & 4.48 \\
    @imag/pidigits@     & 4.60 & 4.29 \\
    @imag/primes@       & 3.77 & 3.72 \\
    @imag/queens@       & 3.74 & 3.81 \\
    @imag/recursive@    & 4.37 & 4.37 \\
    @real/infer@        & 11.9 & 11.8 \\
  \end{tabular}
  \vspace{0.3em}

  \noindent{}The difference in performance is very small. On average, the performance using
  stepwise computations is marginally worse.
  However, with such small differences, the garbage collector has more impact on performance, as
  the cases for @imag/nsieve@ and @real/infer@ seem to suggest. With these results, we dare to say that
  our implementation is ready to be used in practice.
%endif

%if False
\section{Optimization}
\label{step.sect.optimization}

  The monadic combinators of Section~\ref{step.sect.stepwisemonad} actually builds a representation of the
  monadic computation. Subsequently, we interpret this representation via |lazyEval| or |smallStep|.

  \begin{code}
  data Input a b where
    ...
  \end{code}
%endif

%if showExtra
\section{Java Implementation}
\label{step.sect.java}

In Section~\ref{step.sect.imperative}, we gave a short overview of an imperative implementation of
the work as described in \thiswork{}.
In this section, we work out the example of Section~\ref{step.sect.stepwiseexample} using Java instead of Haskell.
We derive the code for the example in a systematic way.
We thus describe implicitly how to translate AGs to Java. In comparison to
AG systems such as JastAdd~\citep{DBLP:conf/oopsla/EkmanH07}, our approach offers lazy and strict
evaluation, higher-order attributes, and the stepwise features of \thiswork{}.
Furthermore, there is a clear correspondence between the Java implementation and the reference implementation
in Haskell.

%% The AST
As conventional in object-oriented languages with class-based inheritance, we introduce an abstract
class for each nonterminal and a subclass for each production. Nodes of the AST are instances of
these subclasses. A child of a production has the abstract class as its static type, and one of the
subclasses as its runtime type. The abstract class contains the inherited and synthesized attributes
of the nonterminal as fields. A subclass contains the rules of the production (and possibly local attributes).
For the example, we thus introduce a abstract class |Pred|, and a subclass for each production:
%% \begin{lstlisting}[name=stepastoverview]
\begin{small}\begin{verbatim}
public final class PredConst extends Pred    /* implemented later */
public final class PredVar   extends Pred    /* implemented later */
public final class PredLet   extends Pred    /* implemented later */
public final class PredOr    extends Pred    /* implemented later */
public final class PredAnd   extends Pred    /* implemented later */
\end{verbatim}\end{small}
%% \end{lstlisting}
The constructors of the subclasses take values for the symbols of the associated production as
parameter. For example, we can construct the predicate |p3| that represents `|x| or |false|':
%% \begin{lstlisting}[name=stepastoverview]
\begin{small}\begin{verbatim}
Pred p1 = new PredVar("x");
Pred p2 = new PredConst(false);
Pred p3 = new PredOr(p1, p2);
\end{verbatim}\end{small}
%% \end{lstlisting}
We show later how to pass values for inherited attributes to such a node, and how to
extract values for synthesized attributes. We first show how attributes, rules and
attribute evaluation are represented.

The abstract class |Pred| contains inherited and synthesized attributes in the
respective fields |_inhs| of type |PredInh| and |_syns| of type |PredSyn|.
The inherited attributes are publicly accessible via the getter method |inhs|.
The synthesized attributes may only be accessed directly by the node itself.
We show later that a parent node uses the method |lazyEval| on a child
(inherited from the class |Node|) to access the child's synthesized
attributes.
The |Node| type is parameterized with tree types: the type of the object
containing the synthesized attributes, the type of the information messages that
may be yielded during strict evaluation, and the types of failures it may generate.
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public abstract class Pred extends Node<PredSyn, Info, BacktrackException> {
  private PredInh _inhs;
  private PredSyn _syns;

  public Pred() {
    _inhs = new PredInh();
    _syns = new PredSyn(); }

  protected PredSyn syns() {
    return _syns; }

  public PredInh inhs() {
    return _inhs; }}

public class PredInh {
  private Attr<HashMap<String, Boolean>> _env;

  public PredInh() {
    _env = new Attr<HashMap<String, Boolean>>(); }

  public Attr<HashMap<String, Boolean>> env() {
    return _env; }}

public class PredSyn {
  private Attr<Boolean> _value;

  public PredSyn() {
    _value = new Attr<Boolean>(); }

  public Attr<Boolean> value() {
    return _value; }}
\end{verbatim}\end{small}
%% \end{lstlisting}
Attributes are parameterized by the type of values that they store. Initially, an
attribute is initialized to |null|. An attribute can be assigned a value in two ways.
Either a value can be explicitly set via the attribute's |set| method, or a rule
can be associated via the method |dependsOn|. In the latter case, when the value of
the attribute is requested via the |get| method and the attribute is not yet defined,
the associated rule is run. The rule calls the |set| method to assign a value to
the attribute. Therefore, we require rules to be encoded as objects, such that they
can be passed around as first-class citizens.
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public final class Attr<T> {
  private T _value;
  private Runnable _rule;

  public Attr() {
    _value = null;
    _rule = null; }

  public T get() {
    if (_value == null && _rule != null)
      _rule.run();

    return _value; }

  public void set(final T value) {
    _value = value; }

  public void dependsOn(final Runnable rule) {
    _rule = rule; }}
\end{verbatim}\end{small}
%% \end{lstlisting}
The |Node| class from which all AST nodes inherit exposes the synthesized
attributes only to subclasses of |Node|. The intended usage protocol is that after
creation of a child node, its parent first assigns rules to the node's inherited
attributes. Subsequently, the parent may issue a call to the |begin| method
to obtain a stepwise computation. A stepwise computation of a child represents
the computations for the subtree rooted by that child (captured by the class |Parents|
that we discuss later).
A stepwise computation
can be manipulated in two ways. It can be lazily evaluated via
|lazyEval|, such that it returns the node's synthesized attributes, or it
can perform one evaluation step, and return a progress report (the class |Report|
that we discuss later).
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public abstract class Node<X, I, E> extends CoroutineBase<I, E> {
  public Node() {}

  abstract protected X syns();

  public Stepwise<X, I, E> begin() {
    return new Parents<X, I, E>(this, syns()); }}

public abstract class CoroutineBase<I, E> implements Coroutine<I, E> {
  /* implemented later */ }

public interface Stepwise<X,I,E> extends Coroutine<I,E> {
  X lazyEval(); }

public interface Coroutine<I,E> {
  Report<I,E> nextStep(); }
\end{verbatim}\end{small}
%% \end{lstlisting}
Both the node and the stepwise computations are coroutines. We can invoke
a coroutine such that it runs until it yields a report. The stepwise
computations orchestrate |nextStep| calls on the nodes, as we see later.

Along similar lines as the protocol that we described above, we construct
the AST, assign values for the root's inherited attributes, use |begin|
on the root to get its stepwise computation, then finally invoke
|lazyEval| to get the synthesized attributes.
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public final class Main {
  public static void main(final String[] args) {
    // build tree
    Pred p1 = new PredVar("x");
    Pred p2 = new PredVar("y");
    Pred p3 = new PredOr(p1, p2);

    // set inherited attributes of the root
    p3.inhs().env().set(new HashMap<String, Boolean>());
    p3.inhs().env().get().put("x", false);
    p3.inhs().env().get().put("y", true);

    // start on-demand evaluation
    Stepwise<PredSyn, Info, BacktrackException> outcome = p3.begin();
    boolean result = outcome.lazyEval().value().get();
    System.out.println("result: " + result); }}
\end{verbatim}\end{small}
%% \end{lstlisting}

Since Java does not have native support for coroutines, we encode it as an object
that has a |nextStep| method that can be invoked multiple times.
During each invocation it may perform some computations and yield a progress report. It may
keep track of its local state via private fields of the encapsulating object.

To ease the implementation of coroutines, the |CoroutineBase| class provides boilerplate
code.
Such a coroutine must implement the method |visit|. This method encodes a
sequence of computations indexed by a parameter |_state|. This parameter
is incremented after each invocation, such that the |visit| method can invoke
the next sequence of computations.
The visit method does not directly return a progress report, but may call API functions to
enqueue one or more progress reports. The |nextStep| method invokes the
|visit| method until there is at least one element in the queue, and subsequently
returns this report.
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public abstract class CoroutineBase<I, E> implements Coroutine<I, E> {
  private LinkedList<Report<I, E>> _actions;
  private int _state;

  public CoroutineBase() {
      _state = 0;
    _actions = new LinkedList<Report<I, E>>(); }

  public Report<I, E> nextStep() {
    Report<I, E> rep = null;

    while (true) {
      rep = _actions.poll();
      if (rep == null) {
        visit(_state);
        ++_state; }
      else
        return rep; }}

  protected abstract void visit(final int state);

  protected void emit(final I info) {
    _actions.add(new ReportInfo<I, E>(info)); }

  protected void resumeAfter(final Stepwise<?, I, E> child) {
    _actions.add(new ReportChild<I, E>(child)); }

  protected void abort(final E failure) {
    _actions.add(new ReportFail<I, E>(failure)); }

  protected void done() {
    _actions.add(new ReportDone<I, E>()); }

  protected void commit(final Stepwise<?, I, E> comp) {
    _actions.add(new ReportReplace<I, E>(comp)); }}
\end{verbatim}\end{small}
%% \end{lstlisting}
The method |emit| enqueues a |ReportInfo| report
that provides user-specified progress information.
The method |resumeAfter| enqueues a |ReportChild|
report. It demands that the |nextVisit| method
is only called again after strict evaluation of the
provided stepwise computation (presumably a child
of the current computation) is run to completion first.
The methods |abort| and |done| enqueue failure and
completion reports respectively. Finally, the method
|commit| enqueues a |ReportReplace| report. It specifies
that the current stepwise computation should be replaced
by the provided computation. We use this report later to
be able to replace a choice between stepwise computations
by one of the choices.
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public interface Report<I, E> {}

public class ReportReplace<I,E> implements Report<I,E> {
  private Stepwise<?,I,E> _comp;

  public ReportReplace(final Stepwise<?,I,E> comp) {
    _comp = comp; }

  public Stepwise<?,I,E> get() {
    return _comp; }}

public class ReportInfo<I,E> implements Report<I,E> {
  private I _info;

  public ReportInfo(final I info) {
    _info = info; }

  public I get() {
    return _info; }}

public class ReportFail<I,E> implements Report<I,E> {
  private E _failure;

  public ReportFail(final E failure) {
    _failure = failure; }

  public E get() {
    return _failure; }}

public class ReportDone<I,E> implements Report<I,E> {
  public ReportDone() {}}

public class ReportChild<I,E> implements Report<I,E> {
  private Stepwise<?,I,E> _child;

  public ReportChild(final Stepwise<?,I,E> child) {
    _child = child; }

  public Stepwise<?,I,E> get() {
    return _child; }}
\end{verbatim}\end{small}
%% \end{lstlisting}
In contrast to the Haskell implementation does a
|ReportDone| not provide the resulting values for
synthesized attributes. This turns out to be more
convenient for the Java implementation, because we
then don't have to concern ourselves with the
type |X| of the stepwise computations. We hide this
type via existentials in the |ReportReplace| and
|ReportChild| reports. After strict
evaluation is done for a node, we may call |lazyEval|
on the stepwise computation and get immediate access
to the already evaluated attributes.

The visit methods of nodes invoke rules in a fixed order.
Alternatively, when accessing attributes, rules may be
invoked on-demand. To prevent double calculations and
side effect, |Rules| must implement the |execute| method,
which is only called from the |run| method when it has
not been run yet.
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public abstract class Rule implements Runnable {
  private boolean _hasRun;

  public Rule() {
    _hasRun = false; }

  public void run() {
    if (!_hasRun) {
      _hasRun = true;
      execute(); }}

  protected abstract void execute(); }
\end{verbatim}\end{small}
%% \end{lstlisting}

%if false
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public final class PredConst extends Pred {
  private boolean _value;
  private Rule _rule1;

  public PredConst(final boolean value) {
    _value = value;

    // construct rules
    _rule1 = new Rule() {
      public void execute() {
        syns().value().set(_value) }}

    // setup dependencies of synthesized and local attributes
    syns().value().dependsOn(_rule1); }

  protected void visit(final int state) {
    switch (state) {
    case 0:
      _rule1.run(); // compute syn attr
      emit(new InfoWork());
      done();
      break;
    default:
      done();
      break; }}}
\end{verbatim}\end{small}
%% \end{lstlisting}
%endif

For each production, we generate a subclass of the abstract class of its nonterminal.
An object of this subclass contains values of the terminal and nonterminal
symbols as private fields. These values must be provided as parameters to the constructor.
The constructor creates the rules for the production, and connects rules to the
synthesized and local attributes. The attributes are constructed by the constructor of
the |Pred| base class.
A rule may refer to other attributes. If such an attribute has not been evaluated, the
dereference causes its on-demand evaluation. When a rule is called via strict evaluation,
it is actually guaranteed that the attributes where the rule depends on are already evaluated.
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public final class PredVar extends Pred {
  private final String _name;  // symbols
  private final Rule _rule1;   // rules of the node

  public PredVar(final String name) {
    _name = name;

    // construct rules
    _rule1 = new Rule() {
      public void execute() {
        boolean b = inhs().env().get().get(_name);
        syns().value().set(b); }}

    // setup dependencies of synthesized and local attributes
    syns().value().dependsOn(_rule1); }

  protected void visit(final int state) {
    switch (state) {
    case 0:
      _rule1.run(); // compute syn attr
      emit(new InfoWork());
      break;
    default:
      done();
      break; }}}
\end{verbatim}\end{small}
%% \end{lstlisting}
The |visit| method executes rules in a fixed order, such that values are already
computed before they are needed (strict evaluation).

Nodes are either constructed by a rule, or available as private field.
A node, however, is not the same concept as a child.
A child is represented by two attributes: one
attribute containing a reference to the associated node, and an attribute
containing the stepwise computation of a child. Virtual children are only
represented by the attribute containing the stepwise computation.
The former attribute is used to associate rules to the child's inherited
attributes (or assign concrete values from these rules). The latter attribute
is used to obtain the stepwise computation in order to access the child's
synthesized attributes. Children are defined by rules (i.e. via a child-rule
in the formalization).
With this scheme, we support both nodes that are created up-front and
nodes that can be constructed on-the-fly.
For the production |PredLet| of the example, we have two nodes |_expr|
and |_body|. Rules |_rule4| and |_rule5| turn these nodes into children
|e| and |b| (represented by attributes |_eIn|, |_eOut|, |_bIn|, and |_bOut|
respectively).
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public final class PredLet extends Pred {
  // symbols
  private final String _name;
  private final Pred _expr;
  private final Pred _body;

  // local attributes
  private final Attr<Pred> _eIn;
  private final Attr<Pred> _bIn;
  private final Attr<Stepwise<PredSyn, Info, BacktrackException>> _eOut;
  private final Attr<Stepwise<PredSyn, Info, BacktrackException>> _bOut;

  private final Rule _rule1, _rule2, _rule3, _rule4, _rule5, _rule6, _rule7;

  public PredLet(final String name, final Pred expr, final Pred body) {
    _name = name;
    _expr = expr;
    _body = body;

    _eIn = new Attr<Pred>();
    _bIn = new Attr<Pred>();
    _eOut = new Attr<Stepwise<PredSyn, Info, BacktrackException>>();
    _bOut = new Attr<Stepwise<PredSyn, Info, BacktrackException>>();

    // construct rules
    _rule1 = new Rule() {
      public void execute() {
        HashMap<String, Boolean> env = inhs().env().get();
        _eIn.get().inhs().env().set(env); }}

    _rule2 = new Rule() {
      public void execute() {
        boolean val = _eOut.get().lazyEval().value().get();
        HashMap<String, Boolean> env = (HashMap<String, Boolean>) inhs()
            .env().get().clone();
        env.put(_name, val);
        _bIn.get().inhs().env().set(env); }}

    _rule3 = new Rule() {
      public void execute() {
        boolean val = _eOut.get().lazyEval().value().get();
        syns().value().set(val); }}

    _rule4 = new Rule() {
      public void execute() {
        _eIn.set(_expr);
        _eIn.get().inhs().env().dependsOn(_rule1); }}

    _rule5 = new Rule() {
      public void execute() {
        _bIn.set(_body);
        _bIn.get().inhs().env().dependsOn(_rule2); }}

    _rule6 = new Rule() {
      public void execute() {
        _eOut.set(_eIn.get().begin()); }}

    _rule7 = new Rule() {
      public void execute() {
        _bOut.set(_bIn.get().begin()); }}

    // setup dependencies of synthesized and local attributes
    syns().value().dependsOn(_rule3);
    _eIn.dependsOn(_rule4);
    _bIn.dependsOn(_rule5);
    _eOut.dependsOn(_rule6);
    _bOut.dependsOn(_rule7); }

  protected void visit(final int state) {
    switch (state) {
    case 0:
      _rule4.run(); // create expr child
      _rule1.run(); // assign its inh attr
      _rule6.run(); // prepare it
      resumeAfter(_eOut.get());
      break;
    case 1:
      _rule5.run(); // create body child
      _rule2.run(); // assign its inh attr
      _rule7.run(); // prepare it
      resumeAfter(_bOut.get());
      break;
    case 2:
      _rule3.run(); // assign syn attr
      done();
      break;
    default:
      done();
      break; }}}
\end{verbatim}\end{small}
%% \end{lstlisting}
In the above code, |_rule4| actually constructs the child |e| from
the node stored in the private field, and associates rules to the
child's inherited attributes. The |visit| method specifies the code
to run until the evaluation of the first child, the code to run
until the evaluation of the second child, and finally the code to
run in the end. By using |resumeAfter| on a child, it ensures that
the visit method is only invoked again after strict evaluation of
that child is finished.

In the |PredOr| production, the choice between the two children is
captured by the virtual child |res|. It is represented by an
attribute that contains a stepwise computation |ChooseOr| that
corresponds to this choice.
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public final class PredOr extends Pred {
  // symbols
  private final Pred _left;
  private final Pred _right;

  // local attributes
  private final Attr<Pred> _leftIn;
  private final Attr<Pred> _rightIn;
  private final Attr<Stepwise<PredSyn, Info, BacktrackException>> _resOut;

  private final Rule _rule1, _rule2, _rule3, _rule4, _rule5, _rule6

  public PredOr(final Pred left, final Pred right) {
    _left = left;
    _right = right;

    _leftIn = new Attr<Pred>();
    _rightIn = new Attr<Pred>();
    _resOut = new Attr<Stepwise<Pred, Info, BacktrackException>>();

    // construct rules
    _rule1 = new Rule() {
      public void execute() {
        _leftIn.get().inhs().env().set(inhs().env().get()); }}

    _rule2 = new Rule() {
      public void execute() {
        _rightIn.get().inhs().env().set(inhs().env().get()); }}

    _rule3 = new Rule() {
      public void execute() {
        boolean b = _resOut.get().lazyEval().value().get();
        syns().value().set(b); }}

    _rule4 = new Rule() {
      public void execute() {
        _leftIn.set(_left);
        _leftIn.get().inhs().env().dependsOn(_rule1); }}

    _rule5 = new Rule() {
      public void execute() {
        _rightIn.set(_right);
        _rightIn.get().inhs().env().dependsOn(_rule2); }}

    _rule6 = new Rule() {
      public void execute() {
        ChooseOr choice = new ChooseOr(_leftIn.get().begin(), _rightIn
            .get().begin());
        _resOut.set(choice); }}

    // setup dependencies of synthesized and local attributes
    syns().value().dependsOn(_rule3);
    _leftIn.dependsOn(_rule4);
    _rightIn.dependsOn(_rule5);
    _resOut.dependsOn(_rule6); }}

  protected void visit(final int state) {
    switch (state) {
    case 0:
      _rule4.run(); // create left child
      _rule5.run(); // create right child
      _rule1.run(); // assign left's inh attr
      _rule2.run(); // assign right's inh attr
      _rule6.run(); // prepare it
      resumeAfter(_resOut.get());
      break;
    case 1:
      _rule3.run(); // assign syn attr
      done();
    default:
      done();
      break; }}}
\end{verbatim}\end{small}
%% \end{lstlisting}
Rule |_rule6| initializes the child by constructing a |MergeOr| computation that
takes the stepwise computations of the two children as parameter.
Also note that |_rule6| is assigned as dependency to |resOut| attribute. If |_rule6|
is not invoked via strict evaluation, then on-demand evaluation invokes it when
an attribute of child |res| is needed.

The choice between children proceeds by taking steps from both children and
inspecting the progress reports. If evaluation for one of the children is
finished, a choice can be made. Otherwise the choice itself emits a progress
report. A choice is made via the |commit| method, which yields a
|ReportReplace| report.
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public class ChooseOr extends Merge<Pred, Info, BacktrackException> {
  protected final Stepwise<X,I,E> _left;
  protected final Stepwise<X,I,E> _right;

  public ChooseOr(final Stepwise<Pred, Info, BacktrackException> left,
      final Stepwise<Pred, Info, BacktrackException> right) {
    _left = left;
    _right = right; }

  protected void visit() {
    Report<Info, BacktrackException> r1 = _left.nextStep();
    Report<Info, BacktrackException> r2 = _right.nextStep();

    if (r1 instanceof ReportDone)
      commit(_left.lazyEval().syns().value().get() ? _left : _right);
    else if (r2 instanceof ReportDone)
      commit(_right.lazyEval().syns().value().get() ? _right : _left);
    else if (r1 instanceof ReportFail)
      commit(_right);
    else if (r2 instanceof ReportFail)
      commit(_left);
    else if (r1 instanceof ReportInfo && r2 instanceof ReportInfo)
      emit(new InfoWork()); }}
\end{verbatim}\end{small}
%% \end{lstlisting}
The class |Merge| is a stepwise computation. It implements lazy evaluation
by taking steps until the choice has been resolved. Then it proceeds
with lazy evaluation on the selected child.
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public abstract class Merge<X, I, E> extends CoroutineBase<I, E> implements
    Stepwise<X, I, E> {
  public Merge() {}

  public X lazyEval() {
    while (true) {
      Report<I, E> rep = nextStep();

      if (rep instanceof ReportReplace) {
        ReportReplace<I, E> repl = (ReportReplace<I, E>) rep;
        Stepwise<X, I, E> comp = (Stepwise<X, I, E>) repl.get();
        return comp.lazyEval(); }
      else if (rep instanceof ReportFail) {
        throw new RuntimeException(
            "all alternatives fail."); }}}}
\end{verbatim}\end{small}
%% \end{lstlisting}
Note the cooperation between stepwise and on-demand evaluation. Lazy evaluation
never has to redo the work already done due to stepwise evaluations.

Finally, we show the stepwise computation that can be obtained from a node.
The |Parents| class drives the stepwise evaluation of a subtree. It represents
the intermediate stages of strict evaluation on this subtree.
The subtree is represented by a stack.
The active node is at the top of the parents stack. The bottom of the stack is the root of
the subtree. Lazy evaluation returns the synthesized attributes of the root.
Depending on how much strict evaluation took place, attributes may already have
been computed.
The |nextStep| method delegates invocations to the deepest nodes of subtrees (e.g. the
top of the stack) until a
progress report can be yielded. Evaluation of such a node may cause new children
to be pushed on the stack (as reaction on a |ReportChild| report), nodes to
be replaced (as reaction of a |ReportReplace| report), and children to be popped off
(when evaluation for a child is complete).
%% \begin{lstlisting}[name=stepjava]
\begin{small}\begin{verbatim}
public final class Parents<X,I,E> implements Stepwise<X,I,E> {
  private X _syns;
  private LinkedList<Coroutine<I,E>> _stack;

  public Parents(final Node<X,I,E> node, final X syns) {
    _stack = new LinkedList<Coroutine<I,E>>();
    _stack.add(node);
    _syns = syns; }

  public X lazyEval() {
    _stack.clear();
    return _syns; }

  public Report<I,E> nextStep() {
    while(true) {
      Coroutine<I,E> head = _stack.poll();
      if (head == null)
        return new ReportDone<I,E>(); // stack empty, we are done

      if (head instanceof Parents) {  // merge stacks
        Parents<?,I,E> other = (Parents<?,I,E>) head;
        _stack.addAll(0, other._stack);
        continue; }

      Report<I,E> rep = head.nextStep();
      if (rep instanceof ReportReplace) {
        ReportReplace<I, E> repl = (ReportReplace<I, E>) rep;
        Coroutine<I, E> comp = (Coroutine<I, E>) repl.get();
        _stack.addFirst(comp);
        continue; }
      else if (rep instanceof ReportFail)
        return rep;
      else if (rep instanceof ReportChild) {
        ReportChild<I, E> child = (ReportChild<I, E>) rep;
        Coroutine<I, E> comp = (Coroutine<I, E>) child.get();
        _stack.addFirst(head);
        _stack.addFirst(comp);
        continue; }
      else if (rep instanceof ReportDone)
        continue;
      else if (rep instanceof ReportInfo) {
        _stack.addFirst(head);
        return rep; }}}}
\end{verbatim}\end{small}
%% \end{lstlisting}
The implementation can be improved a bit further, because we do not actively
remove references to children that are not needed anymore, thus retain memory
longer than strictly necessary.
Furthermore,
|lazyEval| can be improved by returning either the values of the synthesized
attributes, or a replacement computation. When a rule needs a synthesized
attribute, it should (via some helper object) iterate |lazyEval| until it
eliminated all intermediate |Merge| nodes.
The memory usage of the bookkeeping is linear in the number of nodes of the
tree plus the maximum number of progress reports that can be yielded by a node.
The computations needed for the bookkeeping run in time linear in the number of nodes
plus for each merge node in time linear to the maximum number of progress reports.

We showed a translation to Java. In a similar way a translation to C\# can be
made. An advantage of C\# over Java is that local attributes without a
type signature can be supported using untyped fields.
In principle, the approach of \thiswork{} is not restricted to a particular
programming language.
However, care has to be taken in combination with side effect. When a node is
shared (i.e. as in remote reference AGs~\citep{DBLP:journals/scp/MagnussonH07}),
a progress report is only received by one the parents.
This behavior may be desirable; many search algorithms work on graphs instead of
trees.
%endif

%if thesis
\end{subappendices}
%endif

%endif