\chapter{Grote Trap extended}

\label{sec:grotetrapx}

As we described in section \ref{sec:introduction}, the first step towards
automatic selection conversion is to adapt a datatype to also hold the
necessary position information. Early attempts extended Grote Trap's idea and
focused on trees of concrete syntax, using a universal tree like this:

\begin{code}
type  ParseTree  a  = Rose (Bounds, a)
data  Rose       a  = Rose a [Rose a]
\end{code}

Here a parse tree is a standard rose tree with at each node a |Bounds| value
and an |a| value, the type of our AST. To understand how |ParseTree| works,
let us take a look at an example. We will use the following simple and
well-known datatype for arithmetic expressions:

\begin{code}
data BareExpr
  = Num  Int
  | Add  BareExpr BareExpr
  | Sub  BareExpr BareExpr
  | Mul  BareExpr BareExpr
  | Div  BareExpr BareExpr
  deriving (Eq, Show)
\end{code}

The parse tree for the sentence |"1 + 2 * 3"| looks like this:

\begin{code}
example :: ParseTree BareExpr
example =
  let  one    = Num 1
       two    = Num 2
       three  = Num 3
       mul    = Mul two three
       add    = Add one mul
   in  Rose          (Bounds (0, 0) (9, 9), add)
         [  Rose     (Bounds (0, 0) (1, 2), one)    []
         ,  Rose     (Bounds (3, 4) (9, 9), mul)
            [  Rose  (Bounds (3, 4) (5, 6), two)    []
            ,  Rose  (Bounds (7, 8) (9, 9), three)  []
            ]
         ]
\end{code}

This approach has some advantages:

\begin{itemize}
  \item The datatype is simple and easy to understand.
  \item Although the above expression is pretty big for such a simple example,
  construction of such trees during parsing can be made easy by hiding
  the details in a monad transformer, exposing a function to introduce
  branches.
  \item Mapping from tree selection to text selection is trivial: simply
  select the |Bounds| value of the rose node.
  \item Mapping from text selection to tree selection can also be done: walk
  the tree, finding the node whose bounds correspond to the query range. The
  searching can be implemented efficiently, pruning complete subtrees if their
  bounds do not surround the search range.
  \item |Rose| is a regular datatype and a corresponding zipper type
  |RoseZipper| to express structural selections is easily created.
\end{itemize}

However, this approach also has some serious disadvantages:

\begin{itemize}
  \item Subexpressions are repeated in their entirety in subtrees. While this
  is not a memory problem---there is plenty of sharing going on---storing
  trees in this way causes a lot of redundancy and potential for invalid parse
  trees. The type offers no guarantees that subtrees are actual
  subexpressions. It also complicates updating such trees, as they have to be
  modified in various places in order to be kept consistent.
  \item The consumers of expressions that want to use the position information
  have to work on parse trees instead of values of type |BareExpr|. This makes
  these algorithms significantly less elegant. Similarly, functions working on
  zipper values have to work on |RoseZipper|s instead of |BareExprZipper|s.
\end{itemize}

To illustrate these problems, take a look at the following evaluation function
for expressions. We want the function to work on parsed expressions and to
return an error on division by zero, indicating where in the input sentence
the error occurred. Therefore the function takes a |ParseTree BareExpr| as
input rather than a vanilla |BareExpr| so that position information is
available, and it returns either a |Bounds| or an |Int|.

\begin{code}
eval :: ParseTree BareExpr -> Either Bounds Int
eval (Rose (bounds, expr) cs) =
  case (expr, cs) of
    (Num  n,    [])      -> pure n
    (Add  _ _,  [x, y])  -> (+)  <$> eval x <*> eval y
    (Sub  _ _,  [x, y])  -> (-)  <$> eval x <*> eval y
    (Mul  _ _,  [x, y])  -> (*)  <$> eval x <*> eval y
    (Div  _ _,  [x, y])  -> do
      x'  <- eval x
      y'  <- eval y
      case y' of
        0  -> Left bounds
        _  -> pure (x' `div` y')
\end{code}

The problems are immediately apparent. Firstly, evaluation has to be written
in applicative or even monadic style to propagate potential
errors.\footnote{Note that a non-standard |instance Monad (Either e)| is used:
one with no constraint on the error parameter |e|.} Secondly, the fields of
the |BareExpr|s are simply discarded. Instead the list of child parse trees is
pattern-matched against a fixed number of expected children, depending on the
specific constructor. This indicates that the fields are simply not necessary;
then why were they set in the first place? Furthermore, cases where the number
of child parse trees fails to match expectations result in a runtime pattern
match error. Functional programmers like it when the type system ensures
safety, but the datatype we have here contains redundant information and
allows bad shapes.

We can do better than this.