\chapter{Grote Trap extended} \label{sec:grotetrapx} As we described in section \ref{sec:introduction}, the first step towards automatic selection conversion is to adapt a datatype to also hold the necessary position information. Early attempts extended Grote Trap's idea and focused on trees of concrete syntax, using a universal tree like this: \begin{code} type ParseTree a = Rose (Bounds, a) data Rose a = Rose a [Rose a] \end{code} Here a parse tree is a standard rose tree with at each node a |Bounds| value and an |a| value, the type of our AST. To understand how |ParseTree| works, let us take a look at an example. We will use the following simple and well-known datatype for arithmetic expressions: \begin{code} data BareExpr = Num Int | Add BareExpr BareExpr | Sub BareExpr BareExpr | Mul BareExpr BareExpr | Div BareExpr BareExpr deriving (Eq, Show) \end{code} The parse tree for the sentence |"1 + 2 * 3"| looks like this: \begin{code} example :: ParseTree BareExpr example = let one = Num 1 two = Num 2 three = Num 3 mul = Mul two three add = Add one mul in Rose (Bounds (0, 0) (9, 9), add) [ Rose (Bounds (0, 0) (1, 2), one) [] , Rose (Bounds (3, 4) (9, 9), mul) [ Rose (Bounds (3, 4) (5, 6), two) [] , Rose (Bounds (7, 8) (9, 9), three) [] ] ] \end{code} This approach has some advantages: \begin{itemize} \item The datatype is simple and easy to understand. \item Although the above expression is pretty big for such a simple example, construction of such trees during parsing can be made easy by hiding the details in a monad transformer, exposing a function to introduce branches. \item Mapping from tree selection to text selection is trivial: simply select the |Bounds| value of the rose node. \item Mapping from text selection to tree selection can also be done: walk the tree, finding the node whose bounds correspond to the query range. The searching can be implemented efficiently, pruning complete subtrees if their bounds do not surround the search range. \item |Rose| is a regular datatype and a corresponding zipper type |RoseZipper| to express structural selections is easily created. \end{itemize} However, this approach also has some serious disadvantages: \begin{itemize} \item Subexpressions are repeated in their entirety in subtrees. While this is not a memory problem---there is plenty of sharing going on---storing trees in this way causes a lot of redundancy and potential for invalid parse trees. The type offers no guarantees that subtrees are actual subexpressions. It also complicates updating such trees, as they have to be modified in various places in order to be kept consistent. \item The consumers of expressions that want to use the position information have to work on parse trees instead of values of type |BareExpr|. This makes these algorithms significantly less elegant. Similarly, functions working on zipper values have to work on |RoseZipper|s instead of |BareExprZipper|s. \end{itemize} To illustrate these problems, take a look at the following evaluation function for expressions. We want the function to work on parsed expressions and to return an error on division by zero, indicating where in the input sentence the error occurred. Therefore the function takes a |ParseTree BareExpr| as input rather than a vanilla |BareExpr| so that position information is available, and it returns either a |Bounds| or an |Int|. \begin{code} eval :: ParseTree BareExpr -> Either Bounds Int eval (Rose (bounds, expr) cs) = case (expr, cs) of (Num n, []) -> pure n (Add _ _, [x, y]) -> (+) <$> eval x <*> eval y (Sub _ _, [x, y]) -> (-) <$> eval x <*> eval y (Mul _ _, [x, y]) -> (*) <$> eval x <*> eval y (Div _ _, [x, y]) -> do x' <- eval x y' <- eval y case y' of 0 -> Left bounds _ -> pure (x' `div` y') \end{code} The problems are immediately apparent. Firstly, evaluation has to be written in applicative or even monadic style to propagate potential errors.\footnote{Note that a non-standard |instance Monad (Either e)| is used: one with no constraint on the error parameter |e|.} Secondly, the fields of the |BareExpr|s are simply discarded. Instead the list of child parse trees is pattern-matched against a fixed number of expected children, depending on the specific constructor. This indicates that the fields are simply not necessary; then why were they set in the first place? Furthermore, cases where the number of child parse trees fails to match expectations result in a runtime pattern match error. Functional programmers like it when the type system ensures safety, but the datatype we have here contains redundant information and allows bad shapes. We can do better than this.