\chapter{Future work} \label{sec:future} \begin{figure} \centering \includegraphics[width=0.60\textwidth]{Pipeline} \caption{The refactoring cycle.} \label{fig:pipeline} \end{figure} \section{Dealing with AST mutations} The proposal for this thesis contained figure \ref{fig:pipeline}, depicting the refactoring cycle: let us say the programmer is working in his or her favorite IDE, selects a piece of source code and performs a refactoring action. To do this, the IDE has to map the text selection to a structural selection on the AST in memory: something we have discussed in detail in the previous chapters. Then the selection must be processed so that the refactoring action is carried out, mutating the AST in the process. Perhaps parts of the code must be deleted or moved to a new position, or perhaps new code has to be created. Dealing with mutations to ASTs is a very interesting problem we have not addressed yet, and it would make a nice subject for further research. For example, how do we make sure the text positions stored in the AST stay correct after a mutation? A possible solution is to not store position information but the exact source code responsible for each subtree. From this, position information can be inferred. But this introduces a whole new set of questions: Can we generically add source code to trees? Of course we can put it in an annotation, but how do we do that in such a way that there is no redundant information, i.e.~such that a tree's annotation does not also store the source code of its child trees? Do we need to create a derived datatype for this? And how can we efficiently compute position information from the source code? How can we elegantly express the generation of new pieces of code? What would the rewriting API look like? Perhaps work on generic rewriting of unannotated trees \cite{vannoort08rewriting} is of help here. \section{Contexts in algebras} Both approaches discussed in this thesis relied on the programmer to express their data-consuming functions as algebras to catamorphisms in order for their results to be automatically annotated. Most consumers can be expressed in terms of an algebra, because the result type of the algebra is allowed to be a function |context -> result| where |context| can be any context information necessary to compute the result. In the extreme case it can be the zipper context, in which case the algebra has access to the complete input datatype. A well-known example is the evaluation of arithmetic expressions with variables. The algebra result type is often something like |Environment -> Integer|. When evaluating a variable, the variable is looked up in the environment to see what value it has. In the evaluation of a binding, a key-value pair is added to the environment. But such result types do not work well in combination with error algebras: the type |ErrorAlgebra f e (c -> a)| is equivalent to |f (c -> a) -> Either e (c -> a)|. This is not very convenient: it means the algebra has to decide whether to throw an error without being able to look at the context. Imagine this in the scenario above: an algebra encounters a |Var| node and would like to throw an error if the variable is unbound in the environment, but is unable to do so because no environment is available. A more convenient algebra type would be |f (c -> a) -> c -> Either e a|, where the context is moved out of the |Either| constructor and thus available when deciding to yield a |Left| or a |Right|. It would be interesting to see if |errorCata| could be changed to work with such algebras. \section{Indexed base functors} The first approach (base functors) has as drawback that it cannot be used with multiple datatypes, restricting its use to very simple ASTs. The second approach (MultiRec) has as drawback that the original constructors cannot be used in the derived datatype, making it very difficult to come up with a convenient way to write parsers (or producers in general); we chose for the unsafe, stateful |YieldT| monad transformer. Consumers (e.g.~the algebras for the catamorphisms) were a little better: the algebra type could be computed from the AST's pattern functor. But this, too, was not perfect as the parts for the various constructors were combined together in nested tuples. Although it is still \emph{safe} (the compiler will complain if, say, the part for one constructor is omitted), the resulting type errors are big and confusing. Can we combine the best parts of these two approaches and create something in which there is only a single set of constructors (e.g.~no counterparts composed of |L|s and |R|s), yet allows families of datatypes? The answer seems to be yes. The MultiRec paper \cite{rodriguez2008mutrec} in section 4.2 encodes the pattern functor of their example AST directly as a GADT, with the recursive positions made explicit using a type parameter |r|. The |Tuples| example we have been using would look like this: \begin{code} data Tuples r ix where ExAdd :: r Expr -> r Expr -> Tuples r Expr ExMul :: r Expr -> r Expr -> Tuples r Expr ExTup :: r Expr -> r Expr -> Tuples r Expr ExInt :: Integer -> Tuples r Expr ExTyp :: r Expr -> r Type -> Tuples r Expr TyInt :: Tuples r Type TyTup :: r Type -> r Type -> Tuples r Type \end{code} Using this representation, we would have the good properties of both approaches explored in this thesis. It would be interesting to see how all the related ideas (catamorphisms, parsers, selection translations, et cetera) translate to this approach, and to see what problems or new possibilities would arise. \section{Lists of children} The example ASTs in this thesis have not used any lists of children. Lists of child nodes are very common in ASTs; it would be interesting to see how these match with the current approaches. In the base functor approach, it should not be difficult to use them: the Traversable instances are easy to adapt to constructors with list fields, the algebras work well with them, as do the parsers. The current version of MultiRec, however, has no support for them. The question of how to generically encode \emph{things} of children in pattern functors is an open one, but it should be a lot easier to consider \emph{lists} of children a special case, introducing a new pattern functor building block just for this purpose. How would this affect all the components we have developed?