\chapter{Future work}

\label{sec:future}

\begin{figure}
\centering
\includegraphics[width=0.60\textwidth]{Pipeline}
\caption{The refactoring cycle.}
\label{fig:pipeline}
\end{figure}

\section{Dealing with AST mutations}

The proposal for this thesis contained figure \ref{fig:pipeline}, depicting
the refactoring cycle: let us say the programmer is working in his or her
favorite IDE, selects a piece of source code and performs a refactoring
action. To do this, the IDE has to map the text selection to a structural
selection on the AST in memory: something we have discussed in detail in the
previous chapters. Then the selection must be processed so that the
refactoring action is carried out, mutating the AST in the process. Perhaps
parts of the code must be deleted or moved to a new position, or perhaps new
code has to be created.

Dealing with mutations to ASTs is a very interesting problem we have not
addressed yet, and it would make a nice subject for further research. For
example, how do we make sure the text positions stored in the AST stay correct
after a mutation? A possible solution is to not store position information but
the exact source code responsible for each subtree. From this, position
information can be inferred.

But this introduces a whole new set of questions: Can we generically add
source code to trees? Of course we can put it in an annotation, but how do we
do that in such a way that there is no redundant information, i.e.~such that a
tree's annotation does not also store the source code of its child trees? Do
we need to create a derived datatype for this? And how can we efficiently
compute position information from the source code? How can we elegantly
express the generation of new pieces of code? What would the rewriting API
look like? Perhaps work on generic rewriting of unannotated trees
\cite{vannoort08rewriting} is of help here.

\section{Contexts in algebras}

Both approaches discussed in this thesis relied on the programmer to express
their data-consuming functions as algebras to catamorphisms in order for their
results to be automatically annotated. Most consumers can be expressed in
terms of an algebra, because the result type of the algebra is allowed to be a
function |context -> result| where |context| can be any context information
necessary to compute the result. In the extreme case it can be the zipper
context, in which case the algebra has access to the complete input datatype.

A well-known example is the evaluation of arithmetic expressions with
variables. The algebra result type is often something like |Environment ->
Integer|. When evaluating a variable, the variable is looked up in the
environment to see what value it has. In the evaluation of a binding, a
key-value pair is added to the environment.

But such result types do not work well in combination with error algebras: the
type |ErrorAlgebra f e (c -> a)| is equivalent to |f (c -> a) -> Either e (c
-> a)|. This is not very convenient: it means the algebra has to decide
whether to throw an error without being able to look at the context. Imagine
this in the scenario above: an algebra encounters a |Var| node and would like
to throw an error if the variable is unbound in the environment, but is unable
to do so because no environment is available. A more convenient algebra type
would be |f (c -> a) -> c -> Either e a|, where the context is moved out of
the |Either| constructor and thus available when deciding to yield a |Left| or
a |Right|. It would be interesting to see if |errorCata| could be changed to
work with such algebras.

\section{Indexed base functors}

The first approach (base functors) has as drawback that it cannot be used with
multiple datatypes, restricting its use to very simple ASTs. The second
approach (MultiRec) has as drawback that the original constructors cannot be
used in the derived datatype, making it very difficult to come up with a
convenient way to write parsers (or producers in general); we chose for the
unsafe, stateful |YieldT| monad transformer. Consumers (e.g.~the algebras for
the catamorphisms) were a little better: the algebra type could be computed
from the AST's pattern functor. But this, too, was not perfect as the parts
for the various constructors were combined together in nested tuples. Although
it is still \emph{safe} (the compiler will complain if, say, the part for one
constructor is omitted), the resulting type errors are big and confusing. Can
we combine the best parts of these two approaches and create something in
which there is only a single set of constructors (e.g.~no counterparts
composed of |L|s and |R|s), yet allows families of datatypes? The answer seems
to be yes.

The MultiRec paper \cite{rodriguez2008mutrec} in section 4.2 encodes the
pattern functor of their example AST directly as a GADT, with the recursive
positions made explicit using a type parameter |r|. The |Tuples| example we
have been using would look like this:

\begin{code}
data Tuples r ix where
  ExAdd  :: r Expr   -> r Expr  ->  Tuples r Expr
  ExMul  :: r Expr   -> r Expr  ->  Tuples r Expr
  ExTup  :: r Expr   -> r Expr  ->  Tuples r Expr
  ExInt  :: Integer  ->             Tuples r Expr
  ExTyp  :: r Expr   -> r Type  ->  Tuples r Expr
  TyInt  ::                         Tuples r Type
  TyTup  :: r Type   -> r Type  ->  Tuples r Type
\end{code}

Using this representation, we would have the good properties of both
approaches explored in this thesis. It would be interesting to see how all the
related ideas (catamorphisms, parsers, selection translations, et cetera)
translate to this approach, and to see what problems or new possibilities
would arise.

\section{Lists of children}

The example ASTs in this thesis have not used any lists of children. Lists of
child nodes are very common in ASTs; it would be interesting to see how these
match with the current approaches.

In the base functor approach, it should not be difficult to use them: the
Traversable instances are easy to adapt to constructors with list fields, the
algebras work well with them, as do the parsers. The current version of
MultiRec, however, has no support for them.

The question of how to generically encode \emph{things} of children in pattern
functors is an open one, but it should be a lot easier to consider
\emph{lists} of children a special case, introducing a new pattern functor
building block just for this purpose. How would this affect all the components
we have developed?