%let report = True
%if report
\documentclass{article}
\usepackage{a4wide}
\usepackage{TRtitlepage}
%else
\documentclass[runningheads,a4paper]{llncs}
%endif
%if report
\author{{S. Doaitse Swierstra} and {Atze Dijkstra}}
%then
%else
\authorinfo{S. Doaitse Swierstra \\ Atze Dijkstra}
           {Department of Computer Science\\ Utrecht University\\ Utrecht, The Netherlands}
           {\{doaitse,atze\}@@cs.uu.nl}
%endif
%include lhs2TeX.fmt
%include polycode.fmt
\usepackage{amssymb}
\usepackage{listings}
\setcounter{tocdepth}{3}
\usepackage{graphicx}
\usepackage{comment}
\excludecomment{lhs}
\usepackage{url}
\usepackage{xspace}
\newcommand{\eps}{$\epsilon$\ }
\urldef{\mailsa}\path||{doaitse}@@swierstra.net||
\newcommand{\keywords}[1]{\par\addvspace\baselineskip
\noindent\keywordname\enspace\ignorespaces#1}
\lstset{%
  basicstyle=\ttfamily,
  breaklines=true,
  columns=fullflexible,
  escapeinside = ||,
  breakindent=0pt}
\lstnewenvironment{examplee}{}{}
\begin{document}
\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}
%if report
\TRtitlepage
  {Parse Your Options}                       % argument 1 <= the title
  {S.~Doaitse~Swierstra\\Atze Dijkstra\par}  % argument 2 <= authors
  {UU-CS-2013-005}                           % argument 3 <= report number
  {May 2013}
%else
\mainmatter  % start of an individual
contribution
%endif

% first the title is needed
\title{Parse Your Options}
% a short form should be given in case it is too long for the running head
%if report
%then
%else
\titlerunning{Parse Your Options}
\author{S. Doaitse Swierstra and Atze Dijkstra}
%\authorrunning{!!!!!!!!!}
\institute{Dept. of Computer Science, P.O. Box 80.089, 3508 TB Utrecht, the Netherlands\\
%\mailsc\\
\url{http://www.cs.uu.nl}}
%endif

\maketitle

\begin{abstract}
We describe the development of a couple of combinators which can be used to run applicative-style parsers in an interleaved way. In the presentation we advocate a scheme for choosing identifier names which clearly shows the types of the values involved and how they are to be composed into the desired result. We finish by describing how the combinators can be used to parse command line arguments and files containing options.
%if not report
%then
\keywords{Parser combinators, Haskell, Parallel parsing, Option processing, Permutation parsing}
%endif
\end{abstract}

%format ^^    = " "
%format <*    = "\mathbin{\text{\small\ttfamily{<*}}}"
%format *>    = "\mathbin{\text{\small\ttfamily{*>}}}"
%format <*>   = "\mathbin{\text{\small\ttfamily{<*>}}}"
%format <**>  = "\mathbin{\text{\small\ttfamily{<**>}}}"
%format <|>   = "\mathbin{\text{\small\ttfamily{<|>}}}"
%format <||>  = "\mathbin{\text{\small\ttfamily{<||>}}}"
%format <<||> = "\mathbin{\text{\small\ttfamily{<<||>}}}"
%format <+>   = "\mathbin{\text{\small\ttfamily{<+>}}}"
%format <$>   = "\mathbin{\text{\small\ttfamily{<\$>}}}"
%format <$    = "\mathbin{\text{\small\ttfamily{<\$}}}"
%format iI    = "\llfloor"
%format Ii    = "\rrfloor"
%format <++>  = "\mathbin{\text{\small\ttfamily{<++>}}}"
%format +>>   = "\mathbin{\text{\small\ttfamily{+>>}}}"

\newcommand{\uulib}{\texttt{uulib}\xspace}
\newcommand{\uuparsinglib}{\texttt{uu-parsinglib}\xspace}

%format b2g_a
%format c2g_b
%format c2g_b2a
%format g_c
%format g_b
%format g_b2a
%format f_c2b
%format f_c2a
%format f_b2a
%format f_c2b2a
%format f_c
%format f_b
%format
l_a
%format l_b
%format l_b2a
%format m_a
%format m_b
%format m_b2a

\section{Introduction}

In the original version of the \uulib\footnote{\url{http://hackage.haskell.org/package/uulib}} parsing combinator library we introduced two modules, providing combinators for parsing more complicated input structures than those described by context-free languages: one for recognising permuted structures and one for recognising merged lists. In this paper we present an \emph{interleave} combinator which generalises these two patterns. As we will see, this library can almost completely be constructed using the basic |Applicative| and |Alternative| parser interfaces as e.g. provided by the \uuparsinglib package, without having to deal with the intricate internals of the parsing process itself. We only require that it is possible to split a parser into two components: one which recognises the empty sequence, in case the original parser can do so, and one which takes care of the non-empty cases. Of course, a direct consequence of choosing a feature-rich library such as \uuparsinglib is that the parsers constructed out of it inherit all its nice properties: returning results in an online way, providing informative error messages, and adapting the input where parsing cannot proceed.

Before we introduce our new library we present some common definitions and give an example of parsing permuted and merged structures in Section \ref{sec:defs}, followed by a motivating example for our new combinators in Section \ref{sec:example}. Section \ref{sec:core} forms the core of the paper, describing the internals of the new library, whereas in Section \ref{sec:options} we develop a small library using the new combinators for processing command line options. We will see that there is not much more work left for the programmer than enumerating the possible options and the way they are to be denoted. We conclude in Section \ref{sec:conclusions}.
\section{Common definitions and background}\label{sec:defs}

We start out by repeating the definitions of some familiar classes, and give applications of the previously introduced combinators. The class |Applicative| introduces a combinator |<*>| that combines two side-effecting functions into a single function. The important observation is that the results of the two functions are combined by applying the result of the first to that of the second. In this paper such functions will be referred to as parsers, as this is the way we will instantiate the interface. The |<*>| operator runs its two parser arguments sequentially on the input: the second parser takes off with the input state in which the first one finished. The class |Alternative| introduces the companion operator |<||>|, which runs either one of its operands, but not both. Its unit element is |empty|; this is not, as its name suggests, the parser which recognises the empty string, but instead the parser which always fails. For the sake of completeness we also introduce the |Functor| class.
\begin{spec}
class Functor f where
  fmap   :: (b -> a) -> f b -> f a

class Applicative f where
  (<*>)  :: f (b -> a) -> f b -> f a
  pure   :: a -> f a

class Alternative f where
  (<|>)  :: f a -> f a -> f a
  empty  :: f a
\end{spec}
Next we introduce some helper functions that may be used to modify the result of a parser, or to throw away a result in which we are not interested:
\begin{spec}
(<$>) :: Functor f => (b -> a) -> f b -> f a
(<$>) = fmap

p <* q     = const <$> p <*> q
f <$ p     = const f <$> p
p *> q     = flip const <$> p <*> q
p `opt` a  = p <|> pure a
\end{spec}
In addition to the operator for sequential composition |<*>| we defined \cite{1030338} a permuting combinator, with a similar type (the precise definition of the type |g| does not matter here):
\begin{spec}
(<||>) :: g (b -> a) -> g b -> g a
\end{spec}
This operator also recognises both its operands, but irrespective of the order in which they occur in the input stream, i.e. we may either run its left operand and have its right operand take off where the first one finished, or the other way around. Actually it is even more expressive, in the sense that if either of its |g|-operands is again built using |<||||>| the elements of the other operand may intervene. Using this combinator we can construct parsers which recognise input in which the components are possibly permuted. Examples of such uses abound: the order of the fields in a \BibTeX\ entry is not fixed, nor is the order in which the field values of a Haskell record are specified. Consider the data type:
\begin{spec}
data Cart  =  Cart   {x    :: Float, y    :: Float}
           |  Polar  {rho  :: Float, phi  :: Float}
\end{spec}
Now one might want to be able to read both @Cart{x=1,y=2}@ as well as @Cart{y=2,x=1}@ from the input; if a Haskell compiler accepts both representations, why shouldn't we be able to read both of them from the input?
Using the |<||||>| parser this is easily achieved (|mkG| and |sepBy| are helper functions to which we will come back later):
\begin{spec}
pTwoFloats op constr s1 s2
  =  pToken constr *>
     pCurly (  (op <$> pField s1 pFloat <||> pField s2 pFloat)
               `sepBy` pSym ','
            )
pField s p  = mkG (pToken s *> pSym '=' *> p)
pCurly p    = pSym '{' *> p <* pSym '}'

pCart  =    pTwoFloats Cart   "Cart"   "x"    "y"
       <|>  pTwoFloats Polar  "Polar"  "rho"  "phi"
\end{spec}
In \cite{Guerra:2005fk} we furthermore defined a combinator |<+>| which makes it possible to construct parsers which recognise merged lists. In the parser |int_low| below, the function |listOf| converts a parser into a parser which may run interleaved with other parsers and which recognises a list of values accepted by its argument parser; the function |<+>| runs both its arguments in an interleaved mode, and |pMerged| converts its argument back to a normal parser again:
\begin{spec}
int_low :: Parser ([Int],[Char])
int_low = pMerged (listOf pInt <+> listOf pLower)
\end{spec}
The function |int_low| accepts the input string |"a1bc2"| and returns the pair |([ 1, 2 ], "abc")|; the input is split into two sub-streams, each accepted by one of the two operands of |<+>|.

On closer inspection we see that the list-merging parsers are a generalisation of the permuting parsers: just make sure that each of the merged lists has precisely length 1, and list merging degenerates to permuting. This also made us realise that the merging of lists is just a special case of {\em merging any collection of structured sequences}, which in turn raised the question whether we can define combinators which make it possible to describe this process, thus generalising the two libraries just mentioned into a single one.

\section{Parsing Log Files}\label{sec:example}

As a motivation for our new combinators we start with a simple example of their use.
It deals with unraveling a file which was created by several concurrent processes writing entries into it: each process indicates its start by writing the character |'s'| followed by its unique identity (numbers in the example); next it emits a couple of work entries, each consisting of the character |'w'| followed by its process identity and a space. The rest of the line contains log information. A process indicates its termination by closing its sequence of entries with a line consisting of the character |'c'| again followed by its process identity. So all entries generated by a single process are labeled with the same process identity, which we assume to be unique throughout the log file. Given e.g. a log file containing the following lines:
\begin{code}
s2       -- start of process 2
s1       -- start of process 1
w1 a1    -- first work entry of process 1
w2 b     -- first work entry of process 2
w1 a2    -- second work entry of process 1
c1       -- process 1 closes
s3       -- start of process 3
w3 c     -- first work entry of process 3
c2       -- process 2 closes
c3       -- process 3 closes
\end{code}
we want to produce the following table, where each process is associated with the contents of its work entries:
\begin{code}
[("2",["b"]),("1",["a1","a2"]),("3",["c"])]
\end{code}
In the code we use names starting with a |g| to bind to parsers which can run in an interleaved way (which we will refer to as \emph{grammars} from now on), whereas names starting with a |p| refer to conventional parsers which recognise a consecutive segment of input. The function |mkG| converts a conventional parser into a grammar; the result will still recognise a consecutive part of the input, but once it has done so it may pass control on to a competing parser.
We start out by defining the grammar |gProcess|, which recognises the sequence of events for a single process, using some of the basic parsers provided by common combinator libraries with an applicative interface, and subsequently use the function |gmList| to concurrently run as many of these grammars as needed. We use a monadic bind so that the process identity retrieved from the start line can be used in constructing the parsers for the subsequent lines of this process.
\begin{code}
gProcess = do  i  <- mkG pStart
               w  <- pMany . mkG $ pWork i
               _  <- mkG $ pClose i
               return (i, w)

gLog = gmList gProcess
\end{code}
Next we define the parsers recognising the various kinds of log entries. The function |pMunch| accepts the longest prefix of the input of which all characters satisfy the predicate parameter.
\begin{code}
pStart    =  pToken "s" *> pMunch (/= '\n') <* pToken "\n"
pWork i   =  pToken ('w' : i ++ " ")
             *> pMunch (/= '\n')    -- read the rest of the line
             <* pToken "\n"
pClose i  =  pToken ('c' : i ++ "\n")
\end{code}
Note that the order in which the processes appear in the resulting table is determined by the order of their first entries in the log file.

\section{Merging Parsers}\label{sec:core}
\begin{lhs}
\begin{code}
{-# LANGUAGE ExistentialQuantification, ScopedTypeVariables #-}
module Text.ParserCombinators.UU.MergeAndPermute where
import Text.ParserCombinators.UU.Core
import Debug.Trace
\end{code}
\end{lhs}

\subsection{Representing parsers for merged input structures}
As we have seen in the example, what we are looking for is the possibility to interleave parsers; one can think of this as associating a separate colour with each parser and splitting up the input into a series of coloured segments, such that the concatenation of the segments of the same colour is accepted by the correspondingly coloured parser. The question to answer is how to split the input into uniformly coloured segments, i.e. how to find the points in the input where colour switching may take place and what colour to give to the segments.
In order to be able to do so efficiently we have decided to construct our interleaving parsers out of \emph{basic} parsers, which are conventional parsers recognising a \emph{consecutive part} of the input. We rephrase our use of the word \emph{grammar} to stand for a \emph{description of} a parser which can pause at a colour switch and continue when the input switches back to its colour again. Such grammars can thus be run in a competing fashion. From this point on we will use the word |parser| to refer to a conventional parser which recognises a segment of the input.

Once a parser gets hold of the input, and has successfully started parsing, we let this parser run until completion. Once it has completed, all pending grammars (including the grammar of which the parser that just ran forms a constituent) can try to continue to parse. A consequence of this is that at any point in the input where we may switch between grammars, each of the competing grammars should present its \emph{first-parsers}, i.e. the candidates for accepting the next uniformly coloured segment. These first-parsers play a role similar to that of the {\em first} sets resulting from an $LL(1)$ grammar analysis; the only difference is that we dynamically compute the collection of first-parsers instead of statically computing the set of first symbols.

A very important issue we have to take care of is the avoidance of unwanted ambiguity, as the following example shows. Suppose we have the following permuting parser:
\begin{spec}
pMaybe s  = pToken s `opt` ""
pAmb      = (,) <$> pMaybe "A" <||> pMaybe "B"
\end{spec}
and our input consists of the string |"A"|. Then there are two different ways in which the empty string |""| can be seen as part of the input: |"" ++ "A"| and |"A" ++ ""|. Thus, when running the parser |pAmb| on the input string |"A"|, there are two possible parses, both returning the same result.
Unless our underlying parsing library somehow knows how to deal with this, we will either get an error message, a rather arbitrary choice of one of the possible parses or, when this happens more than once, an exponential number of results. Even if the library could handle ambiguity, it might not be able to discover that both results are the same. This problem has already been described in the development of the permuting combinators \cite{1030338}, and the solution we take here is based on a similar assumption as the one we made there: we assume that we are able to split a parser into a part recognising the non-empty input sequences and a part recognising the possibly empty one. The latter will be represented by the value that is to be returned as a witness in case the parser accepts the empty input (denoted as $\epsilon$ from now on). In order to make this explicit in our interface we introduce a class:
\begin{code}
class Splittable f where
  getNonPure  :: f a -> Maybe (f a)
  getPure     :: f a -> Maybe a
\end{code}
where we will assume the following equality to hold for any |Alternative| functor |f| we want to use for our basic parsers:
\begin{spec}
f == maybe empty id (getNonPure f) <|> maybe empty pure (getPure f)
\end{spec}
We are now ready to define the data type |Gram| representing {\em grammars}. It allows us to compute, at each possible splitting point in the input, the collection of first-parsers that may try to continue at that point. We start out by defining a data type |Alt| which has a constructor |Seq| that explicitly represents the splitting of the grammar into a first-parser and ``the rest of the work to be done''. Since we also want our grammars to have a monadic interface, we equip our data type |Alt| with a |Bind| alternative, which again explicitly contains the first-parser to be run:
%{
%format . = "."
\begin{code}
data Alt f a  =  forall c .  (f (c -> a))  `Seq`   (Gram f c)
              |  forall c .
(f c) `Bind` (c -> Gram f a)
\end{code}
Based on the data type |Alt| we now define the data type |Gram|. It contains two components: a value of type |[Alt f a]|, which is the alternation (choice) of all non-empty parts, jointly accepting a non-empty sequence of symbols, and a |Maybe a| value representing the value to be returned in case \eps is accepted:
\begin{code}
data Gram f a = Gram [Alt f a] (Maybe a)
\end{code}
%}
The type parameter |f| corresponds to the conventional, non-interruptible parsers which we use as building blocks for our grammars.

\subsection{Defining the various class instances for |Gram|}
We now define the instances for these newly introduced data types for the classes |Functor|, |Applicative|, |Alternative| and |Monad|.

\subsubsection{|Gram| is a |Functor|}
We start with the |Functor| instances for both the data type |Gram| and |Alt|. The code is straightforward:
\begin{code}
instance Functor f => Functor (Gram f) where
  fmap b2a (Gram l_b m_b) = Gram (map (b2a <$>) l_b) (b2a <$> m_b)

instance Functor f => Functor (Alt f) where
  fmap b2a (f_c2b  `Seq`   g_c   ) = ((b2a .) <$> f_c2b) `Seq` g_c
  fmap b2a (f_c    `Bind`  c2g_b ) = f_c `Bind` (\c -> b2a <$> c2g_b c)
\end{code}
Here we have chosen a naming convention which makes the types of the values involved explicitly visible in the program text: |b2a| holds a function value of type |b -> a|, |m_b| stands for a value of type |Maybe b|, |l_a| stands for a list of |Alt f a| values, |f_c2b| is bound to a value of type |f (c -> b)|, |g_c| is bound to a value of type |Gram f c|, and |c2g_b| to a value of type |c -> Gram f b|. Now it is easy to see that |fmap b2a (f_c `Bind` c2g_b)| should result in a value of type |Alt f a|, etc. Note that applying |fmap| keeps intact the invariant that the first-parser of each |Alt| can easily be recognised.
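To see these instances at work in isolation, the following self-contained sketch reproduces the two data types and their |Functor| instances, and adds a small observer |evalMaybe| of our own: instantiating |f| with |Maybe| degenerates a ``parser'' to a value that consumes no input, which is just enough to evaluate a grammar and observe the effect of |fmap|. The observer and the example values are our inventions, not part of the library:

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Control.Applicative ((<|>))

-- The data types and Functor instances from above, reproduced verbatim.
data Alt f a  =  forall c . (f (c -> a)) `Seq`  (Gram f c)
              |  forall c . (f c)        `Bind` (c -> Gram f a)
data Gram f a = Gram [Alt f a] (Maybe a)

instance Functor f => Functor (Gram f) where
  fmap b2a (Gram l_b m_b) = Gram (map (b2a <$>) l_b) (b2a <$> m_b)
instance Functor f => Functor (Alt f) where
  fmap b2a (f_c2b `Seq`  g_c  ) = ((b2a .) <$> f_c2b) `Seq` g_c
  fmap b2a (f_c   `Bind` c2g_b) = f_c `Bind` (\c -> b2a <$> c2g_b c)

-- A toy observer of ours: with f = Maybe, a "parser" is just an optional
-- value, so a whole Gram can be evaluated directly to its witness.
evalMaybe :: Gram Maybe a -> Maybe a
evalMaybe (Gram l_a m_a) = foldr (<|>) m_a (map evalAlt l_a)
  where  evalAlt (m_c2a `Seq`  g_c  ) = m_c2a <*> evalMaybe g_c
         evalAlt (m_c   `Bind` c2g_a) = m_c >>= evalMaybe . c2g_a
```

For example, |evalMaybe (fmap (*2) (Gram [Just (+10) `Seq` Gram [] (Just 5)] Nothing))| yields |Just 30|: |fmap| pushes the function through the |Seq| alternative without disturbing the position of the first-parser.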
We have chosen to use |<$>| as an alias for |fmap| in this code wherever possible, since this makes the chosen names even more helpful in reading the code.
%$

\subsubsection{|Gram| is an |Applicative| functor}
We want to construct our merging parsers in just the same way as we construct normal parsers, using the well-known |Applicative| and |Alternative| interfaces \cite{McB07}. Since our data types enforce the property that we can easily identify the first-parser to be used, this is where the real work takes place. The pattern we follow, however, is a common one, well known from various process algebras; the code is given in Figure \ref{fig:appl}. The first-parsers in the |Seq| and |Bind| constructs of the left-hand side operand are ``rotated'' out. The hard work is done by the function |fwdby|, which combines the remaining parts of the |Seq| and |Bind| constructs to form the new right-hand side of the |Seq| and |Bind| construct in the result. The |g_c| and |g_b| grammars of a |Seq| are run, returning a value of type |(c, b)|. We modify the value returned by the first-parser --by applying |uncurry| to it-- to accept this pair of values instead of getting passed the two arguments individually.

If the left-hand side parser can recognise the empty input, then the first-parsers of the right-hand side grammar are first-parsers of the resulting grammar as well; they can start to accept part of the input too. This explains the second part of the list of |Alt f a| values in the right-hand side of the |<*>| definition, where we use the witness value of type |b -> a| to update the result of the right-hand side parser. The definition of |pure| speaks for itself: we have no non-empty alternatives, and the parser can recognise $\epsilon$ with a witness of type |a|.
\begin{figure}
\begin{code}
instance Functor f => Applicative (Gram f) where
  pure a = Gram [] (Just a)
  Gram l_b2a m_b2a <*> ~g_b@(Gram l_b m_b)
    = Gram  (    map (`fwdby` g_b) l_b2a
            ++   [b2a <$> f_b | Just b2a <- [m_b2a], f_b <- l_b]
            )
            (m_b2a <*> m_b)

fwdby :: Functor f => Alt f (b -> a) -> Gram f b -> Alt f a
(f_c2b2a  `Seq`   g_c)      `fwdby` g_b  = (uncurry <$> f_c2b2a) `Seq` ((,) <$> g_c <*> g_b)
(f_c      `Bind`  c2g_b2a)  `fwdby` g_b  = f_c `Bind` (\c -> c2g_b2a c <*> g_b)

uncurry f (x, y) = f x y

instance Functor f => Alternative (Gram f) where
  empty = Gram [] Nothing
  Gram ps pe <|> Gram qs qe = Gram (ps ++ qs) (pe <|> qe)
\end{code}
\caption{|Gram| is a member of the |Applicative| and |Alternative| classes.\label{fig:appl}}
\end{figure}

A subtle point is that we had to add an irrefutable pattern match (|~|) to the right-hand side operand of |<*>|, since otherwise the pattern matching creates an endless loop in situations like the definition of |pMany|, when |f| is instantiated with some |Gram g|:
\begin{code}
pMany :: f a -> f [a]
pMany p = let result = (:) <$> p <*> result `opt` [] in result
\end{code}
Here |<*>| does not have access to the top-level constructor of its right-hand side, since this constructor is produced by this very call to |<*>|. Note that the call to |<*>| is evaluated lazily, so we have no problems with recursive grammars. The unrolling of these definitions is done on a by-demand basis during the actual parsing process. This technique resembles the parallel parsing strategy developed by Claessen \cite{JFP:254719}, and the code also bears close resemblance to the computation of the |firsts| set, as described by Swierstra and Duponcheel \cite{SwieDupo96}.

In case the left-hand side is a monadic construct we just use the original first-parser |f_c|, and again move the right-hand side |c2g_b2a| of the left operand to the right-hand side of the result, where it is composed with the original right-hand side using |<*>|.
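The role of the irrefutable pattern discussed above can be seen in miniature in the following toy example of ours (the names |fStrict|, |fLazy| and |gLazy| are hypothetical and unrelated to the library): a function that matches strictly on a knot-tied argument demands the very constructor it is about to produce and loops, whereas its lazy variant terminates.

```haskell
-- Toy illustration of why the irrefutable pattern (~) matters; our names.
fStrict :: Maybe Int -> Maybe Int
fStrict (Just x) = Just (x + 1)    -- forces its argument before producing Just

fLazy :: Maybe Int -> Maybe Int
fLazy ~(Just x) = Just (x + 1)     -- produces Just at once; x stays a thunk

-- A knot-tied definition, analogous to 'result' in pMany:
gLazy :: Maybe Int
gLazy = fLazy gLazy                -- fine: the Just constructor is available
-- gStrict = fStrict gStrict would loop forever: fStrict demands the
-- constructor that this very call is supposed to produce.

topIsJust :: Bool
topIsJust = case gLazy of Just _ -> True; Nothing -> False
```

In the definition of |<*>| the |~| plays exactly this role: it postpones the match on the right-hand side |Gram| constructor until that constructor is actually needed.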
\subsubsection{|Gram| is an |Alternative| functor}
The instance for |Alternative| is almost trivial: we concatenate the lists of alternatives of both operands. This leaves the question what to do if the grammar is ambiguous because both alternatives can accept \eps. We have chosen to exploit the left-biasedness of |<||>| as defined for |Maybe|, so the left value is returned. The code is again given in Figure \ref{fig:appl}.

\subsubsection{|Gram| is a |Monad|}
The next thing we want to do is to equip our grammars with a monadic interface, which we again achieve by ``rotating'' all but the first-parser to the right, so the first-parser is again presented at the top-level constructor. In the case of a |`Seq`| construct as the left argument of |>>=| we split its monadic effect into two steps: in the first step we make the first-parser of the left-hand side operand explicitly visible in the resulting |`Bind`| construct, whereas the corresponding right-hand side |g_c| of the |`Seq`| is combined with the result |c2b| of its first-parser (using |\c2b -> c2b <$> g_c|) and then passed on to a further call of |>>=| in the right-hand side of the resulting top-level |`Bind`| construct.

When the left-hand side accepts \eps we have to take some extra precautions, since in this case the first-parsers arising from the right-hand side of |>>=| compete for the input too: they too may accept input at this point.
Since we have the witness of this empty left-hand side available, we can use it to compute the |Gram f a| value returned by the right-hand side of |>>=|, and with this its first-parsers become available too and can take part in the competition:
\begin{code}
instance Functor f => Monad (Gram f) where
  return a = Gram [] (Just a)
  Gram l_b m_b >>= b2g_a =
    case m_b of
      Nothing  ->  Gram (map (`bindto` b2g_a) l_b) Nothing
      Just b   ->  let Gram l_a m_a = b2g_a b
                   in  Gram (map (`bindto` b2g_a) l_b ++ l_a) m_a

bindto :: Functor f => Alt f b -> (b -> Gram f a) -> Alt f a
(f_c2b  `Seq`   g_c)    `bindto` b2g_a  = f_c2b  `Bind` \ c2b  -> c2b <$> g_c >>= b2g_a
(f_c    `Bind`  c2g_b)  `bindto` b2g_a  = f_c    `Bind` \ c    -> c2g_b c >>= b2g_a
\end{code}

\subsubsection{Constructing elementary |Gram| values}
The last thing we have to do is to show how we can lift a parser into an equivalent |Gram| value:
\begin{code}
mkG :: (Splittable f, Functor f) => f a -> Gram f a
mkG p = Gram  (maybe  []  (\p -> [(const <$> p) `Seq` pure ()])  (getNonPure p))
              (getPure p)
\end{code}
At first sight this code looks more complicated than strictly needed; this is caused by our choice of the |Alt| data type. We could easily have added a third case |Single (f a)| to this data type, and have used this |Single| constructor here. We have chosen the current, minimalistic approach, in which this |Single| case is encoded as a parser which is followed by an \eps parser returning |()|, the result of which is subsequently discarded. This adds a small constant-time overhead to our parsers, which we think is acceptable in return for the increased simplicity of the code.

\subsection{|<||||>| and |<<||||>|}
In the previous subsections we have defined the data type |Gram f a|, have shown how to lift elementary parsers to this structure, and have defined instances for this type for the |Functor|, |Applicative|, |Alternative| and |Monad| classes.
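To convince ourselves that |mkG| does what it promises, the following self-contained sketch instantiates |f| with a toy list-of-successes parser |P| and converts the lifted grammar back to a parser with a simplified interpreter |run| (the monadic |Bind| case is omitted for brevity; the conversion function |mkP| described further on handles it as well). The parser |P|, the helpers |pSym| and |run|, and the |Splittable| instance, which assumes the toy parsers never accept \eps, are our own:

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Control.Applicative

-- Toy list-of-successes parser standing in for the library's parsers.
newtype P a = P { runP :: String -> [(a, String)] }

instance Functor P where
  fmap f (P p) = P (\s -> [ (f a, r) | (a, r) <- p s ])
instance Applicative P where
  pure a      = P (\s -> [(a, s)])
  P p <*> P q = P (\s -> [ (f a, r') | (f, r) <- p s, (a, r') <- q r ])
instance Alternative P where
  empty       = P (const [])
  P p <|> P q = P (\s -> p s ++ q s)

pSym :: Char -> P Char
pSym c = P (\s -> case s of { (x:xs) | x == c -> [(c, xs)] ; _ -> [] })

class Splittable f where
  getNonPure :: f a -> Maybe (f a)
  getPure    :: f a -> Maybe a
instance Splittable P where     -- assumption: toy parsers never accept eps
  getNonPure = Just
  getPure    = const Nothing

-- The paper's types, with the monadic Bind alternative omitted for brevity.
data Alt  f a = forall c . (f (c -> a)) `Seq` (Gram f c)
data Gram f a = Gram [Alt f a] (Maybe a)

mkG :: (Splittable f, Functor f) => f a -> Gram f a
mkG p = Gram (maybe [] (\q -> [(const <$> q) `Seq` Gram [] (Just ())])
                       (getNonPure p))
             (getPure p)

-- Simplified interpreter back to a plain parser (Seq alternatives only).
run :: Alternative f => Gram f a -> f a
run (Gram alts m_a) = foldr (<|>) (maybe empty pure m_a)
                            [ f <*> run g | f `Seq` g <- alts ]
```

Lifting a parser and interpreting the result gives back the original behaviour: |runP (run (mkG (pSym 'a'))) "ab"| yields |[('a',"b")]|, and the grammar fails on input that the original parser rejects.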
As a final step we now define the |<||||>| combinator, which describes the ``interleaved'' composition of two grammars. We will, however, express this combinator in terms of an even more primitive combinator |<<||||>|:
\begin{code}
infixl 4 <||>
infixl 4 <<||>
(<||>), (<<||>) :: Functor f => Gram f (b -> a) -> Gram f b -> Gram f a
\end{code}
Note that we have given these operators the same type as the conventional |<*>| combinator, since we very much like the applicative interface for describing how to combine the two accepted values into the result. The reason, of course, that we cannot use the |<*>| combinator itself is that it has already been used to express the more conventional sequential composition of two |Gram| values.

The idea of the |<<||||>| combinator is that it will run one of the first-parsers of its left-hand side operand on the input first, and from that point on will behave like |<||||>|, which does not have a preference for either of its operands to start accepting input. We can easily define |<||||>| in terms of |<<||||>|:
\begin{code}
g_b2a <||> g_b  =    g_b2a <<||> g_b
                <|>  flip ($) <$> g_b <<||> g_b2a
\end{code}
Here we see that the resulting parser will either run one of the first-parsers from its left-hand side operand or one of the first-parsers of its right-hand side operand. In case both grammars can accept \eps we get the same witness value twice, and in principle our grammar becomes ambiguous; the biased choice of the |Maybe| instance of |Alternative| throws away one of these two (equal) values. So all we have to do now is to define |<<||||>|.
We construct a new grammar which has as its first-parsers all the first-parsers of its left-hand side operand:
\begin{code}
(<<||>) :: Functor f => Gram f (b -> a) -> Gram f b -> Gram f a
g_b2a@(Gram l_b2a m_b2a) <<||> ~g_b@(Gram _ m_b)
  = Gram (map (`fwdby'` g_b) l_b2a) (m_b2a <*> m_b)

(f_c2b2a  `Seq`   g_c)      `fwdby'` g_b  = (uncurry <$> f_c2b2a) `Seq` ((,) <$> g_c <||> g_b)
(f_c      `Bind`  c2g_b2a)  `fwdby'` g_b  = f_c `Bind` (\ c -> c2g_b2a c <||> g_b)
\end{code}
Notice that this code is almost the same as that for the definition of |<*>|; only the occurrences of |<*>| in the right-hand sides of the |fwdby| function have been replaced by |<||||>| (in its variant |fwdby'|), thus indicating that the parsers constructed in this way should run interleaved instead of sequentially.

\subsection{Converting Grammars into Parsers}
The only thing left to do now is to show how to construct a real parser from a |Gram| structure. We will require that the parameter |f| we have carried around thus far has instances for the usual |Applicative|, |Alternative| and |Monad| interfaces, so we can use the functions available from these classes in this process. For each of the alternatives we select the first-parser from it, use |<||>| to select one of these to run, and after that either combine its result using |<*>| with the parser generated from the corresponding |Gram| value in the case of a |`Seq`|, or use it as an argument to the right-hand side operand in the case of a |`Bind`|, converting this result again into a proper parser.
\begin{code}
mkP :: (Monad f, Applicative f, Alternative f) => Gram f a -> f a
mkP (Gram l_a m_a) = foldr (<|>) (maybe empty pure m_a) (map mkP_Alt l_a)
  where  mkP_Alt (f_b2a  `Seq`   g_b   )  = f_b2a <*> mkP g_b
         mkP_Alt (f_b    `Bind`  b2g_a )  = f_b >>= (mkP . b2g_a)
\end{code}

\subsection{Inserting separators}
As we have seen in the |pCart| example it is a common case that the elements which we want to recognise, and which may occur in a permuted order, are separated by e.g. a |';'| or a |','|.
For these cases we have introduced a special version of |mkP|, which takes an additional argument telling how to parse a separator. The hard work is done by a function |insertSep|, which prefixes each parser, except the first one, in the |Gram| parameter by the parser that recognises the separator:
\begin{code}
sepBy :: (Monad f, Applicative f, Alternative f) => Gram f a -> f b -> f a
sepBy g sep = mkP (insertSep sep g)

insertSep :: (Applicative f) => f b -> Gram f a -> Gram f a
insertSep sep (Gram l_a m_a :: Gram f a) = Gram (map insertSepInAlt l_a) m_a
  where  insertSepInAlt (f_b2a  `Seq`   g_b   ) = f_b2a `Seq` prefixSepInGram g_b
         insertSepInAlt (f_b    `Bind`  b2g_a ) = f_b `Bind` (insertSep sep . b2g_a)
         prefixSepInGram (Gram l_a m_a) = Gram (map prefixSepInAlt l_a) m_a
         prefixSepInAlt :: Alt f b -> Alt f b
         prefixSepInAlt (f_b2a  `Seq`   g_b   ) = (sep *> f_b2a) `Seq` prefixSepInGram g_b
         prefixSepInAlt (f_b    `Bind`  b2g_a ) = (sep *> f_b) `Bind` (insertSep sep . b2g_a)
\end{code}
Because we are making use of polymorphic recursion we had to insert a few type annotations in the code.

\subsection{Parsing merged lists}
Although the combinators follow the common interfaces, there are a few tricky points one has to keep in mind when using them. The fact that the interleaved parsers compete for input may lead to some complications one should be aware of. We take a look at the traditional definition of |pList|, which converts a parser into a parser that recognises a list of elements recognised by its argument parser:
\begin{code}
pList p = let pmp = (:) <$> p <*> pmp `opt` [] in pmp
\end{code}
In this definition the recursive call to |pmp| only starts to play a role once the first instance of |p| has succeeded. If we however replace the |<*>| operator by a |<||||>| operator, then the recursive |pmp| can start to parse immediately too, and will spawn yet another instance of |p| which starts to compete for the input, and so on recursively; apparently, replacing sequential composition by interleaved composition has deeper implications than is directly visible from the code.
Fortunately this problem can be solved rather easily: we only start a new instance of |pmp| competing for the input once |p| has started its work and has consumed a bit of input. Hence we define:
\begin{code}
gmList p = let pmp = (:) <$> p <<||> pmp `opt` [] in pmp
\end{code}
We see here that the availability of |<<||||>| plays an essential role in this definition; in that sense it is more primitive than |<||||>|, which was expressed in terms of |<<||||>|.
\section{Applications}\label{sec:options}
In this section we give two examples of the use of the introduced merging combinators.
\subsection{Parsing options}
One of the most boring tasks in writing an application is the processing of the options passed on the command line. Although there are some packages and tools to make one's life a bit easier, a lot of work always remains: the strings that were passed have to be converted explicitly into the kind of values one is really interested in, defaults have to be provided for optional arguments, for required arguments we have to check that they were actually provided, and there are many conventions for passing the options, be it in short form as in @ls -l@, in long form as in @haddock --enable-documentation@, in a kind of key-value pair as in @process -o outputfile@ or @process -o=outputfile@, etc. We now present a small collection of combinators which, using the introduced merging parser combinators, completely takes this burden away from the programmer. We start out by assuming that in our program we want to put the recognised options in a record with named fields. Using Template Haskell we generate lenses to give us access to the individual fields.
As an example we define the following data types and example record:
\begin{code}
import Data.Lenses
import Data.Lenses.Template

data Prefers  = Clean | Haskell deriving Show
data Address  = Address { city_ :: String, room_ :: String } deriving Show
data Name     = Name { name_ :: String, prefers_ :: Prefers, ints_ :: [Int]
                     , address_ :: Address } deriving Show

$(deriveLenses ''Name)
$(deriveLenses ''Address)
$(deriveLenses ''Prefers)

defaults = Name "Doaitse" Haskell [] (Address "Utrecht" "BBL517")
\end{code}
The |deriveLenses| calls to Template Haskell generate code which gives us read and write access to the fields whose name ends in an |'_'|. What precisely is generated does not matter too much here; what is important is that the imported packages provide, amongst others, a function |alter| which can be used to update a field of type |a|, pointed at by the first parameter, in a record of type |r|, by applying the passed function of type |a -> a| to it:
\begin{code}
alter :: MonadState a m => (m () -> StateT r Identity b) -> (a -> a) -> (r -> r)
\end{code}
Now we can update the record at, say, the field |prefers| as follows:
\begin{examplee}
print ((prefers `alter` (const Clean)) defaults)
Name { name_ = "Doaitse", prefers_ = Clean, ints_ = []
     , address_ = Address {city_ = "Utrecht", room_ = "BBL517"}}
\end{examplee}
So the important thing to remember is that an expression |a `alter` f| applies the function |f| to the field pointed at by the lens |a|. The first thing we do is define a function |oG| (for |optionGrammar|), which takes a normal parser |p| recognising a single option and modifies it such that its result is applied to the field pointed at by the lens:
\begin{code}
oG p a = mkG ((a `alter`) <$> p)
\end{code}
Using this code we can define an option parser which recognises a required option that takes a single extra parameter, such as e.g. a file name.
\begin{code}
required_ a (string, p)
  =    oG (pSymbol ("-"  ++ [head string])  *> lexeme p) a
  <|>  oG (pSymbol ("--" ++ string ++ " ")  *> lexeme p) a
  <|>  oG (pSymbol ("--" ++ string ++ "=")  *> lexeme p) a

required a (string, p) = required_ a (string, const <$> p)
\end{code}
The call |inp `required` ("filename", pFileName)| constructs a grammar which recognises one occurrence of the three possible forms of passing an option: @-f inputfile@, @--filename inputfile@ and @--filename=inputfile@, and updates the field with name |inp| in the record which will hold our recognised options. The parser |pFileName| recognises the file name part of the option. Using this basic parser for a single option we can now define special versions of it. The function |option| makes the required field optional, as its name suggests. The function |flag| recognises an option which does not read an extra argument from the input, but just sets the field to the passed value:
\begin{code}
option  a string_p     = required a string_p `opt` id
flag    a (string, v)  = option a (string, pure v)
flags   a table        = foldr (<>) (pure id) (map (flag a) table)
\end{code}
At this point one may object that the code presented thus far does not really depend on the fact that we have introduced grammars, and could have been implemented using the permutation parsers which have been available for a long time (see e.g. the @options-applicative@ or the @uu-parsinglib@ packages on Hackage). We now come to the point where our somewhat more involved combinators start to pay off. In the example record we have a field which holds a list of integers; wouldn't it be nice if each of these integers could be specified by a separate item in the list of options? For this we define the functions:
\begin{code}
options   a (string, p)  = pFoldr ((.), id) (required_ a (string, (:) <$> p))
optionsl  a string_p     = (last .) <$> options a string_p
optionsf  a string_p     = (head .)
<$> options a string_p
\end{code}
Use of each of these functions makes sure that several settings, distributed over the total list of options, are recognised and combined into a value for a single field, to which they are prepended. The functions |optionsl| and |optionsf| select the last, respectively the first, element of this list. They can be used when the same option may be set several times and we want to use the last or the first setting. This may come in handy when the total list of options consists of e.g. a sequence of options as typed on the command line, concatenated with a list of options taken from some preferences file. Finally we want to be able to define the options for the |Address| field which is part of our |Name| record holding the recognised options. Using lenses this is easy: we recognise a list of options, and apply these to the field of the parent record pointed to by the passed lens:
\begin{code}
field s opts = (s `alter`) <$> opts
\end{code}
So now we are finally ready to show what the specification of the option parser for our |Name| record looks like (note that the order in which we specify the options does not matter):
\begin{code}
instance Functor f => Monoid (Gram f (r -> r)) where
  mappend p q  = (.) <$> p <||> q
  mempty       = pure id

oName =  ^^  name     `option`   ("name", pString)
     <>      ints     `options`  ("int", pNatural)
     <>      prefers  `flags`    [("clean", Clean), ("haskell", Haskell)]
     <>      address  `field`    (     city  `option`  ("city", pString)
                                 <>    room  `option`  ("room", pString))
\end{code}
By making values of type |Gram f (r -> r)| an instance of the class |Monoid|, defining |mappend| as the merge of the two parameters which composes the record-updating functions both of them return, we can use the nice |<>| notation to combine the options.
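The record-updating view underlying this |Monoid| instance can be illustrated in plain Haskell, independent of the parsing machinery: functions of type |r -> r| form a monoid under composition, available as |Endo r| from |Data.Monoid|. A minimal sketch, with a hypothetical |Opts| record and update functions that stand in for the updates our option parsers produce (none of these names come from the library):

```haskell
import Data.Monoid (Endo(..))

-- a hypothetical options record, standing in for the Name record above
data Opts = Opts { optName :: String, optInts :: [Int] } deriving (Eq, Show)

defaultOpts :: Opts
defaultOpts = Opts "Doaitse" []

-- every recognised option contributes an update of type Opts -> Opts
setName :: String -> Endo Opts
setName s = Endo (\o -> o { optName = s })

addInt :: Int -> Endo Opts
addInt i = Endo (\o -> o { optInts = i : optInts o })

-- updates are combined with (<>); appEndo applies the composite update
applyAll :: [Endo Opts] -> Opts -> Opts
applyAll us = appEndo (mconcat us)

main :: IO ()
main = print (applyAll [setName "Rinus", addInt 7, addInt 5] defaultOpts)
-- prints Opts {optName = "Rinus", optInts = [7,5]}
```

Since |Endo| composes right to left, the update supplied last is applied to the record first; |flags| above folds its table with |<>| in just this way, with |pure id| as the unit.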
We finally run our options:
\begin{examplee}
run (($ defaults) <$> mkP oName)
    "--name=Rinus --int=7 --city=Nijmegen -i 5 --clean -i3"
-- Result:
Name  { name_     = "Rinus"
      , prefers_  = Clean
      , ints_     = [7,5,3]
      , address_  = Address { city_ = "Nijmegen", room_ = "BBL517"}}
\end{examplee}
Note that the specifications for the nested |Address| field appear distributed among the other options, and that all the integers we have specified end up together in the |ints| field. In the definition of the derived combinators for specific variants of options we have chosen to use lenses, which return a function that updates an already existing structure containing the values to be collected. The advantage of this approach is that we may start with a record containing the default values, apply the result of reading a preferences file to it, and next apply on top of that the extra information parsed from the command line. Note that each of these structures can easily be parsed using the newly introduced combinators, be it using parsers described in an applicative style or using lenses, which correspond more closely to the familiar keyword-value way of specifying parameters.
\subsection{Nested options}
It is not uncommon to pass options for a linker, to be invoked in a later stage, on the command line of a compiler. One of the problems which may arise from such an option architecture is that the option specifications of otherwise rather independent programs may start to interfere: is e.g. the @--verbose@ option passed to a module installer like |cabal| meant for the installer itself, or for the |haddock| program that is used to generate the associated documentation? This problem can easily be solved by requiring these options to be surrounded by @+haddock@ and @-haddock@ markers on the command line, and specifying the command line parser as follows:
\begin{code}
pHaddock = (gmList .
mkG) (pToken "+haddock" *> mkP oHaddock <* pToken "-haddock")
\end{code}
where |oHaddock| is the grammar describing the options meant for |haddock| itself, constructed analogously to |oName| above.
\section{Conclusions}\label{sec:conclusions}
We have derived a set of very general combinators which make it possible to unravel merged input structures. The library imposes very few restrictions on the underlying parser combinators used. A distinguishing feature of our combinators is that they extend beyond the now common parsers for permuted structures. Our new combinators have both a monadic and an applicative interface. By being able to switch between the sequential and the merging view on the input we can recognise permuted structures which are embedded inside other permuted structures. In this way we may specify options for various subsequent program stages, where the option structures for these programs would otherwise conflict.

In deriving the library we found it very helpful to choose our identifiers in such a way that the types are explicitly represented in the chosen identifiers. As we have seen, this greatly helps in identifying the way we can construct the needed values out of the available values. Choosing the identifiers in the way we did greatly helped us in writing the code, and is an essential ingredient in applying the programming paradigm in which we {\em ``Let the types do the work''}. However trivial this observation may seem, our experience in programming this way helped us so much that we think it deserves to be pointed out explicitly rather than just being used in the construction of this library.
\section{Acknowledgements}
We want to thank Bastiaan Heeren and the members of the Software Technology reading club for commenting on earlier versions of this paper.
\bibliographystyle{splncs03}
\begin{flushleft}
%\bibliography{/Users/doaitse/InProgress/BibTeX/biblio}
\begin{thebibliography}{1}
\providecommand{\url}[1]{\texttt{#1}}
\providecommand{\urlprefix}{URL }

\bibitem{1030338}
Baars, A.I., L{\"o}h, A., Swierstra, S.D.: Parsing permutation phrases. Journal of Functional Programming
14(6), 635--646 (2004)

\bibitem{JFP:254719}
Claessen, K.: Parallel parsing processes. Journal of Functional Programming 14(6), 741--757 (2004)

\bibitem{Guerra:2005fk}
Guerra, R., Baars, A.I., Swierstra, S.D., Saraiva, J.: Preserving order in non-order preserving parsers. Tech. Rep. UU-CS-2005-025, Department of Information and Computing Sciences, Utrecht University (2005)

\bibitem{McB07}
McBride, C., Paterson, R.: Applicative programming with effects. Journal of Functional Programming 18(1), 1--13 (2008), \url{http://portal.acm.org/citation.cfm?id=1348940.1348941}

\bibitem{SwieDupo96}
Swierstra, S.D., Duponcheel, L.: Deterministic, error-correcting combinator parsers. In: Launchbury, J., Meijer, E., Sheard, T. (eds.) Advanced Functional Programming. LNCS-Tutorial, vol. 1129, pp. 184--207. Springer-Verlag (1996)
\end{thebibliography}
\end{flushleft}
\end{document}