\documentclass[preprint,authoryear]{sigplanconf} %include lhs2TeX.fmt %include polycode.fmt \usepackage{amsmath} \usepackage{mathptmx} \usepackage[T1]{fontenc} \usepackage{float} \usepackage{relsize} \usepackage{color} %include format.lhs \newcommand{\doaitse}[1]{\marginpar{\tiny{\color{red}{SDS: #1}}}} \newcommand{\amiddelk}[1]{\marginpar{\tiny{\color{red}{AM: #1}}}} \floatstyle{boxed} \restylefloat{figure} \hyphenation{Ma-ga-lh\~{a}es} \begin{document} \conferenceinfo{HS '12}{September 13, Copenhagen.} \copyrightyear{2012} \copyrightdata{[to be supplied]} %% \titlebanner{Look and see what is possible with UUAGC!} % These are ignored unless \preprintfooter{Challenges and solutions when using Haskell with attribute grammars.} % 'preprint' option specified. \title{Merging Idiomatic Haskell with Attribute Grammars} \authorinfo{Arie~Middelkoop} {LIP6-REGAL} {adriaan.middelkoop@@lip6.fr} \authorinfo{Jeroen~Bransen \and Atze~Dijkstra \and S.~Doaitse~Swierstra} {Universiteit Utrecht} {\{J.Bransen,atze,doaitse\}@@uu.nl} \maketitle \begin{abstract} Attribute grammars with embedded Haskell code form an expressive domain specific language for tree traversals. However, the integration with some prominent Haskell features poses challenges: conventional attribute grammars cannot be written for all Haskell data types, lazy evaluation may lead to space leaks, and the embedded code may not be monadic. This paper investigates these challenges, and presents solutions using some general extensions to attribute grammars. \end{abstract} \category{D.3.4}{Processors}{Preprocessors} \terms{Algorithms, Languages} \keywords{Attribute grammars, Composition, Monads} \section{Introduction} In functional programming, attribute grammars can be seen as a declarative and compositional domain specific language for tree traversals, in particular those that can be written as a fold or catamorphism~\citep{DBLP:conf/fpca/MeijerFP91}. 
Because of the correspondence between grammars and algebraic data types, an attribute grammar describes such a traversal as a collection of rules between attributes of connected nodes of the tree, leaving the fact that we describe a catamorphism implicit. Haskell is known to be an excellent implementation language for attribute grammars: lazy evaluation provides a natural infrastructure for evaluating attribute grammars, and advanced type system features even allow them to be expressed as an embedded domain specific language~\citep{DBLP:conf/icfp/VieraSS09, UUCS2011029}. In addition, rules can be given as pure embedded Haskell code~\citep{DBLP:conf/ifip2-4/SwierstraA98}, thus adding the expressiveness of Haskell to attribute grammars.

Our experiences with attribute grammars in the large Haskell project UHC~\citep{uhc} confirm that Haskell is an excellent host language. However, over the years we also ran into a number of obstacles:
\begin{itemize}
\item Lazy evaluation is a double-edged sword. The translation of attribute grammars to Haskell results in so-called circular Haskell code, which is difficult to optimize. This code easily exhibits space leaks, which are troublesome when dealing with large trees. This problem can be remedied by producing acyclic Haskell code~\citep{Bransen:2012:KAR:2187125.2187139}, although this imposes (mild) restrictions on the grammar. In particular, the grammar is not allowed to be cyclic, which makes it hard to express fixpoint computations within the grammar.
\item Haskell data types, which may be parametrized over the types of their fields, are more expressive than context-free grammars. For example, we cannot define a context-free grammar for a data type with type parameter |alpha| in which a field of type |alpha| is represented as a nonterminal. Consequently, these fields cannot be traversed.
\item The order of evaluation is left completely implicit in an attribute grammar, hence it is unclear how to integrate monadic attribute definitions, as the order of evaluation may be relevant for the result.
\end{itemize}
There is a need to solve these obstacles: lazy evaluation, polymorphism, and monads are prominent Haskell features that would be useful in combination with attribute grammars. As a potential use case, consider Xmonad layout combinators expressed by an attribute grammar that may tile windows either horizontally or vertically, depending on the best fit. This problem is closely related to pretty printing~\citep{Swierstra:2009:LBF:1520298.1520299}. A solution for such a problem features monadic operations to query window properties such as hints, and may be parametrized over the type of state used by user hooks. We need solutions for integrating such features with attribute grammars, because these are not only obstacles for the application of attribute grammars, but also obstacles to their adoption by Haskell programmers, who expect to be able to use such features.

This paper makes two kinds of contributions: we show how to integrate these Haskell features with attribute grammars, and we present some novel attribute grammar extensions that form essential ingredients. Specifically, we give a short introduction to attribute grammars (Section~\ref{sect.tutorial}), show the translation of attribute grammars to cyclic and acyclic code (Section~\ref{sect.codegen}), and explain some common attribute grammar extensions (Section~\ref{sect.common}). We present the following novel attribute grammar extensions:
\begin{itemize}
\item Section~\ref{sect.eager} presents eager rules, which allow local assumptions to be made about the evaluation order of rules.
\item Section~\ref{sect.ioc} presents hooks into the attribute evaluator so that Haskell code can control the evaluator of a node (e.g.
apply it repeatedly, or change attribute values), about which normally no assumptions can be made in an attribute grammar.
\end{itemize}
We show that with these extensions we can integrate the aforementioned Haskell features:
\begin{itemize}
\item Section~\ref{sect.typeparams1} deals with data types and attributes that are parameterized over types. Section~\ref{sect.typeclasses} deals with the integration of type classes. Finally, Section~\ref{sect.typeparams2} shows how to represent fields of a data type whose type is a type parameter of that data type as nonterminals.
\item Section~\ref{sect.monads} introduces monadic rules, which allow the composition mechanism of attribute grammars to be combined with monads.
\item The above solutions require a static evaluation order. Section~\ref{sect.ioc} offers a solution for bypassing some of the restrictions; in particular for dealing with iteration and fixpoint computations.
\end{itemize}
%% \amiddelk{It would be nice to have some great closing sentence here}
%if False
The paper deals with several problems, and presents several solutions that build on top of each other:
\begin{itemize}
\item Section~\ref{sect.tutorial} gives a short introduction to attribute grammars. Some familiarity with attribute grammars is assumed; \citet{wouterag} describes the essentials.
\item Section~\ref{sect.codegen} shows the translation of attribute grammars to cyclic code, and points out the problems stemming from laziness. It also presents a solution in the form of a translation to acyclic code, and its corresponding limitations. The key ingredient is deriving a static evaluation order of attributes and rules, which plays an essential role in later sections.
\item Section~\ref{sect.common} explains some common attribute grammar extensions. In particular, we use the higher-order attribute grammar extension throughout this paper.
\item Section~\ref{sect.typeparams1} deals with the first integration problem: data types and attributes parameterized over types. We defer dealing with type variables as nonterminals to Section~\ref{sect.typeparams2} because it depends on Section~\ref{sect.typeclasses} and Section~\ref{sect.proxy}. \item Section~\ref{sect.eager} presents a novel attribute grammar extension for prioritizing certain rules. \item Section~\ref{sect.typeclasses} deals with the integration of type classes. In the proposed solution dictionaries are passed via inherited attributes. Eager rules that unpack the dictionaries emulate the implicit parameter passing associated with the use type classes. \item Section~\ref{sect.proxy} presents a pattern for adding additional structure to the tree, which is used by later sections. \item Section~\ref{sect.typeparams2} shows that transforming the tree to a different representation can be used to bypass the problems with type variables as a nonterminal in the right hand side of a production. \item Section~\ref{sect.monads} introduces monadic rules which allows the combination of the composition mechanism of attribute grammars and monads to be combined. \item Section~\ref{sect.ioc} shows an expressive extension for hooking into the attribute evaluator. It offers an implementation for monadic rules, and can be used to express iteration. The approach relies on a static evaluation order, but at the same time offers a solution for bypassing its limitations. \end{itemize} %endif %% The two attribute grammar extensions that crosscut these sections are %% higher-order attribute grammars and absolutely noncircular attribute grammars. \section{Attribute Grammar Tutorial: Repmin} \label{sect.tutorial} Figure~\ref{fig.repmin} shows Repmin~\citep{DBLP:journals/acta/Bird84}, a typical example of an attribute grammar. Before we explain the example, we first discuss the three important syntactic elements in the figure. 
\paragraph{Syntax.} The |data| declarations resemble Haskell data declarations. They describe the context-free grammar: type constructors are nonterminals, value constructors are productions, and constructor fields are symbols in the right-hand side of productions. An instance of such a type is a tree where each node was produced by a constructor. The fields of the constructors are explicitly named and come in two variations: terminal symbols (with a double colon) and nonterminal symbols (with a single colon).

Two categories of attributes can be declared for a nonterminal: inherited attributes (declared with |inh|) of a child are defined by the parent and can be used by the child; synthesized attributes (declared with |syn|) are defined by children and can be used by the parent. A chained attribute (declared with |chn|, not used in the example) is syntactic sugar for both an inherited and a synthesized attribute with the same name.

Rules define how an attribute is to be computed in terms of other attributes. A semantics block (starting with |sem|) specifies a collection of rules per production. A rule has the form |p = e|, where |p| is a pattern defining attributes and |e| is a Haskell expression that may refer to an attribute or terminal by prefixing it with the |@| symbol. We refer to an attribute using the notation |c.a|, where |a| is the name of the attribute, and |c| is either the name of a child or the keyword |lhs| (left-hand side) when referring to an attribute of the parent.

Attribute grammars thus offer a programming model where each node is decorated with attributes, and rules specify their computation, thereby implying data dependencies between attributes. \emph{Attribute evaluation} consists of computing values for attributes in an evaluation order imposed by the data dependencies.
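For comparison, the computation described by the attribute grammar in Figure~\ref{fig.repmin} can be written in plain Haskell as two explicit traversals (a sketch; the data type and function names below are ours and not part of the grammar):
\begin{code}
data Tree = Leaf Int | Bin Tree Tree deriving (Eq, Show)

repminTwoPass :: Tree -> Tree
repminTwoPass t = replace t (minTree t)
  where
    -- corresponds to the synthesized attribute lmin
    minTree (Leaf v)  = v
    minTree (Bin l r) = minTree l `min` minTree r
    -- corresponds to repl; the extra argument plays the role of gmin
    replace (Leaf _)  m = Leaf m
    replace (Bin l r) m = Bin (replace l m) (replace r m)
\end{code}
The attribute grammar relieves us from writing the traversals and their plumbing by hand.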
\begin{figure}[t]
\begin{code}
data Root  | Top   ^^^  top : Tree
data Tree  | Bin   ^^^  left : Tree ^^^ right : Tree
           | Leaf  val :: Int

attr Tree  ^^^  inh gmin :: Int  ^^^  syn lmin :: Int  ^^^  syn repl :: Tree
attr Root  syn repl :: Tree

sem Tree
  | Leaf  ^^^  lhs.lmin    = @val
               lhs.repl    = Leaf @lhs.gmin
  | Bin        lhs.lmin    = @left.lmin `min` @right.lmin
               lhs.repl    = Bin @left.repl @right.repl
               left.gmin   = @lhs.gmin
               right.gmin  = @lhs.gmin

sem Root
  | Top  ^^^  top.gmin = @top.lmin
              lhs.repl = @top.repl
\end{code}
\caption{Common example of an attribute grammar: Repmin}
\label{fig.repmin}
\end{figure}
\paragraph{Example.} Figure~\ref{fig.repmin} shows how to compute a transformed tree as synthesized attribute |repl|, where the value in each leaf is replaced by the minimal value in the original tree. The synthesized attribute |lmin| represents the local minimum of the tree, and the inherited attribute |gmin| the global minimum. The local minimum is propagated upwards and passed down as the global minimum at the root.
\paragraph{Advantages.} Attribute grammars offer several advantages compared to writing Haskell functions by hand:
\begin{itemize}
\item The rules and attributes can be given in any order.
\item The navigation over the tree structure is implicit, in comparison to e.g. zippers.
\end{itemize}
These two advantages allow an attribute grammar to be composed out of several individual fragments, which facilitates separation of concerns.
\paragraph{Roots.} In the example |Root| is a start symbol of the grammar. Start symbols must be declared explicitly, but in the paper we shall assume that this is clear from the context. There is a semantic difference that we refer to later in this paper: the root only needs to define the inherited attributes of the children that are dependencies of the synthesized attributes of the root.
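The grammar of Figure~\ref{fig.repmin} is essentially Bird's circular program formulation of Repmin~\citep{DBLP:journals/acta/Bird84}, in which a single traversal suffices; a plain-Haskell sketch (the names are ours):
\begin{code}
data Tree = Leaf Int | Bin Tree Tree deriving (Eq, Show)

repminCircular :: Tree -> Tree
repminCircular t = r
  where
    -- the global minimum m is lazily fed back into the traversal
    (m, r) = go t m
    -- go returns the pair (lmin, repl) of synthesized attributes
    go (Leaf v)  gmin = (v, Leaf gmin)
    go (Bin l s) gmin = (ml `min` mr, Bin tl tr)
      where (ml, tl) = go l gmin
            (mr, tr) = go s gmin
\end{code}
Lazy evaluation resolves the apparent circularity between |m| and |go t m|; the translation discussed in the next section produces exactly this kind of circular code.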
%% a rule
%% that defines an inherited attribute of a direct child of the root may be
%% omitted if it is statically not needed to compute the
%% synthesized attributes of the root.
%% \paragraph{Note.} The Repmin example embodies a common pattern in
%% attribute grammars: to compute some property of some subtree and
%% then passing the result down for use by other computations.
\section{Evaluation Algorithm}
\label{sect.codegen}
This section shows two translations to Haskell. They both serve two goals: to provide the reader with a better understanding of the semantics (which we do not specify formally), and to prepare for later sections on the various grammar extensions and their implementation.
\subsection{Translation to Lazy Haskell Code}
\label{sect.lazy}
In \citet{DBLP:conf/ifip2-4/SwierstraA98} a translation to lazy Haskell code is presented. The translation uses catamorphisms to map each node of the tree to its \emph{evaluator}, which is a function that takes the values of its inherited attributes as its arguments and produces the values of its synthesized attributes as its results. For the Repmin example of the previous section, the evaluator is thus a function that takes |gmin| and produces |lmin| and |repl|.
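To make the types explicit (they are left implicit in Figure~\ref{fig.lazyrepmin}), the evaluator of a |Tree| can be given the type |Int -> (Int, Tree)|; a plain-Haskell sketch with our own names:
\begin{code}
data Tree = Leaf Int | Bin Tree Tree deriving (Eq, Show)

-- gmin comes in, (lmin, repl) come out
type EvalTree = Int -> (Int, Tree)

-- the catamorphism: maps a tree to its evaluator
evalTree :: Tree -> EvalTree
evalTree (Leaf v)  gmin = (v, Leaf gmin)
evalTree (Bin l r) gmin = (lm `min` rm, Bin lt rt)
  where (lm, lt) = evalTree l gmin
        (rm, rt) = evalTree r gmin

-- the root ties the knot: gmin is defined to be lmin
evalRoot :: Tree -> Tree
evalRoot top = repl
  where (lmin, repl) = evalTree top lmin
\end{code}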
\begin{figure}[t] \begin{code} sem_Tree (Leaf val) = sem_Leaf val sem_Tree (Bin left right) = sem_Bin (sem_Tree left) (sem_Tree right) sem_Root (Top top) = sem_Top (sem_Tree top) sem_Leaf val = \lhs_gmin -> let lhs_lmin = val lhs_repl = Leaf lhs_gmin in (lhs_lmin, lhs_repl) sem_Bin left right = \lhs_gmin -> let (left_lmin, left_repl) = left lhs_gmin (right_lmin, right_repl) = right lhs_gmin lhs_lmin = left_lmin `min` right_lmin lhs_repl = Bin left_repl right_repl in (lhs_lmin, lhs_repl) sem_Top top = let (top_lmin, top_repl) = top top_gmin top_gmin = top_lmin lhs_repl = top_repl in lhs_repl \end{code} \caption{Translation of Repmin to lazy Haskell code.} \label{fig.lazyrepmin} \end{figure} Figure~\ref{fig.lazyrepmin} shows the catamorphisms for |Tree| and |Root|, and the \emph{semantic functions} that comprise the algebra, one for each production. A semantic function takes the evaluators for the children of a node as parameter and produces the evaluator for itself. The body of the function consists of calls to the evaluators of the children, and their inputs and outputs are tied together by straightforward translations of rules. \paragraph{Cyclicity.} Note that |sem_Top| has a cyclic definition because the argument |top_gmin| to function |top| depends on the result |top_lmin| of |top|. This is not a problem because the runtime data dependencies are acyclic: |top_lmin| can be computed without needing |top_gmin|. Lazy evaluation provides the appropriate attribute scheduling. However, this requires that the function is not strict in its arguments, reducing the opportunity for optimizations. %% Mention the advantages of the lazy approach: e.g. the ability %% to encode iteration, DAGs. Then mention issues such as %% accidental loops and memory usage. \subsection{Translation to Acyclic Haskell Code} \label{sect.ordered} It is possible to avoid the cyclic definition in Figure~\ref{fig.lazyrepmin}. 
In \citet{Bransen:2012:KAR:2187125.2187139} we describe our approach in detail; here we give an informal description. The definition in Figure~\ref{fig.lazyrepmin} is cyclic because the evaluator of the root needs to pass |gmin| to |top| before it gets |lmin|. However, what if the evaluator does not evaluate a tree in one (lazy) step, but as a sequence of smaller steps? If the evaluator of |top| can produce |lmin| in the first step, and only needs |gmin| in the second step to produce |repl|, then the definition is no longer cyclic!
\paragraph{Absolutely noncircular.} The evaluator of a node receives the evaluators of its children as parameters. It thus does not know the actual tree structure of the children. The above idea thus works only if for all possible trees the attributes of the children have acyclic dependencies. The grammar is \emph{absolutely noncircular} if this is the case, which can be verified using a static analysis given by \citet{DBLP:journals/mst/Knuth68}.

The rules of a production define the data dependencies between the attributes of a node and the attributes of the children of the node. To obtain a static approximation of the attribute dependencies of a tree of some type |N| with the root node produced by production |P| of |N|, but with arbitrary trees of the appropriate types as children, we plug the approximations of their dependencies into the dependencies given by |P|. The approximation of an arbitrary tree of type |M| is the union of the above approximations for each production of |M|. The fixpoint of these cyclic equations is the result of the static analysis.
\paragraph{Scheduling.} Each step of the evaluator is called a \emph{visit}, which produces some of the synthesized attributes given the appropriate inherited attributes.
For each visit, the evaluator of some node needs an algorithm that comprises visits to children and the evaluation of rules that compute the appropriate inherited attributes of children and the synthesized attributes of the node itself. Note that an evaluator is defined for each nonterminal, and that a nonterminal symbol can occur more than once in the right-hand side of a production. Each occurrence may have different demands regarding the order in which the attributes are to be computed. Hence, an evaluator may need to support multiple sequences of visits. The process of determining the sequences of visits and the attributes that are computed is called \emph{attribute scheduling}.

The basis for the scheduling algorithm that we employ is given by \citet{DBLP:conf/popl/KennedyW76}. We start at the root nonterminal of the grammar, for which we require a single visit that comprises all attributes. When constructing the algorithm for a given visit, we know the state of the tree prior to the visit. We obtain the schedule backwards by chasing the data dependencies between attributes and rules, in such a way that we have chased all the dependents (for this visit) before proceeding with their dependencies. Rules are simply added to the schedule. For synthesized attributes of children we have to do more work: given the dependencies, we identify the required inherited attributes, and have thus identified another visit for the child.

Our actual scheduling algorithm is a refinement of the above scheme. It additionally ensures that the order in which the dependencies are chased does not influence the schedule. It also encourages independent visits to children, which are thus candidates to be executed in parallel.
\paragraph{Representation.} The evaluator is represented as a closure that contains the attributes of the node and the evaluators of the children.
To apply it, the caller specifies which of the possible visits to use and the needed values of the corresponding inherited attributes. The caller also specifies a continuation, which is given an updated closure and the requested values of the corresponding synthesized attributes. The encoding is rather complex, but has the nice property that it is purely functional, strongly typed, and strongly normalizing provided the rules are. Figure~\ref{fig.acyclic} gives an impression of what such a function looks like (representing a single visit sequence of |Bin|). In the figure, |left'| and |right'| are the updated closures of |left| and |right| respectively, |k| is the continuation after the first visit, |k'| the continuation after the second visit, and |closure| is the updated closure of the node itself.
\begin{figure}[t]
\begin{code}
sem_Bin left right = \k ->
  left   ^^ $ ^^ \left_lmin  left'  ->
  right  ^^ $ ^^ \right_lmin right' ->
  let lhs_lmin = left_lmin `min` right_lmin in
  let closure = \lhs_gmin k' ->
        left'  lhs_gmin ^^ $ ^^ \left_repl  ->
        right' lhs_gmin ^^ $ ^^ \right_repl ->
        let lhs_repl = Bin left_repl right_repl
        in k' lhs_repl
  in k (lhs_lmin, closure)
\end{code}
\caption{Translation to acyclic Haskell code (simplified).}
\label{fig.acyclic}
\end{figure}
\section{Common Attribute Grammar Extensions}
\label{sect.common}
This section covers a number of common language-independent attribute grammar extensions that we need in later sections.
\subsection{Local Attributes}
\label{sect.localattributes}
Local attributes are a simple but useful extension for sharing an intermediate result per node among rules. Rules may refer to an attribute of the form |loc.x| when it is defined by a rule. Local attributes resemble let-bindings, and examples of their use are given in later sections.
%% \paragraph{Note.} Local variables are the only attributes for which
%% no type signature needs to be given. The translation to Haskell does
%% not require a type, but it is sometimes useful to fight the monomorphism
%% restriction.
%% \paragraph{Note.} Section~\ref{sect.higherorder} shows that local variables can
%% be expressed using other attribute grammar constructs.
\subsection{Copy Rules}
\label{sect.copyrules}
Many rules simply copy values up and down the tree. These rules often follow standard patterns and can be derived automatically based on the name equality of attributes. Such a derivable rule is called a \emph{copy rule}. Copy rules may be omitted by the programmer, which has significant benefits in larger grammars where the majority of rules are copy rules. Such an abstraction is familiar to Haskell programmers, as it corresponds to the use of monads for implicit parameter and result passing. The following are common patterns for which rules are automatically derivable:
%% \begin{figure}[t]
\begin{description}
\item[Top-down] Reader-monadic behavior is obtained by copying the inherited attribute |a| from the parent to children |c| that have |c.a| as inherited attribute.
%% When the inherited attribute |c.a| of a child is not
%% defined, but |lhs.a| exists, the copy rule |c.a = @lhs.a| is derived.
\item[Bottom-up] For synthesized and chained attributes a merging operation may be specified with additional |use| syntax:
\begin{code}
attr Tree ^^^ syn s use mappend mempty :: Sum Int
\end{code}
Then, Writer-monadic behavior is obtained by combining the synthesized attributes |s| of the children with the |mappend| function to define the synthesized attribute |s| of the parent, or by using |mempty| when no such child exists, which would give:
\begin{code}
sem Tree
  | Leaf  ^^^  lhs.s = mempty
  | Bin        lhs.s = @left.s `mappend` @right.s
\end{code}
The rule |lhs.s = Sum @val| for |Leaf| must be given explicitly, because the automatically derived |mempty| is probably not intended.
%% When the synthesized attribute |lhs.a| is not defined, either
%% the rule |lhs.a = @c0.a `mappend` ... `mappend` @ck.a| is
%% derived for the subsequence |c0 ... ck| of the children which have
%% a synthesized attribute |a|, or the rule |lhs.a = mempty| is
%% derived when no such child exists.
\item[Chained] State-monadic behavior is obtained by passing the value of an inherited attribute |a| of the parent through the children that define |a| as chained attribute, and finally from the last child back to the synthesized |a| of the parent.
%% When an inherited attribute |c.a| of a child |c| is missing,
%% the rule |c.a = @k.a| is derived if |k.a| exists, where |k|
%% is either the nearest preceding child that has |a| as synthesized
%% attribute or otherwise |lhs| if |a| is an inherited attribute.
%% Also, when the synthesized attribute |lhs.a| is missing, derive
%% |lhs.a = @k.a|.
\end{description}
%% \caption{Common traversal patterns.}
%% \label{fig.traversalpatterns}
%% \end{figure}
When an attribute has a |use| declaration, copy rules are generated according to the top-down and bottom-up patterns; otherwise the chained pattern applies.
\paragraph{Note.} For didactic reasons we do not omit copy rules in our examples. However, we mention copy rules in later arguments, hence this explanation.
\subsection{Higher-Order Children}
\label{sect.higherorder}
A production declares the children that a node has at the start of attribute evaluation. An extension, higher-order attribute grammars~\citep{DBLP:conf/pldi/VogtSK89}, allows additional children to be declared that become part of the tree during attribute evaluation. This is one of the most important and versatile attribute grammar extensions.
\paragraph{Syntax and semantics.} A higher-order child |c : M| (where |c| is the name of the child and |M| the nonterminal) is a tree defined by an attribute |inst.c| of its parent node. We say that the child |c| is \emph{instantiated} with the value of attribute |inst.c|. Such an attribute is also known as a higher-order attribute.
The declaration of the child is prefixed with |inst| to differentiate it from a conventional child declaration. So, to define some child |c : M| as the result of expression |e|, the child must be declared and the attribute |inst.c| must be defined by some rule: \begin{code} data N | P ^^^ inst.c : M sem N | P ^^^ inst.c = e \end{code} Equivalently, child declarations may also be given in the semantics block. Furthermore, we must define the inherited attributes of |c| and may use the synthesized attributes of |c|. Its synthesized attributes additionally depend on the definition of |inst.c|, because part of the tree must be known before synthesized attributes can be computed for it. Otherwise, a higher-order child is indistinguishable from a conventional child. \paragraph{Implementation.} The code generated for a production gets the evaluator for conventional children as parameter, but not for higher-order children. Instead, the evaluator is obtained by applying the catamorphism |sem_M| to the tree constructed for attribute |inst.c|. \begin{figure}[t] We define a nonterminal to represent a counter dispenser: \begin{code} data Uniq | Next attr Uniq ^^^ chn counter :: Int ^^^ syn value :: Int sem Uniq | Next ^^^ lhs.value = @lhs.counter lhs.counter = @lhs.counter + 1 \end{code} Copy rules can be used to chain the counter through the tree, and an attribute |@@u.value| is obtained with: \begin{code} inst.u : Uniq inst.u = Next \end{code} \caption{A unique number mechanism.} \label{fig.uniq1} \end{figure} \begin{figure}[t] We define a nonterminal to represent a local attribute: \begin{code} data Loc @alpha | Loc attr Loc ^^^ chn value :: alpha sem Loc | Loc ^^^ lhs.value = @lhs.value \end{code} To desugar a local attribute |x|, introduce: \begin{code} inst.x : Loc inst.x = Loc \end{code} and replace each occurrence of |loc.x| with |x.value|. 
\caption{Local attributes as higher-order children.}
\label{fig.localaschild}
\end{figure}
\paragraph{Abstraction.} Later sections make heavy use of higher-order children as a means to view Haskell functions as trees, so that the composition mechanism offered by AGs can be exploited. Since this pattern is important, we now give two examples:
\begin{itemize}
\item Figure~\ref{fig.uniq1} shows a tree |Uniq| as an abstraction for a dispenser of unique numbers. The tree itself is just a plain node |Next|. The required information is in the attributes.
\item Figure~\ref{fig.localaschild} shows how to express local attributes with higher-order children.
\end{itemize}
%% there is actually a complication here that I do not want
%% to mention: the order in which the children appear is
%% relevant for the copy rules. This is in particular a problem
%% with higher-order children, which appear at the end in the
%% order.
%% \paragraph{Note.}
%% Higher-order children are a seemingly innocent extension to attribute grammars.
%% However, evaluators of higher-order attribute grammars may require a Turing-complete host language,
%% whereas primitive recursion is sufficient for conventional
%% acyclic attribute grammars. Since Haskell is Turing-complete this is not
%% an issue.
\subsection{Proxy Nonterminals}
\label{sect.proxy}
%% In Section~\ref{sect.ioc}, we make use of additional static dependencies between
%% attributes to influence the structure of the evaluator so that particular
%% attributes are computed together.
We present a common pattern for adding a nonterminal to the grammar that serves as an alias for another nonterminal. Similar to type aliases in Haskell, this pattern can be used to statically distinguish certain occurrences of nonterminals.
%% Several later sections make use of this pattern.
\paragraph{Definition.} We introduce a nonterminal that serves as an observable alias for another nonterminal, which we will call a \emph{proxy nonterminal}.
A proxy nonterminal |Proxy| for |N| is a nonterminal with the same attribute declarations as |N|, defined as:
\begin{code}
data Proxy | P ^^^ n : N
\end{code}
Its semantics is trivially defined by copy rules. It thus has a single production |P|, containing one child |n|, which is the nonterminal symbol |N|. We can thus substitute |Proxy| for occurrences of |N| (in productions other than |P|) without changing the attribute computations.
\paragraph{Usage.} In later sections we impose, for example, artificial data dependencies on attributes of |Proxy| without necessarily imposing these on all occurrences of |N|.
\paragraph{Grammar extension.} Proxy nonterminals can be added to the grammar by changing the original description, e.g. the data declarations. This transformation is not transparent: code that produces the tree (e.g. a parser) must also generate the proxy nodes at the appropriate places in the tree. A transparent approach is possible, which we will use in Section~\ref{sect.typeparams2}, using an extension of higher-order attributes. Instead of \emph{defining} a child with an attribute, we allow the \emph{redefinition} of a child via an attribute that contains a function that \emph{transforms} the evaluator of the child. The following example demonstrates a transformation of a child |n : N| in production |Root| to a child |n : Proxy|:
\begin{code}
data Root | Root ^^^ n : N

sem Root | Root  ^^^  inst.n : Proxy
                      inst.n = \evalN -> sem_P evalN
\end{code}
The |Root| production declares a child |n : N|. The rule for |inst.n| defines an attribute that is actually a function: it takes the original evaluator of |n| as parameter |evalN| and transforms it into an evaluator that fits nonterminal |Proxy|. In this case, we accomplish this by adding a |P| node on top of it. Note that the function |sem_P|, which is the part of the algebra that corresponds to production |P|, does exactly that.
This transformation is possible when the definition of |inst.n| does not depend on any of the synthesized attributes of |n|.
%% \paragraph{Note.}
%% Adding additional structure may have other benefits
%% such as being able to factor out common code among
%% productions. It is a design choice whether to
%% specify this additional structure as part of the
%% grammar or via a child transformation.
\section{Parametric Polymorphism}
\label{sect.typeparams1}
The ability to abstract over types plays a major role in obtaining code reuse in strongly typed functional languages, and also takes the form of generics in imperative languages. This section shows an attribute grammar extension for parametrizing nonterminals to abstract over the types of terminals, and for abstracting over the types of attributes.
\begin{figure}[t]
\begin{code}
data List alpha @beta
  | Nil
  | Cons ^^^ hd :: alpha ^^^ tl : List alpha beta

attr List ^^^ syn length :: Int
sem List
  | Nil   ^^^  lhs.length = 0
  | Cons       lhs.length = 1 + @tl.length

attr List ^^^ inh f :: alpha -> beta ^^^ syn r :: List beta
sem List
  | Nil   ^^^  lhs.r = Nil
  | Cons       lhs.r = Cons (@lhs.f @hd) @tl.r
\end{code}
\caption{Parametric polymorphism in an attribute grammar.}
\label{fig.polymorphism}
\end{figure}
Figure~\ref{fig.polymorphism} shows that by parametrizing the nonterminal |List| with the type |alpha| of the terminal |hd|, it is possible to define the synthesized attribute |length| for lists containing elements of any type. Similarly, by parametrizing the attributes over a type |beta|, we can implement a functor: a transformation that maps each element |@@hd| of the list to |@@lhs.f @hd|, where |@@lhs.f| is an arbitrary function from |alpha| (the type the list is parametrized with) to some arbitrary type |beta|. There is an essential difference between type variables |alpha| and |beta|.
Type variables declared with an |@@| prefix, such as |beta|, may appear only in the types of attributes, not in the types of terminals, and are not part of the generated data type. Thus, the data type is parametrized with |alpha| and the evaluator with |alpha| and |beta|. The implementation of this extension consists of printing the type variables at the appropriate places in type signatures.
%% This syntax composes badly by the way, because the
%% order of appearance of attributes is relevant.
%% It would probably be better if the specification of
%% type parameters for children would be some kind of
%% record.
\section{Feature: Eager rules}
\label{sect.eager}
The data dependencies between rules form a partial order, which suffices for attribute grammars because rules are encouraged to be pure. It may sometimes be useful to augment the data dependencies to locally prioritize certain rules over other rules in a production. This can for example be useful for debugging, efficiency, and other reasons that appear in later sections. By taking the order of appearance of rules in the source files into account, it is possible to obtain a total order among rules. This approach impairs the composability of attribute grammars, but may still be useful for specifying an order among strongly correlated rules. Other canonical total orders are far from obvious. A total order would also leave little freedom to attribute scheduling, hence we are looking for a less ad-hoc mechanism.
\paragraph{Definitions.}
A rule is \emph{eager} when it is described with the notation |p $= e| with a pattern |p| and expression |e|. The idea is to schedule these rules so that they are computed as soon as their dependencies are available, in contrast to conventional rules, which are scheduled only when an attribute that depends on them needs to be computed. This is a challenging problem: to prioritize a rule, it is also necessary to prioritize the dependencies of that rule.
This interacts globally with rules of other productions, and it is not clear which rule should take priority. Such global consequences are undesirable when all that we want is a bit more local priority. We therefore propose to prioritize only those attributes that are involved solely with eager rules, and leave the scheduling of the other attributes up to their original data dependencies. An attribute |a| is \emph{eager} when each rule |r| that depends on |a| is itself eager, or |r| depends on another eager attribute. These are global properties of a grammar that are easily derived from the grammar with a static analysis similar to cycle analysis. Given an inherited eager attribute |a| of some nonterminal |N|, we can determine the set of synthesized eager attributes that depend on |a|. We call these the \emph{collaborators} of |a|. Furthermore, we can determine the set of non-eager inherited attributes that the collaborators depend on, which we call the \emph{opposition} of |a|. The idea is to prioritize the computation of eager inherited attributes of a child as soon as their opposition has been computed.
\paragraph{Scheduling.}
The scheduling algorithm of Section~\ref{sect.ordered} starts from the demanded synthesized attributes of the parent for a visit to determine which rules and child visits to schedule. We change this algorithm to realize the above idea. For each eager inherited attribute |n.a| of a child |n|, if the opposition of |n.a| can be computed, we add the collaborators of |n.a| that can be computed to the set of attributes to be computed. Recall that an attribute of a child can be computed if the inherited attributes of the parent it indirectly depends on have been computed.
Then, to deal with ordering the eager rules scheduled to a particular visit, we repeatedly take the unscheduled eager rules that do not depend on any other unscheduled eager rules, and schedule their non-eager dependencies and then the rules themselves in the order of appearance. See \citet[Section~3.5.2]{middelk12phd} for a detailed algorithm.
\paragraph{Properties.}
The approach is sound because it only adds additional scheduling constraints. The approach is also complete: if a schedule can be computed for a grammar, then a schedule can also be computed when rules are changed into eager rules. The scheduling is also locally predictable: an eager rule is guaranteed to be scheduled before an independent non-eager rule that depends on a superset of the non-eager inherited attributes that the eager rule depends on. Conventional rules are scheduled based on the dependencies of the attributes that they define. Eager rules have the additional property that they also get scheduled when the inherited attributes that they depend on become available. We make use of this property in several later sections.
\section{Integration: Type Classes}
\label{sect.typeclasses}
Haskell programmers make heavy use of type classes, and thus expect to combine them with attribute grammars. A typical use arises when some part of the tree or some of the attributes are abstracted over some type. When an overloaded function is applied to the value of such an attribute, a dictionary is required that provides the implementation of the overloaded function. The construction and passing of dictionaries is normally handled implicitly by the Haskell compiler, and the question arises how this integrates with attribute grammars.
\begin{figure}[t]
\begin{code}
data Root alpha | Top ^^^ root : Tree alpha

data Tree alpha
  | Bin  ^^^ left : Tree alpha ^^^ right : Tree alpha
  | Leaf     val :: alpha

attr Tree ^^^ inh gmin :: alpha ^^^ syn lmin :: alpha ^^^ syn repl :: Tree alpha
\end{code}
\caption{Repmin with type classes (see also Figure~\ref{fig.repmin})}
\label{fig.repmintypeclass}
\end{figure}
\paragraph{Example.}
Figure~\ref{fig.repmintypeclass} shows a variation on Repmin of Figure~\ref{fig.repmin} which works for trees containing comparable values of any type |alpha|. We omitted the rules as these are the same as in the original definition. The attributes are polymorphic in the type |alpha|, and in the rules we use |min| from the class |Ord|, so the generated code can only be used when the type |alpha| is in the |Ord| class and when the corresponding dictionary is brought into scope of the code that is generated from the attribute definition that uses |min|. The way we generate code allows the Haskell compiler to handle type classes automatically if we do not generate type signatures. Note that type signatures are particularly important to aid error reporting, hence we are not willing to leave them out.
%% so that it is more likely that
%% error messages are reported in user code instead of
%% generated code,
Fortunately, the impact on type signatures is rather small, because only the types of the generated fold and algebra functions need to mention the used dictionaries as class predicates, which can be specified manually by the programmer with a bit of additional syntax:
\begin{code}
attr Ord alpha => Root Tree ^^^ inh gmin :: alpha ^^^ syn lmin :: alpha
\end{code}
This notation expresses that the |Ord alpha| class constraint is added to the catamorphisms and semantic functions generated for |Root| and |Tree|.
This construction is undesirable for several reasons:
\begin{itemize}
\item In a context where not all synthesized attributes are needed, the rule using the dictionary may not be scheduled, and the dictionary not needed, resulting in ambiguous overloading.
\item It requires a language-specific extension; is a solution possible that is more native to attribute grammars?
\end{itemize}
\paragraph{Explicit dictionaries.}
With a GHC extension it is possible to wrap a dictionary in a data constructor when constructing a value, and bring it into the environment via a pattern match. For example, the following |DictOrd| type stores an |Ord| dictionary.
\begin{code}
data DictOrd :: * -> * where
  DictOrd :: Ord alpha => DictOrd alpha
\end{code}
Now, dictionaries can be considered as an inherited attribute that is copied unchanged from the root. Thus, we can express the dictionary passing for the repmin example as:
\begin{code}
attr Root Tree ^^^ inh dict :: DictOrd alpha

sem Root | Root ^^^ root.dict   = @lhs.dict
sem Tree | Bin  ^^^ left.dict   = @lhs.dict
                    right.dict  = @lhs.dict

sem Tree
  | Leaf ^^^ lhs.lmin = @val
  | Bin      lhs.lmin = case @lhs.dict of
                          DictOrd -> @left.lmin `min` @right.lmin
\end{code}
The definitions of the dict-attributes are trivial and can be provided implicitly via copy rules. The |dict| attribute thus serves as evidence that it is possible to apply |min| to the attributes of type |alpha|. As expected, the code that invokes the attribute evaluator of the root must provide the value of the |dict| attribute, e.g. by building it using the |DictOrd| constructor. Advantages of this approach are that no attribute grammar extension is required and that it is oblivious to how the code is generated.
%% Moreover, the attribute grammar system may
%% handle the propagation of inherited attributes that are
%% copied down as-is more efficiently than passing it as
%% additional function parameter.
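The dictionary type and a rule body using it can be written as ordinary Haskell; the following is a self-contained sketch, where |lminBin| is a hypothetical stand-in for the code generated from the |Bin| rule:

```haskell
{-# LANGUAGE GADTs #-}

-- Explicit Ord dictionary as a first-class value; matching on the
-- constructor brings the Ord constraint into scope.
data DictOrd a where
  DictOrd :: Ord a => DictOrd a

-- The dictionary is evidence that min may be applied to values of type a.
lminBin :: DictOrd a -> a -> a -> a
lminBin dict l r = case dict of DictOrd -> l `min` r
```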
A disadvantage is that the dictionary needs to be unpacked for each rule that needs the dictionary. When multiple rules need a certain dictionary in scope, it is however possible to hoist out the pattern match:
\begin{code}
sem Tree | Bin ^^^ DictOrd   = @lhs.dict
                   lhs.lmin  = @left.lmin `min` @right.lmin
\end{code}
Below, we call such a rule, which pattern matches against a dictionary, a \emph{dictionary rule}.
\paragraph{Scheduling.}
The pattern match must be in scope of the rule that needs the dictionary. The code generation only guarantees this if the dictionary rule precedes the rules that use the dictionaries in the static rule ordering. The pattern does not define any attributes, so the dependency on it is not visible without analyzing the Haskell code (which we treat as-is). Therefore, some code needs to be added to the grammar to specify a proper static order. Upon closer inspection of the rules, we observe that the inherited attributes that contain the dictionaries are copied unchanged and are in the end only inspected by dictionary rules. Thus, if the dictionary rules are scheduled as eager rules (Section~\ref{sect.eager}), the inherited attributes become eager attributes, and the rules end up before any of the conventional rules in the order. The order among dictionary rules is then not specified, but that is fortunately also irrelevant. So, to ensure the proper ordering, it suffices to annotate the dictionary rules as eager rules:
\begin{code}
sem Tree | Bin ^^^ DictOrd $= @lhs.dict
\end{code}
The above approach builds upon more general attribute grammar features, and does not require changes to the code generation. Admittedly, it requires more effort from the programmer, which can be eliminated by desugaring the notation using the pattern given above.
\paragraph{Type specialization.}
Since type equalities can also be represented as a dictionary using GADTs, we can now use the approach in this section to write grammars for particular instances of the type variables:
\begin{code}
data List alpha
  | Nil
  | Cons ^^^ hd :: alpha ^^^ tl : List alpha

attr List ^^^ inh dict :: alpha :=: Int ^^^ syn sum :: Int
sem List
  | Nil   ^^^ lhs.sum = 0
  | Cons      lhs.sum = @hd + @tl.sum
              Refl $= @lhs.dict
\end{code}
We use this construction in the follow-up section (Section~\ref{sect.typeparams2}).
\section{Integration: Abstraction over Nonterminals}
\label{sect.typeparams2}
Section~\ref{sect.typeparams1} showed that data types may have fields whose type the data type takes as a parameter, and that we treat these fields as terminals. But what about nonterminals? For example, when some meta information such as source locations occurs at many places in different types of trees, it is common to factor it out into a separate nonterminal:
\begin{code}
data Info t  | Label ^^^ tree :: t ^^^ line :: Int

data Stmt    | If  ^^^ guard : Info Expr ^^^ body : Info Stmt
data Expr    | App ^^^ fun : Info Expr ^^^ arg : Info Expr
\end{code}
We would like to change terminal |tree| into a nonterminal so that |Info| becomes polymorphic in the nonterminal |t| chosen for |tree|, but it is unclear how to deal with such a grammar: what are the attributes of |Info|? This likely depends on what attributes are defined on |t| (which is not known), and |line| likely influences them or requires additional attributes. This issue becomes even more difficult when a production has multiple such children.
%% (e.g. an attribute grammar over finger trees).
\citet{Saraiva99genericattribute} deals with nonterminals parametrized over nonterminals by specifying which attributes will be present. This is not a solution in this case because |Stmt| and |Expr| may not have the same attributes. Instead, we propose a simpler approach: we virtualize the tree.
The observation is that higher up there must be a place where it is known which attributes to expect: either because the instantiation of the type variables is known or because the attributes are independent of it. For example, we can assume that we know the attributes of a tree of type |Info Expr|. For this concrete type, it is possible to derive some suitable representation that does not involve nonterminals as parameters, for example by specializing the original data definition to the known type arguments:
\begin{code}
data InfoExpr | Rep ^^^ expr : Expr ^^^ line :: Int
\end{code}
We can thus define the required attributes and rules on |InfoExpr| instead, provided that we transform a tree of type |Info Expr| to |InfoExpr|. We first introduce a proxy nonterminal for |InfoExpr|, which will take care of the conversion.
\begin{code}
data ExprProxy | Proxy ^^^ orig : Info Expr

sem Stmt | If ^^^ inst.guard : ExprProxy
                  inst.guard = Proxy
\end{code}
For the conversion, we compute the representation from |Label|, passing down as additional information that |t| is an |Expr| in this context, and using a higher-order child to make the representation part of the tree:
\begin{code}
attr Info ^^^ inh eqExpr :: t :=: Expr ^^^ syn repExpr :: InfoExpr

sem Info | Label ^^^ lhs.repExpr = case @lhs.eqExpr of
                                     Refl -> Rep @tree @line

sem ExprProxy | Proxy ^^^ orig.eqExpr = Refl
                          inst.rep : InfoExpr
                          inst.rep = @orig.repExpr
\end{code}
Similarly, a representation for |Info Stmt| can be added, with corresponding attributes |eqStmt| and |repStmt| for |Info|. The |orig.eqExpr| attribute can only be defined in |ExprProxy| and vice versa for |orig.eqStmt|. Thus, by making these nonterminals start symbols of the grammar, these inherited attributes need only be defined for the appropriate proxies (Section~\ref{sect.tutorial}).
\paragraph{Generic programming.}
The above approach for specializing the types of nonterminals can be automated with some preprocessing.
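The essential ingredients of the specialization can be sketched in plain Haskell. The |(:=:)| type below plays the role of the equality dictionary (current GHC ships the equivalent |(:~:)| with |Refl| in |Data.Type.Equality|); |Var| is a hypothetical leaf production added only to make the fragment self-contained:

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

-- Type equality as a dictionary: matching on Refl convinces the
-- type checker that the two types coincide.
data a :=: b where
  Refl :: a :=: a

data Info t   = Label t Int                        -- tree :: t, line :: Int
data Expr     = App (Info Expr) (Info Expr) | Var String
data InfoExpr = Rep Expr Int                       -- the specialized form

-- The repExpr conversion: given evidence that t is Expr, compute the
-- specialized representation.
repExpr :: (t :=: Expr) -> Info t -> InfoExpr
repExpr Refl (Label tree line) = Rep tree line
```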
On the other hand, this approach also makes it possible to use a more abstract representation (e.g. using sums of products~\citep{Magalhaes:2010:GDM:1863523.1863529}) to obtain generic code.
\section{Integration: Monads}
\label{sect.monads}
Monads are a typical abstraction that Haskell programmers use when implementing tree traversals. They are often considered as an alternative to attribute grammars. Indeed, \citet{DBLP:conf/icfp/SchrijversO11} show how to deal with stacks of reader, writer and state monads to obtain composability similar to that which comes naturally with attribute grammars. However, attribute grammars and monads are not mutually exclusive; they are in fact different composition mechanisms that are fruitful to combine~\citep{meijerjeuring95}. Section~\ref{sect.monadsinrules} shows the embedding of pure monadic computations that use reader, writer and state functionality as abstraction (e.g. the RWS monad), and Section~\ref{sect.agasmonad} shows how we can represent the AG as a monad to incorporate impure operations.
\subsection{Integration: Pure Monadic Code in Rules}
\label{sect.monadsinrules}
When using an attribute grammar there seems to be no need for reader, writer or state (RWS) monads, because attributes provide a more general facility when combined with copy rules (Section~\ref{sect.copyrules}). However, as rules may contain arbitrary Haskell code, that code can involve (pure) monads, and this may certainly be appropriate when constructing complex values. When the monad can be evaluated as a pure Haskell function, which is the case for RWS monads, monadic code is no different from conventional code, and can be used without a need for special attribute grammar facilities (otherwise, see Section~\ref{sect.agasmonad}). However, the use of monadic code gives rise to a particular pattern for which we can introduce an abstraction, which we discuss in the remainder of this section.
\paragraph{Example.}
The following grammar on lists of integers defines a synthesized attribute |r|. Given such a list |L|, the attribute |r| of |L| is also a list of integers, containing consecutive numbers, with as many elements as the total sum of the elements of |L|. The grammar implements this behavior by concatenating lists |loc.es| that are present for each cons-node of |L|, where the size of |loc.es| is equal to the terminal |hd| of the cons-node. The consecutive numbers are obtained by taking them from the inherited attribute |s| that is threaded to the end of the list. The computation that defines |loc.es| is given as a monadic expression |loc.m|:
\begin{code}
data IntList | Nil | Cons ^^^ hd :: Int ^^^ tl : IntList

attr IntList ^^^ inh s :: Int ^^^ syn r :: [Int]
sem IntList
  | Nil   ^^^ lhs.r = []
  | Cons      lhs.r = @loc.es ++ @tl.r
              (loc.es, tl.s) = runState @loc.m @lhs.s
              loc.m = replicateM @hd $ do
                        e <- get
                        modify (+1)
                        return e
\end{code}
The state monad takes the initial counter, produces the result |loc.es| and an updated counter, which is subsequently passed on to the tail of the list as |tl.s|.
\paragraph{Concerns.}
This simple example demonstrates the use of monads in rules. It also shows that attributes have to be threaded into and out of the monad (e.g. via |runState|). Such rules that interface with the monad are tedious to write because they mention all attributes that play a role in the monad. This becomes more of an issue when several of these rules occur in a production, because of the threading of the attributes between rules and children. Thus, such a construction impairs the ability to describe rules for attributes separately, and thereby affects the composability of the description. Code such as the above is also prone to mistakes in attribute names that lead to accidental cycles in the threading of attributes, e.g.:
\begin{code}
(..., loc.s1)  = ... @lhs.s
(..., loc.s2)  = ... @loc.s2 -- cycle: should have been |s1|
(..., tl.s)    = ...
@loc.s2
\end{code}
Fortunately, this classical mistake is caught by the static dependency analysis of attribute grammars. It would otherwise lead to hard-to-find cases of nontermination.
\paragraph{Composable descriptions.}
As a solution to the composability issues, we show another use of higher-order children (Section~\ref{sect.higherorder}). First we introduce a nonterminal |M phi alpha| with a single production |Do| that represents a monadic computation that it obtains as inherited attribute |expr| of type |State phi alpha|, where |phi| is the type of the state and |alpha| is the result type:
\begin{code}
data M @phi @alpha | Do

attr M ^^^ inh expr :: State phi alpha
           chn s :: phi
           syn a :: alpha
sem M | Do ^^^ (lhs.a, lhs.s) = runState @lhs.expr @lhs.s
\end{code}
Given a tree |M phi alpha|, we can obtain the result of the monadic computation as attribute |a|, and also have the input and output state as chained attribute |s|. We can construct such a tree using the constructor |Do|, but how to integrate it with the actual tree? This is where higher-order children come in again. The following example shows how to declare a higher-order child |m1|, its definition and the threading of the attributes:
\begin{code}
sem IntList | Cons ^^^ inst.m1 : M
                       inst.m1  = Do
                       m1.expr  = @loc.m
                       loc.es   = @m1.a
                       m1.s     = @lhs.s
                       tl.s     = @m1.s
\end{code}
Inlining these definitions actually gives the same code as the former example. The difference is the ability to specify the threading of the attributes separately and to factor out the wrapping code of the monads. In addition, copy rules (Section~\ref{sect.copyrules}) may take care of the threading rules altogether.
%% \paragraph{Note.}
%% This pattern is applicable in general when dealing with
%% rules that take many attributes as inputs and outputs.
%% We see this pattern again in the next section.
\subsection{Integration: Attribute Grammars as Monads}
\label{sect.agasmonad}
The previous section showed rules containing monadic RWS operations.
Dealing with impure monadic operations is more involved, as we discuss in this section. Of particular interest are |IO| and |ST| operations. The ability to e.g. update auxiliary data while processing a tree opens up a whole range of applications. At first glance, monadic operations may not appear to be much of a challenge because attribute grammars can be mapped to a sequential computation (Section~\ref{sect.ordered}) and the resulting computation can be represented as a monadic computation so that rules can be an arbitrary monadic expression. However, a declarative formalism is a double-edged sword in this setting. The evaluation of rules depends only on data dependencies, which gives few guarantees with respect to when rules are evaluated, if at all. To be able to use monadic operations, we need to provide stronger guarantees, e.g. that monadic effects are always performed and at most once.
\paragraph{Example 1.}
To introduce monadic rules, we give a variant of the unique number dispenser of Figure~\ref{fig.uniq1}. When there is only the requirement that the produced numbers are unique but not that they are sequential, we can pass a reference to a shared counter as an inherited attribute and use monadic code to fetch-and-increment it:
\begin{code}
attr Uniq ^^^ inh hCounter :: TVar Int ^^^ syn value :: Int
sem Uniq | Next ^^^ lhs.value <- atomically $ do
                      c <- readTVar @lhs.hCounter
                      writeTVar @lhs.hCounter ^^ $! ^^ c+1
                      return c
\end{code}
This example features a monadic rule, which is a rule of the form |p <- m| where |p| is a pattern and |m| a monadic expression. It has the expected semantics: it is translated to |m' >>= \p' -> r|, where |m'| and |p'| are the respective translations of |m| and |p|, and |r| is the remainder of the computation that is scheduled after the rule.
%% This example also needs the guarantee that the occurrences of |loc.x| in the
%% production evaluate to the same value.
We shall furthermore assume that monadic rules are scheduled as eager rules.
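The body of the fetch-and-increment rule of Example~1 is ordinary Haskell. The following self-contained sketch uses the STM primitives from |GHC.Conc| (shipped with base); |nextUniq| is a hypothetical name standing in for the generated rule code:

```haskell
import GHC.Conc (TVar, atomically, newTVarIO, readTVar, writeTVar)

-- Atomically fetch the current counter value and increment the shared
-- counter, so concurrently dispensed numbers are unique.
nextUniq :: TVar Int -> IO Int
nextUniq counter = atomically $ do
  c <- readTVar counter
  writeTVar counter $! c + 1
  return c
```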
%% (Section~\ref{sect.eager}), which is relevant for the remainder of this section
%% where we show an example that combines several features.
\begin{figure}[t]
\begin{code}
data Proc
  | Pending
  | Handle ^^^ req : Request ^^^ next : Proc
data Request ... -- some tree-like structure

attr Proc ^^^ inh chanIn :: Chan Request ^^^ inh chanOut :: Chan Response
attr Request ^^^ syn result :: Response ... -- + other attributes on requests

sem Proc
  | Pending ^^^ inst.run : Proc
                inst.run <- readChan @lhs.chanIn >>= \m -> return (Handle m Pending)
  | Handle  ^^^ _ <- writeChan @lhs.chanOut @req.result
\end{code}
\caption{Sketch of a stream processor that reads requests from |chanIn| and puts the processed results in |chanOut|.}
\label{fig.streamprocessor}
\end{figure}
\paragraph{Example 2.}
Consider a system that is processing a stream of tree-shaped requests that it takes from an input channel and outputs the responses to an output channel. This system can for example be some kind of webservice or a streaming compiler. The question we now address is whether we can represent the stream processor as an attribute grammar so that we can use attributes to describe the flow of information from one request to the next (e.g. environments). Figure~\ref{fig.streamprocessor} gives the general structure of the stream processor. This description requires several ingredients. It incorporates monadic rules that read and write from the channel (as shown earlier in this section), and higher-order children (Section~\ref{sect.higherorder}) so that the trees read from the channel become children of the processor. Below, we discuss the example a bit further, and then zoom in on the semantics of monadic rules. The stream processor is an automaton. That we can describe it with a grammar is not a surprise, because certain automata are used to specify the semantics of attribute grammars. The productions of |Proc| describe the states of the processor.
A node |Pending| represents the processor in the state where it reads a request from the channel. Once it has done so, it creates a |Handle| node as higher-order child, which processes the request and writes the response to the output channel. This way the attribute grammar evaluator simulates the state transitions of the processor when it visits |Proc| nodes. Executing the code generated from the grammar leads to a surprise: nothing is evaluated at all! The reason is that attribute grammar evaluation is driven by data dependencies. Since |Proc| does not define any synthesized attributes, there are no dependencies on rules or children of its productions. There are also no obvious synthesized attributes to be given to |Proc|, because it needs to output to the channel instead. Similarly, the rule with |writeChan| does not define any attributes, so there are no data dependencies on the rule. Making monadic rules eager (Section~\ref{sect.eager}) takes care of the latter case, but not of the former, so we need to extend the grammar with additional code.
\paragraph{State threading.}
We take inspiration from the integration of IO in Clean~\cite{DBLP:conf/haskell/GroningenNAKP10} and state threading in the |ST| and |IO| monads to add additional data dependencies to the grammar. Applying a variation on the pattern of Section~\ref{sect.monadsinrules}, we introduce a nonterminal |M| which serves as a wrapper for monadic actions. The monadic action is provided as inherited attribute |expr|, and the result of the monadic action is given as the synthesized attribute |value|. In addition, it contains a chained attribute |st| that represents the state threading, and can be used to introduce data dependencies.
\begin{code}
data M @alpha | Do

attr M ^^^ inh expr :: IO alpha
           syn value :: alpha
           chn st :: StateToken
sem M | Do ^^^ (lhs.value, lhs.st) <- do
                 a <- @lhs.expr
                 return (a, @lhs.st)
\end{code}
The |st| attribute is a token of some opaque type (with a role similar to |State# RealWorld|), and later we come back to the essential role that it plays. We first show that the monadic operations can now be written as:
\begin{code}
sem Proc
  | Pending ^^^ inst.oper1 : M
                inst.oper1  = Do
                oper1.expr  = readChan ...
                inst.run    = @oper1.value
  | Handle  ^^^ inst.oper2 : M
                inst.oper2  = Do
                oper2.expr  = writeChan ...
                _           = @oper2.value
\end{code}
In addition, definitions for the |st| attributes are needed. Since their values are opaque, a value cannot be given by the programmer, so the value will have to come from the parent, and its parent, and so on:
\begin{code}
attr Proc ^^^ chn st :: StateToken
sem Proc
  | Pending ^^^ oper1.st  = @lhs.st
                run.st    = @oper1.st
                lhs.st    = @run.st
  | Handle  ^^^ oper2.st  = @lhs.st
                next.st   = @oper2.st
                lhs.st    = @next.st
\end{code}
Above we gave the rules for the |st| attributes explicitly, but they can actually be omitted because they are copy rules (Section~\ref{sect.copyrules}). The way we connect the |st| attributes influences the evaluation order, to which we come back later.
\paragraph{Initial token.}
At the root, the token is passed via an inherited attribute, and taken out as synthesized attribute. For example, the programmer can call the generated monad code via:
\begin{code}
withTk :: Monad m => StateT StateToken m alpha -> m alpha
withTk m = evalStateT m hiddenTokenValue
\end{code}
The data dependencies on the token then ensure that the monadic operations will be evaluated. With standard static attribute dependency analysis the proper threading can be checked, e.g. to verify that the |st| attribute of each |M| child is a dependency of a synthesized attribute of the root, and if desired, that the |st| attributes are referenced at most once.
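The token plumbing can be sketched in plain Haskell. The types below are simplified stand-ins for the generated interface (the |withTk| in the text uses |StateT| from the mtl library instead; here the state transformer is written out to keep the fragment self-contained):

```haskell
-- Opaque state token; the constructor would not be exported, so the
-- only way to obtain one is via the hidden initial value.
newtype StateToken = StateToken ()

hiddenTokenValue :: StateToken
hiddenTokenValue = StateToken ()

-- An evaluator that threads the token through a monad m
-- (a bare-bones StateT StateToken m).
type Eval m a = StateToken -> m (a, StateToken)

-- Inject the hidden initial token and discard the final one.
withTk :: Monad m => Eval m a -> m a
withTk m = fmap fst (m hiddenTokenValue)
```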
\paragraph{Bottom-up.}
The data dependencies on the |st| attribute influence the relative order of the monadic operations. The definition of inherited attributes is usually handled by copy rules. For the synthesized attributes, we can either thread the token through the children sequentially (the most restrictive), or collect the tokens bottom up (the least restrictive), e.g. implicitly via a collection rule:
\begin{code}
attr Proc ^^^ chn st use seq :: StateToken
sem Proc
  | Pending ^^^ oper1.st  = @lhs.st
                run.st    = @lhs.st
                lhs.st    = @oper1.st `seq` @run.st
  | Handle  ^^^ oper2.st  = @lhs.st
                next.st   = @lhs.st
                lhs.st    = @oper2.st `seq` @next.st
\end{code}
These rules can again be omitted, as they are implied by the collection rule. Due to the strict attribute evaluation, both operands to |seq| will already be evaluated, but the order between them is not specified. It depends on the monadic operations in question how strict the order guarantees have to be. The latter approach has the advantage that automatic parallelization of attribute grammars can run the processor for the next request in parallel with the analysis of the current request, as soon as the results are available that are required for the processor of the next request. Whether this is desirable depends on the application, as it may change the order in which the responses are written to the output channel.
\paragraph{Strictness matters.}
The example also shows that it is important to generate strict code: lazy results (that may keep data of previous requests alive) can be disastrous for memory usage. It is also important that the generated code for the processor is tail recursive, otherwise it keeps consuming more memory with each subsequent request it processes. This is the case when the synthesized attributes are only copies of the synthesized attributes of the last visited child.
This also holds when the |st| attribute is defined according to the latter approach, because |seq| is rewritten to its right argument when its left argument is proved to be evaluated already.
\paragraph{Syntactic sugar.}
The syntax with the higher-order children is rather verbose. To remedy this, we can easily provide syntactic sugar for it using the notation |p <- c = e| where |c| is the name to introduce for the monadic operation:
\begin{code}
sem Proc
  | Pending ^^^ inst.act <- oper1 = readChan ...
  | Handle  ^^^ ()       <- oper2 = writeChan ...
\end{code}
\paragraph{Note.}
We exploit the correspondence between attribute grammars and monads as seen in Section~\ref{sect.monadsinrules}. The use of monads in a Haskell function has consequences on the type of the function and requires sequentialization of the code (e.g. do notation). In an attribute grammar, the consequences become visible as additional attributes and the semi-implicit threading of these attributes. Also, the IO monad is a state monad where operations get and put the state. In an attribute grammar, those are higher-order children that thread the state attributes.
\section{Feature: Inversion of Control}
\label{sect.ioc}
A common pattern that appears when writing tree computations is to first perform some initial computation over the tree (e.g. spreading environments), followed by an iterative computation (e.g. computing some fixpoint), followed by a resulting computation (e.g. producing a transformed tree and collecting error messages). This section provides a construction for expressing this pattern, and, as it turns out, uses it to encode the monadic rules of the previous section.
\paragraph{Iteration.}
There are several ways to incorporate iterative or fixpoint computations in attribute grammars~\citep{Farrow:1986:AGF:12276.13320}.
Using Haskell, lazy evaluation can be exploited to obtain iteration by lifting attributes to lists and giving a collection of cyclic attribute definitions that define the value at index |i| in the list in terms of the values in the list of attributes at indices |j < i| (preferably |j = i-1|). However, it is tedious to write these equations, especially when different attributes are involved in the cycle. Moreover, the rule ordering cannot be expressed this way. We present a different solution that extends cycle-free attribute grammars with an inversion of control construction that can be used to express iteration. The general idea is that we can obtain from a child a function |f| that represents the computation of a subset of its attributes, and can replace it with another function. To this end, we need additional syntax to specify which attributes are involved and to specify a transformation of the function that computes these attributes. We illustrate this syntax with an example.

\begin{figure}[t]
\begin{code}
data Top   | Root ^^^ proc : Proc
data Proc  | Handle

attr Control Proc
  ^^^  inh chanIn   :: Chan Request
       inh chanOut  :: Chan Response
       chn st use seq :: StateToken

attr Request
  ^^^  syn result :: Response
       ...  -- other attributes on requests

sem Proc
  | Handle ^^^  inst.req : Request
                inst.req  <- oper1 = readChan @lhs.chanIn
                ()        <- oper2 = writeChan @lhs.chanOut @req.result

expl Proc ^^^ chn st

sem Top
  | Root  expl.proc = fix (\r f i k -> f i (\s -> r f (s2i i s) k))

s2i i s = i { st = st s }
\end{code}
\caption{Stream processor realized through inversion of control.}
\label{fig.streamioc}
\end{figure}

\paragraph{Example.} Figure~\ref{fig.streamioc} shows an example based on the stream processor of Section~\ref{sect.agasmonad}. The inversion of control syntax is given near the bottom of the figure; we explain the example's code and the extra syntax below.
In contrast to the example in the earlier section, the processor |Proc| has only one state, in which it obtains the request, processes it, and outputs the response. It does not spawn a new processor to handle the remainder of the requests in the channel. Instead, we added a toplevel nonterminal |Top| that contains the processor as child |proc|, together with an additional description that causes the computations of the processor to be repeated indefinitely. To explain this additional description, we first introduce some vocabulary.

\paragraph{Syntax and semantics.} The optional \emph{explicit attribute set} (EAS) of a nonterminal is a declared subset of the attributes of that nonterminal. The syntax to declare it is similar to attribute declarations, except that it uses the keyword |expl| and the attribute types are omitted. In the example, the |st| attributes of |Proc| are in the set. Declaring an EAS has consequences. For each nonterminal |N| with an EAS, a production containing a child |n : N| must define an attribute |expl.n|. This attribute is a function that takes as parameter the evaluator for the attributes in the EAS, and must yield such an evaluator as result. Consequently, we can influence the application of the evaluator for a particular subset of the attributes. The identity transformation is obtained by defining |expl.n = id|; more complex transformations are obtained by exploiting that the evaluator is a monadic function that takes a record containing the values of the inherited attributes in the EAS and a monadic continuation that receives a record containing the values of the synthesized attributes in the EAS. The definition in the example denotes the repeated application of the evaluator |f| to the inherited attributes |i| (containing only the inherited |st| attribute), where the inherited |st| attribute for the next application is taken from the synthesized |st| attribute of the previous application.
The function |s2i| performs the conversion from the synthesized attributes to the new inherited attributes: it updates the |st| field in the record of old inherited attributes with the synthesized |st| value. With the current construction, only one EAS can be specified per nonterminal. This is not a limitation, as the constructions are composable by introducing proxy nonterminals (Section~\ref{sect.proxy}). This is also good practice when inversion of control is not required for each occurrence of a nonterminal symbol. The EAS must be declared explicitly so that it is clear from the specification which attributes are contained in the input and output records. The inherited channels, for example, are not in the set. This way they are passed to the processor in an earlier visit, so that the work that depends on these attributes is not repeated.

%% Dependencies
\paragraph{Static dependencies.} To ensure that we can obtain an evaluator that takes the inherited attributes in one go, we impose the static restriction that the inherited attributes in the EAS may not depend on any of the synthesized attributes in the EAS, and that each synthesized attribute in the EAS depends on each inherited attribute in the EAS. This additionally ensures that the evaluation occurs only in the child, and does not require evaluation at a parent node. Furthermore, to have the |expl.n| attribute available for such a node |n| when computing the attributes in the EAS, it needs to be an additional dependency of |n.a| for all attributes |a| in the EAS.

\paragraph{Implementation.} The implementation of this feature is surprisingly straightforward. When we schedule a visit |v| to a node to compute a synthesized attribute mentioned in the EAS, then this visit needs to schedule all the attributes in the EAS. We precede |v| with an additional visit |u| that can take care of other attributes that may be involved but are not in the EAS.
With this approach, when scheduling a visit |v| for some node |n|, we simply call the function defined by |expl.n| (which will be in scope) with the evaluator for |v| (which will also be in scope), instead of calling the evaluator for |v| directly. We want |v| to contain as few computations as possible, to prevent duplicate work when re-applying the evaluator. Eager rules aside, the strategy of evaluating only the rules that are needed for producing the values of the synthesized attributes scheduled to |v| ensures that we do not compute additional results that would be discarded when reinvoking the evaluator. The purpose of |u| is to compute all attributes that are dependencies of synthesized attributes in the EAS but that do not depend on inherited attributes in the EAS. This requires a similar enhancement to the scheduler as discussed in Section~\ref{sect.eager}: when we schedule a visit, we can additionally specify a set of synthesized attributes for which the scheduler schedules all dependencies that can be scheduled, i.e. those that depend only on inherited attributes that are available so far.

\begin{figure}[t]
\begin{code}
data P @alpha  -- placeholder for a monadic computation
  | Nop

attr P ^^^  chn st     :: StateToken
            syn mbVal  :: Maybe alpha

expl P ^^^ chn st ^^^ syn mbVal

sem P
  | Nop ^^^  lhs.st     $= @lhs.st
             lhs.mbVal  = Nothing

sem M
  | M ^^^  inst.act : P
           inst.act  = Nop
           expl.act  = \f i k -> f i $ \s ->
                         @lhs.expr >>= \a -> k s { mbVal = Just a }
      ^^^  lhs.value = fromJust @act.mbVal
\end{code}
\caption{Monadic operations via inversion of control.}
\label{fig.monadicioc}
\end{figure}

\paragraph{Expressiveness.} The construction in this section is expressive:
\begin{itemize}
\item Figure~\ref{fig.monadicioc} shows how to encode the monadic actions of Section~\ref{sect.agasmonad} with it. The nonterminal |P| serves as a placeholder that computes |Nothing|, but exhibits the desired scheduling constraints.
Its evaluation is transformed to execute the monadic action and to update the result with its outcome. We can thus replace the language-specific monadic rules with the more general and language-independent construction shown in this section.
\item Figure~\ref{fig.exceptions} shows exception handling and backtracking. Suppose that |N| is a nonterminal that provides two ways of computing the synthesized attributes, depending on an inherited attribute |e|. If some exception occurs during the first way, we want to take the alternative way, which we accomplish by running the evaluator with a different value for |e|. In general, this construction makes it possible to integrate Iteratees~\citep{DBLP:conf/flops/Kiselyov12} and stepwise evaluation~\citep{DBLP:conf/ldta/MiddelkoopDS11}.
\end{itemize}

\begin{figure}[t]
\begin{code}
attr N ^^^ inh e :: Maybe BacktrackException
expl N ^^^ inh e

data M | P ^^^ c : N

sem M
  | P ^^^  c.e     = Nothing
           expl.c  = \f i k -> catch (f i k)
                       (\ex -> f i { e = Just ex } k)
\end{code}
\caption{Example of exception handling and backtracking.}
\label{fig.exceptions}
\end{figure}

\paragraph{Note.} This construction also makes a unit of attribute evaluation explicit, and it is possible to encode this evaluation in different ways, each leading to a different evaluation strategy. For example, one could choose to generate lazy code to allow mixing monadic/sequential code with cyclic, lazily evaluated code. Or, instead of calling the evaluator repeatedly, the evaluator could produce an updated version of itself to be called for a subsequent invocation, which can refer to values computed in the previous iteration using incremental evaluation techniques~\citep{DBLP:journals/sigplan/YehK88b}. This section showed a practical application of inversion of control in attribute grammars, thereby paving the way for a thorough theoretical investigation.
%% A thorough investigation of this design space is out of the scope
%% of this paper.
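To summarize, the evaluator transformations of this section can be sketched in plain Haskell. The types and names below are our own illustration rather than generated UUAGC code, and we add a termination predicate so that the sketch terminates, unlike the deliberately endless loop of the stream processor in Figure~\ref{fig.streamioc}:

```haskell
-- Sketch: an evaluator for the attributes in an EAS takes the record
-- of inherited attributes and a continuation that receives the record
-- of synthesized attributes.
type Eval i s r = i -> (s -> r) -> r

-- The identity transformation, corresponding to |expl.n = id|.
identityT :: Eval i s r -> Eval i s r
identityT = id

-- Repeated application in the style of Figure fig.streamioc: the
-- synthesized attributes of one run are fed back (via s2i) as the
-- inherited attributes of the next; a predicate decides when to stop.
iterateT :: (s -> Bool) -> (i -> s -> i) -> Eval i s r -> Eval i s r
iterateT done s2i f = go
  where go i k = f i (\s -> if done s then k s else go (s2i i s) k)
```

For example, with the toy evaluator |step i k = k (i + 1)|, the call |iterateT (>= 5) (\_ s -> s) step 0 id| runs the evaluator five times and returns |5|; the exception-handling transformation of Figure~\ref{fig.exceptions} would instead wrap |f i k| in |catch|.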
\section{Related Work}

\paragraph{Background.} Attribute grammars were introduced by \citet{DBLP:journals/mst/Knuth68} to define the semantics of context-free languages, and have since found their application in compiler generation. The circularity of attribute grammars is a prominent topic in the related literature. \citet{DBLP:journals/acta/Bird84} provided the basis for attribute grammars as circular functional programs~\citep{Johnsson:1987:AGF:36583.36593}. \citet{DBLP:conf/ifip2-4/SwierstraA98} give the corresponding translation to Haskell, and show the advantages of embedded Haskell code in rules. In a different setting, \citet{DBLP:conf/popl/KennedyW76} gave an abstract interpretation of acyclic attribute grammars for the generation of efficient evaluators, which may, however, require the evaluator to support a number of visit sequences that is exponential in the number of attributes. \citet{DBLP:journals/acta/Kastens80} showed an approach that is incomplete but requires only a single visit sequence. \citet{Saraiva99} showed a continuation-based translation to strict functional programs for this case. \citet{Bransen:2012:KAR:2187125.2187139} report that Kastens' approach is too restrictive in the context of UHC, and propose a functional implementation of the Kennedy-Warren approach instead, which does not exhibit exponential behavior in practice. Circularity has its undeniable uses, e.g. in data-flow analyses~\citep{DBLP:journals/ipl/ThomeW89} or when dealing with DAGs~\citep{DBLP:journals/scp/MagnussonH07}. In contrast to these approaches, we keep the circularity out of the grammar and instead provide hooks into the evaluator to perform iteration or tie the knot.

\paragraph{UUAGC.} The Universiteit Utrecht Attribute Grammar Compiler (UUAGC) is the source of inspiration for this paper. The requests for the features discussed in this paper originated from the UHC project~\citep{uhc} and from students taking a course on program analysis.
UUAGC supports higher-order children, demand-driven and statically ordered attribute evaluators, and polymorphism and overloading. It offers various forms of code generation, including monadic code that it can additionally exploit for generating a parallel evaluator. The ideas related to eager rules and inversion of control originate from a research project~\citep{middelk12phd} and corresponding prototype implementation~\citep{Middelkoop10gpce}. We made these ideas suitable for attribute grammars (this paper) and are integrating them into UUAGC. \paragraph{Functional programming.} Aside from attribute grammar preprocessors such as UUAGC and Happy, there are also deep embeddings~\citep{DBLP:journals/informaticaSI/MoorBS00,DBLP:conf/icfp/VieraSS09}. The deep embeddings integrate well with the type system, and the preprocessors usually leave type checking to Haskell. Recently, \citet{DBLP:conf/sle/KaminskiW11} showed the inverse direction: how to incorporate functional programming features into attribute grammars, including type inference, polymorphic types, and pattern matching. %% \citet{meijerjeuring95} show the simultaneous use of monads and folds, and thus serves %% as motivation for combining monads with attribute grammars. \section{Conclusion} %% Attribute grammars have been around for many years but %% have not flourished much beyond the area of compilation. %% However, as attribute grammars are a domain specific %% language for describing catamorphisms, functional programming %% shows that attribute grammars can be applied to a wider %% range of problems than compilation only. This is becoming more %% relevant with the rising desire for declarative formalisms and %% the increase in distributed computational power. Purely functional programming languages and attribute grammars fit well together, because purity gives the necessary freedom for scheduling attribute computations. 
Previous work has shown that Haskell in particular is a good host language, because its lazy evaluation provides most of the machinery needed to implement attribute grammars. Some desirable Haskell features raise challenges when combined with attribute grammars, and this paper presented solutions to these challenges. These challenges included the support of data types with higher kinds and of monadic effects. Our solutions relied on two general attribute grammar techniques that we used throughout the paper: higher-order children and static attribute scheduling. On top of these extensions, we proposed eager rules to influence the static scheduling, and inversion of control to hook into the attribute evaluator. Some of the addressed challenges are, strictly speaking, not unique to Haskell, but do show up more prominently when using Haskell.
%% For example, in imperative languages,
%% demand-driven attribute evaluation causes less apparent memory retainment
%% because of the absence of thunks in data structures and the more common use
%% of mutable structures.
The attribute grammar extensions that we propose are, however, not language-specific and thus offer general solutions that are useful for other languages as well. This paper can therefore also be seen as a motivation for investing the effort to incorporate extensions such as higher-order children into an attribute grammar system. This paper also showed the need for static attribute scheduling, and the question remains how we can exploit it further. In contrast to higher-order children, attribute scheduling is not so easily implemented and clashes with some extensions that are of a dynamic nature. This calls for approaches that combine demand-driven and statically ordered attribute evaluation, which is where techniques such as those presented in Section~\ref{sect.ioc} can play a role.
%% This paper investigated the combination of attribute
%% grammars and Haskell and the challenges that this
%% combination provides when applied to a wider problem
%% domain which impair the adoption of attribute grammars.
%% For example, we showed that lazy evaluation can pose severe
%% performance issues when dealing with large grammars, and showed
%% how we tackle these challenges with static analyzes.
%% Moreover, we discussed that programmers expect to be able
%% to parametrize the grammar, which poses challenges in combination
%% with static typing and static analyzes.

%% For further improvements upon the expressiveness and adoption
%% of attribute grammars, a future direction of research is more
%% explicit traversal strategies for attribute grammars. An
%% attribute grammar specifies computation building blocks that
%% can be composed in various ways, which we for example showed with
%% inversion of control structures. With more refined notions of
%% strategies, attribute grammars can be made more suitable for
%% problems that require an investigation of a search space.
%% Some initial work in this direction was carried out as
%% part of a phd thesis

%% An implementation of the ideas discussed in this paper
%% is available in the form of UUAGC. UUAGC serves as an
%% important corner stone for UHC, the Haskell compiler
%% developed in Utrecht.

\acks

Our thanks go out to the contributors and users of UUAGC.

\bibliographystyle{abbrvnat}
\bibliography{references}

\end{document}