This chapter introduces the concept of visits, which play an important role in subsequent chapters of this thesis. We present this concept by means of a correspondence with the visitor design pattern. The visitor design pattern is often applied to describe traversal algorithms over Abstract Syntax Trees (ASTs) in imperative programming languages. It defines a \emph{visitor}, an object with a visit method that is executed for each node in the AST, and updates the state of the visitor, and possibly the states of nodes as well. The order in which the visitor visits the nodes is explicitly under control of the programmer, which is essential to deal with the side-effectful computations that modify the state of the visitor. However, the exchange of results between traversals is error-prone. Attribute grammars with a statically ordered attribute evaluation (Section~\tref{sect.ag.eval.ordered}) are an alternative way to describe multi-traversal algorithms. An Attribute Grammar (AG) defines attributes of nodes in the AST as functions of other attributes, and an attribute evaluator decorates the AST with the attributes in one or more traversals. The attributes form a convenient mechanism to exchange results between traversals. A strong point of AGs is that the order of evaluation is implicit. As a consequence, however, AGs discourage the use of side effects. We present RulerCore, a language that combines attribute grammars with visitors. In RulerCore, sufficient assumptions can be made about the evaluation order to facilitate side effects. In Chapter~\tref{chapt.warren} we show how to formally reason with such side effect. A RulerCore grammar can be used in combination with several host languages. In the outline of this chapter (Section~\tref{intro.outline.sideeffect}) we sketched RulerCore with the purely functional, statically typed language Haskell as host language. In this chapter, we actually show RulerCore in combination with the imperative and dynamically typed language {JavaScript}\footnote{ In the outline of this chapter, we limited side effects in RulerCore to rules that determine the shape of children. Since we cannot enforce the absence of side effects in {JavaScript} expressions, we do not impose this restriction. Instead we present a pin-rule, which can be restricted to a visit and allows for safe use of side effect. }. This chapter thus introduces the concepts that underly the subsequent chapters without a dependency on knowledge of Haskell. Also, it serves as a basis of how contents of the subsequent chapters can be mapped to other languages than Haskell. \section{Introduction} Algorithms for traversing tree-shaped data structures appear in many applications, especially in compilers. A lot of effort has been invested in developing proper abstractions for tree traversals, for example in the form of a tree-walking automaton (Section~\tref{sect.treewalkautomaton}), or in a more abstract way with Attribute Grammars (AGs)~\citep{DBLP:journals/mst/Knuth68}. AGs are an attractive language for the development of compilers. We applied AGs in many small projects (to teach compiler construction~\citep{CCO}, master projects, etc.), and several large projects, including the Utrecht Haskell Compiler~\citep{uhc}, the Helium~\citep{Leijen:helium} compiler, and the editor Proxima~\citep{Schrage04proxima}. AGs are an important asset in these projects. The example in Section~\ref{side.example} demonstrates some of the reasons. Tree traversals play a role in many other fields, including end-user applications. Web applications, for example, traverse and compute properties of DOM trees. Unfortunately, the abstractions that emerge from research in compiler construction are not used to write such traversals. To use AGs, sufficient familiarity with the formalism is required, which may be an obstacle for many programmers. Also, tool support is typically absent for the programming language in question, and the AG formalism poses severe restrictions to be used effectively in these areas, such as prohibition of side effect. In this chapter, we treat the latter two challenges, which are of a technical nature. Considering the first challenge, for imperative languages like {JavaScript}, a programmer either writes recursive functions, or takes a more structured approach via the visitor design pattern~\citep{DBLP:conf/ecoop/GammaHJV93,674267,DBLP:conf/oopsla/OliveiraWG08}. Tool support for the visitor design pattern is available for many languages. For example, the parser generator SableCC~\citep{DBLP:conf/tools/GagnonH98} generates visitor skeleton code for the ASTs that the parser produces. With visitors, side effects are used to carry over results computed in one visit to the next visit. In our experience, the scheduling of visits and their side effects is an error-prone process, due to the absence of the define-before-use guarantee. We elaborate on this in Section~\ref{side.example.vis}. Attribute grammars offer a programming model where each node in the AST is associated with named values that are called \emph{attributes}. An AG description contains computations that define attributes in terms of other attributes. If these definitions are noncircular, the description can be translated to a multi-visit traversal algorithm where each attribute is defined before it is used. The scheduling of the computations in implicit, which saves a programmer from writing the scheduling manually, and thus also cannot do it wrong. However, the implicit scheduling comes with a severe restriction: side effects cannot be used reliably and should not be used in attribute computations. In web applications, for example, we typically need side effects to influence the contents of a webpage. We elaborate on this in Section~\ref{side.example.ag}. The main contribution of this Chapter is an extension of attribute grammars that has an explicit notion of visits, which offers a hybrid model between visitors and attribute grammars, while maintaining the best of both worlds. In fact, besides being more expressive, our extension make attribute grammars more intuitive to use. We also address the second challenge, which is to make our approach available for many host languages. We present RulerCore, a small but powerful language for tree traversals. We managed to isolate the language-dependent part into a small subset called RulerBack, and show the translation from RulerBack to {JavaScript}. In later chapters, we show a translation to Haskell. With these two languages, we cover the implementation issues regarding the full spectrum of mainstream general purpose programming languages available today. Similar to other preprocessed languages, code fragments of the host language are embedded in RulerCore to describe the computations for attributes. The embedding keeps general-purpose programming constructs out of RulerBack, and allows the programmer to express computations without having to learn a special language. In particular, RulerBack is suitable as a host language for attribute grammars. In this chapter, we present the languages RulerCore and RulerBack. We do so by using an example based on the alignment of HTML menus. This example requires a traversal of the AST to determine the sizes of the HTML items, and another pass to compute the locations of the items. Section~\ref{side.example} presents the example in each of the above languages. This chapter focussed on RulerBack. We introduce RulerBack in Section~\ref{side.sem.rulercore} and show a translation to JavaScript in In Section~\ref{side.trans.rulercore}. In Section~\ref{side.trans.rulerfront} we get back to RulerCore and describe the translation to RulerBack. \begin{figure}[p] \makebox[\textwidth][r] { \begin{tikzpicture} [ nd/.style={rectangle, minimum size=5mm, thick, draw=black!50, top color=white, bottom color=black!20,font=\tiny} ] \node[nd, minimum width=25mm](a) {item a}; \node[nd, minimum width=22mm, below left=0mm of a.south east](b) {very big item b}; \node[nd, minimum width=19mm, below left=0mm of b.south east](c) {not so big c}; \node[nd, minimum width=16mm, below left=0mm of c.south east](d) {tiny}; \end{tikzpicture}} \vspace{-25mm} \begin{smallcode} \begin{code} function Menu(name, children) { -- constructor of a |Menu| AST node this.name = name; -- the name of the element to align this.children = children; -- an array of children menus } Menu.prototype.accept = function(visitor) { visitor.visitMenu(this); } -- invokes the appropriate visit method function Visitor() { -- constructor of a |Visitor| object this.depth = 0; -- the depth so far in the menu tree this.maximum = 0; -- the maximum width observed so far this.count = 0; } -- the number of menus laid out so far var root = -- the menu tree and corresponding html nodes new Menu("a", [ -- @
item a
@ new Menu("b", [ -- @
very big item b
@ new Menu("c", []) -- @
not so big c
@ , new Menu("d", []) -- @
tiny
@ ]) ]); -- @
@ function align(root, anchor) { -- aligns the html nodes according to the menu tree var v = new Visitor(); -- creates visitor with empty state v.visitMenu = function(menu) { -- first visit method (gets |menu| node as param) menu.elem = document.getElementById(menu.name); menu.depth = this.depth; -- remember |depth| for the second visit this.maximum = Math.max(this.maximum, this.depth * 20 + menu.elem.clientWidth); for(var i in menu.children) { this.depth = menu.depth + 1; -- reset |this.depth| to one deeper than current menu.children[i].accept(this); -- invokes visitor on children } } root.accept(v); -- invokes the first visit (on the root) v.visitMenu = function(menu) { -- second visit method (gets |menu| node as param) var offset = menu.depth * 20; menu.elem.style.left = (anchor.offsetLeft + offset) + "px"; menu.elem.style.top = (anchor.offsetTop + this.count * 30) + "px"; menu.elem.style.width = (this.maximum - offset) + "px"; menu.elem.style.height = 30 + "px"; this.count ++; -- inorder numbering of nodes for(var i in menu.children) { -- invokes |visitor| on |menu|s children menu.children[i].accept(this); -- |count| should not be reset in this case } } root.accept(v); } -- invokes the second visit (on the root) \end{code} \end{smallcode} \caption{Pseudocode of dual-visit menu alignment.} \label{side.fig.intro.visitor} \end{figure} \section{Example} \label{side.example} In this section, we motivate the claims of the introduction in more detail, and introduce the background information relevant for the remainder of the chapter. We take as a use case the alignment of an HTML menu in a web application using {JavaScript}, based on a multi-visit tree traversal over an abstract description of the menu. We first show a solution using the visitor-pattern, then a near-solution using attribute grammars, and finally two solutions using RulerCore. \subsection{Visitor Design Pattern} \label{side.example.vis} In the visitor design pattern, each node of the Abstract Syntax Tree (AST) is modelled as an object, which stores references to the subtrees, and has an |accept| method. The |accept| method takes a visitor as parameter. A visitor is an object with a |jvisit|-method for each type of node. The |accept| method of the AST node calls the appropriate |jvisit|-method on the visitor and passes the node as an argument. This |jvisit| method consists of statements that manipulate the state of the visitor and the AST node, and can visit a subtree by calling the |accept| method on the root of a subtree, with the visitor-object as parameter. Figure~\ref{side.fig.intro.visitor} shows an example of a visitor that lays out HTML items as a menu in a tree-like fashion, as visualized in the upper-right corner of the figure. The menus are aligned to the right, and submenus are slightly indented. Furthermore, we desire the items to have a minimal size, but large enough to contain their contents. The variable |root| contains an abstract description of the menu as a tree of |Menu| objects (the AST). Associated with each |Menu| object is an HTML item with the same name. We interpret the menu structure to layout the HTML items. In the first visit to the menu tree, we query the widths of the corresponding HTML items. In the second visit, we adjust the positions and sizes of these items. Some information (such as indentation based on the |depth|) is computed in the first visit, and also needed in the second visit. This information is stored as additional fields in the menu objects. The order in which the tree is visited is clearly defined by the explicit |accept|-calls in the |jvisit|-methods. The order of the calls ensures that the sized of the HTML items are queried before they are resized. However, there are a number of issues with the above solution. In the second visit, we require that a number of values are computed in the first visit. These values are stored in the state of the AST nodes during the first visit. This approach has a number of problems. It does not guarantee that the values that we store indeed those values that we need later. Furthermore, we never remove any of these values from the state, and thus retain all memory until the AST gets deallocated. This especially becomes a problem when using large ASTs in which many results are stored. Furthermore, the order of appearance of the statements is relevant. For example, the value |this.depth| needs to be reset at the appropriate place, and requires that the assignment to |menu.depth| is done before. Similarly, the increment to |this.count| needs to be positioned carefully. These are actually separate aspects that we would like to implement in isolation. However, separate pieces of code cannot easily be composed due to side effect. Finally, we need to explicitly write visits to children using |accept|. Some tools generate depth-first visitors, which alleviates the need to do so. However, such approaches come with restrictions. The restriction that all statements must take place before the invocations to children is an example. In Figure~\ref{side.fig.intro.visitor} we reset |this.depth| in between visits to children. To use a depth-first visitor, we would have to move this statements, which may not be immediately possible. Moreover, in the simple example that we showed, the two visits are invoked after each other at the root. In practice, for example in type checking languages with principal types, we actually invoke multiple visits on a subtree before moving on to the next subtree. This rules out depth-first visitors, and is also error-prone to write manually. The example in Figure~\ref{side.fig.intro.visitor} can be made more complicated by allowing menus to share submenus. The menu structure then forms an acyclic directed graph instead of a tree. With such a complication, the problems mentioned above become harder to deal with. As a sidenote, in \thiswork{}, we treat the AST as a fixed data structure. For example, we do not consider adding menu entries on the fly. The ideas we propose can deal with the dynamic construction of proof trees (Chapter~\tref{chapt.iterative}), and we think that this is sufficient to deal with dynamic changes to the AST as well, but leave this topic as future work. Below, we look for a way to generate code similar to the code above, but using a description that alleviates the programmer from the aforementioned problems. \subsection{Attribute Grammars} \label{side.example.ag} Attribute grammars take care of the problems mentioned above related to visitors, but are not flexible enough to take side effects into account. %% We briefly consider why attribute grammars appear as a promising solution, %% and why side effects form an obstacle. Before we show the example, we first give some background information on attribute grammars, and their encoding in {JavaScript}. We introduced attribute grammars in Section~\rref{intro.ag.syntax}, and we use a similar syntax here with minor differences due to the JavaScript host language and to stay close to the syntax that we introduce later with respect to RulerCore. To summarize, an attribute grammar is an extension of a context-free grammar. Nonterminals are annotated with attributes. Productions specify equations between attributes. The context-free grammar specifies the structure of the AST. Each node of the AST is associated with a production, and thus also the nonterminal of the nonterminal symbol that appears as left-hand side of the production. Each child of a node corresponds to a nonterminal symbol on the right-hand side of the production. For example, we can denote a production as well as the structure of a node in the AST using a grammar definition (explained below): \begin{code} grammar Menus -- nonterminal |Menus| prod Cons hd : Menu ^^^ tl : Menus -- production |Cons|, with two nonterminals prod Nil -- production |Nil|, empty \end{code} This grammar definition introduces a nonterminal |Menus| with two productions, representing a cons-list. The first production is named |Cons|. In BNF notation, it corresponds to |Menus -> Menu Menus|. The two nonterminals |Menu| and |Menus| in the right-hand side (RHS) have explicitly been given the respective names |hd| and |tl|. Terminals only have a name (shown later in Figure~\ref{side.fig.intro.ag}). The grammar declaration corresponds to generated {JavaScript} constructor functions in the host language, which can be used to construct ASTs. Each production is mapped to a constructor function that gets as parameter an object corresponding to the symbols in the RHS of the production. Each nonterminal is mapped to a constructor function that creates a base object that each of the objects corresponding to the productions inherits. Because of inheritance, we can verify at the point of construction that the AST matches the grammar: \begin{code} function Menus() {} -- nonterminal |Menus|: base class function Menus_Cons(hd, tl) { -- production |Cons|: subclass this.hd = hd; assert(hd instanceof Menu); this.tl = tl; assert(tl instanceof Menus); } Menus_Cons.prototype = new Menus(); Menus_Cons.prototype.constructor = Menus_Cons; function Menus_Nil() {} -- production |Nil|: subclass Menus_Nil.prototype = new Menus(); Menus_Nil.prototype.constructor = Menus_Nil; \end{code} Cons-lists occur often in AGs. As a shortcut, the following shorthand notation may be used, which specifies that the nonterminal |Menus| is a list of |Menu| nonterminals: \begin{code} grammar Menus : [Menu] \end{code} This shorthand notation has an additional benefit: the list of menus is conceptually a cons-list in the AG description but represented efficiently as a JavaScript array in the generated code. This distinction is hidden from the programmer. The evaluation of an attribute grammar constitutes to running an evaluation algorithm on each node. The algorithm is derived from the equations of the production that is associated with the node. The algorithm describes the decoration of the node with attributes. We assume that attributes are physically represented as {JavaScript} properties of the AST objects. Nodes are decorated with two types of attributes: inherited attributes are computed during evaluation of the parent of that node, and synthesized attributes are computed during evaluation of the node itself. We declare the attributes of a nonterminal using an attribute declaration: \begin{code} attr Menu inh depth -- inherited attribute syn gathMax -- synthesized attribute \end{code} These attribute names are mapped to object properties named |_inh_depth| and |_syn_gathMax|. At some point during attribute evaluation, given a participating |Menu| object |m|, the objects properties |m._inh_depth| and |m._syn_gathMax| will be defined. An inherited attribute may have the same name as a synthesized attribute: they are mapped to differently named properties. As an aside, nodes may define a number of local attributes, which can be seen as local variables. \begin{figure}[p] \vspace{-0.4em} \begin{smallcode} \begin{code} grammar Root prod Root root : Menu -- node with a child named |root| grammar Menu prod Menu name ^^^ cs : Menus -- node with a property |name|, and a child |cs| grammar Menus : [Menu] -- conceptually a cons-list, physically an array var root = new Root_Root( -- the |Menus| are physically represented new Menu_Menu("a", [ -- as an array. However, conceptually new Menu_Menu("b", [ -- we define its attributes using the new Menu_Menu("c", []) -- above cons-list representation. , new Menu_Menu("d", []) ]) ])); attr Menu Menus inh depth finMax count -- |gathMax|: width of submenu syn gathMax count -- note: |count| is both inh and syn function align(root, anchor) { -- uses embedded attribute grammars datasem Root prod Root -- equations of production |Root| of nont |Root| root:depth = 0 -- initial |depth| root:count = 0 -- initial |count| root:finMax = root:gathMax -- choose gathered max as global max datasem Menu prod Menu -- production |Menu| of nonterm |Menu| cs:depth = 1 + lhs:depth -- increase |depth| for submenus cs:count = 1 + lhs:count -- increase |count| lhs:count = cs:count -- provide the updated count to the parent loc:elem = document.getElementById(loc:name) loc:offset = lhs:depth * 20 -- indentation loc:width = loc:offset + loc:elem.clientWidth lhs:gathMax = Math.max(cs:gathMax, loc:width) cs:finMax = lhs:finMax -- pass down final maximum loc:dummy = (function () { -- side-effectful statements (wrapped) loc:elem.style.left = (anchor.offsetLeft + loc:offset) + "px"; loc:elem.style.top = (anchor.offsetTop + lhs:count * 30) + "px"; loc:elem.style.width = (lhs:finMax - loc:offset) + "px"; loc:elem.style.height = 30 + "px"; }) () -- unwrap directly datasem Menus prod Cons -- equations of production |Cons| hd:depth = lhs:depth -- pass |depth| downwards through the menus tl:depth = lhs:depth hd:count = lhs:count -- thread the |count| through the menus, in an tl:count = hd:count -- in-order fashion. First to the head, then to lhs:count = tl:count -- the tail, then back up to the parent. lhs:gathMax = Math.max(hd:gathMax, tl:gathMax) hd:finMax = lhs:finMax -- pass global maximum downwards tl:finMax = lhs:finMax datasem Menus prod Nil -- equations of production |Nil| lhs:count = lhs:count -- thread |count| through without changing it lhs:gathMax = 0 -- initial maximum var inhs = new Inh_Root(); -- contains inh attrs of the root (empty) eval_Root(sem_Root, root, inhs); } -- run the attribute evaluator \end{code} \end{smallcode} \vspace{-1.2em} \caption{Attribute grammar-based near-solution to menu alignment.} \label{side.fig.intro.ag} \end{figure} To give a semantics to these attributes, we organize equations (rules) per production in \emph{semantics-blocks}. We explain the following example below: \begin{code} datasem Menu -- nonterminal |Menu| prod Menu -- production |Menu| cs:depth = 1 + lhs:depth -- rule loc:width = 20 * lhs:width -- rule lhs:gathMax = Math.max(loc:width, cs:gathMax) -- rule \end{code} The full details of the nonterminal and its semantics can be found in Figure~\ref{side.fig.intro.ag}. The left-hand side of an equation designates an attribute. The notation for attribute occurrences |nodename:attrname| refers to an attribute |attrname| of some node |nodename|, where |nodename| is either the name of a child, or |loc| or |lhs|. The colon ensures that attribute occurrences are district from JavaScript notation for properties. Attribute occurrences in the left-hand side of a rule refer to inherited attributes of children, but a synthesized attribute of |lhs| and a local attribute in case of |loc|. Thus, the attributes we need to define appear as left-hand side. For example, the above attribute occurrences refer to the {JavaScript} properties |this.cs._inh_depth|, |this._loc_width|, and |this._syn_gathMax| respectively. Similarly, the right-hand side consists of a {JavaScript} expressions, with embedded attribute occurrences. In this case, we may refer to the synthesized attributes of children, or with |lhs| to the inherited attributes of the current node. The terminals of a production are available as local attributes. In production |Menu|, there is a terminal called |name|, which is available as attribute |loc:name|. The translation of attribute references is similar as described above. For example, the last rule expands to the {JavaScript} statement: \begin{smallcode} \begin{code} this._syn_gathMax = Math.max(this._loc_width, this.cs._syn_gathMax); \end{code} \end{smallcode} Evaluation of an attribute grammar corresponds to traversing the AST one or more times, and executing rules, according to an evaluation strategy. In \thiswork{}, we restrict ourselves to the class of well-defined attribute grammars, whose attribute dependencies can be statically proven to be acyclic~\citep{DBLP:journals/mst/Knuth68}. For these grammars, the attributes can be computed by visiting each node a bounded number of times. This corresponds precisely with typical uses of the visitor-design pattern. %% With on-demand evaluation of attributes, %% the application requests a value of a synthesized attribute of %% the root, and evaluator proceeds by first transitively %% computing all the attributes the rule depends on, potentially %% visiting subtrees multiple times. In \thiswork{}, we consider %% a greedy evaluation strategy: subtrees are visited a statically %% known bounded number of times. From a semantics-blocks (datasem-blocks in Figure~\ref{side.fig.intro.ag}), a function is generated that contains the evaluation algorithm. For example, the function |sem_Menu| is generated from the semantics of nonterminal |Menu|. Furthermore, to interface with the decorated tree in {JavaScript} code, a function |eval_Menu| is generated that takes the AST, the function |sem_Menu|, and an object containing values for the inherited attributes. It applies the semantic value and returns an object with the synthesized attributes: \begin{code} var inhs = new Inh_Menu(); inhs.depth = 0; -- provide inh attrs of root syns = eval_Menu(sem_Menu, menu, inhs); -- initiate evaluation window.alert(syns.gathMax); -- access syn attrs of root \end{code} In Figure~\ref{side.fig.intro.ag}, we show an attribute grammar version of the example that we presented earlier. It is a non-solution, for reasons explained later, but exhibits various important properties. Below, we comment on some aspects of the example. The attribute grammar code in Figure~\ref{side.fig.intro.ag} starts with a number of grammar definitions that describe the structure of the menu tree. We then define a number of attributes. In particular, the idea is that we gather a maximum |gathMax| (synthesized), and use its value at the root to pass down the global maximum |finMax| (inherited). Moreover, we count the menus. The inherited attribute |count| specifies the count for the current menu, and the synthesized |count| is the count incremented with the total number of children. We define the semantics for these attributes in the function |align|. Because |root| and |anchor| are its parameters, we also have access to these in the right-hand sides of rules. A HTML item can be laid out using statements that assign to properties of an HTML item. Since the right-hand side of an attribute equation (rule) is an expression, a sequence of statements needs to be wrapped as an expression. In {JavaScript}, this can be accomplished in a variety of ways. In the example, we choose to use a parameterless anonymous function for this purpose. In the semantics of |Menus|, rules are given to compute the attributes for lists of menus using the cons-list representation. These rules follow standard patterns. The attributes |depth| and |finMax| are passed topdown. The attribute |gathMax| is computed bottom-up. The attribute |count| is threaded through the tree. In the visitor-example, the fields in the visitor combined with side effects took care of this behavior. With attribute grammars, we have to describe it explicitly. However, with copy rules (Section~\tref{intro.sect.copyrules}), collection rules~\citep{10.1109/SCAM.2007.13}, and a generalization called default rules (Chapter~\tref{chapt.iterative}), we can abstract from these patterns, so that a more concise semantics of |Menus| can be given (as we see later). The AG code has three nice properties. Firstly, the order of appearance of the rules is irrelevant. This allows the rules for |depth| and |count| to be written separately and merged automatically~\citep{uuagc}. In the example, we give all the rules without using such composition facilities. However, for larger projects the ability to write such rules separately is important with respect to modularity. Secondly, a nice property is the absence of invocations of visits (the |accept| calls in the visitor-example). The number of visits is totally implicit. From the dependencies between attributes in the rules, it can be determined automatically that the attribute |root:gathMax| (in the semantics of |Root|) must be computed in a visit before the visit where it is passed as |root:finMax|. \begin{figure}[p] \vspace{-0.4em} \begin{smallcode} \begin{code} grammar Root prod Root root : Menu -- node with a child named |root| grammar Menu prod Menu name cs : Menus -- node with a property |name|, and a child |cs| grammar Menus : [Menu] -- conceptually a cons-list, physically an array var root = new Root_Root( -- the |Menus| are physically represented new Menu_Menu("a", [ -- as an array. However, conceptually new Menu_Menu("b", [ -- we define its attributes using the new Menu_Menu("c", []) -- above cons-list representation. , new Menu_Menu("d", []) ]) ])); itf Root visit perform -- root node has one visit, but no attrs itf Menu Menus -- itf for nonterminals Menus (menu nodes) visit gather inh depth syn gathMax -- first visit: compute maximum visit layout inh finMax count syn count -- second visit: layout the HTML items function align(root, anchor) { -- uses embedded attribute grammars datasem Root prod Root -- equations of production |Root| of |Root| root:depth = 0 -- initial |depth| root:count = 0 -- initial |count| root:finMax = root:gathMax -- global max is the gathered max here invoke layout of root -- require that visit |layout| of |root| is invoked datasem Menu prod Menu -- equations scheduled to visits of |Menu| cs:depth = 1 + lhs:depth -- increase |depth| for submenus cs:count = 1 + lhs:count -- increase |count| lhs:count = cs:count -- provide the updated count to the parent loc:offset = lhs:depth * 20 -- indentation loc:width = loc:offset + loc:elem.clientWidth lhs:gathMax = Math.max(cs:gathMax, loc:width) cs:finMax = lhs:finMax -- pass down final maximum visit gather pin loc:elem = document.getElementById(loc:name) visit layout -- equations for visit |layout| and later pin _ = (function () { -- side-effectful statements (wrapped as function) loc:elem.style.left = (anchor.offsetLeft + loc:offset) + "px"; loc:elem.style.top = (anchor.offsetTop + lhs:count * 30) + "px"; loc:elem.style.width = (lhs:finMax - loc:offset) + "px"; loc:elem.style.height = 30 + "px"; }) () -- directly call the anonymous function datasem Menus -- standard patterns for |Menus| default depth = function(depths) { return depths[depths.length-1]; } default finMax = function(maxs) { return maxs[maxs.length-1]; } default gathMax = function(maxs) { return Math.max.apply(Math, maxs); } default count = function(counts) { return counts[0]; } prod Cons -- each production must be explicitly listed, prod Nil -- even if they do not have individual rules var inhs = new Inh_Root_perform(); -- contains inh attrs for the root (empty) eval_Root(sem_Root, root, inhs); } -- run the attribute evaluator \end{code} \end{smallcode} \vspace{-1.3em} \caption{RulerCore solution to menu alignment.} \label{side.fig.intro.ruler1} \end{figure} Thirdly, we check statically if there is an evaluation order of statements such that all attributes are defined before their value is accessed. The attribute declarations describe the attributes that must be defined, and those that are available. The rules describe what attributes must be available before computing an attribute, and an evaluation order is possible if the transitive closure of the dependencies is acyclic~\citep{DBLP:journals/mst/Knuth68}. Unfortunately, when the above is evaluated on-demand it is incorrect because the order of evaluation of rules is determined is not only determined by dependencies on attributes but also by the side effects that rearrange the HTML items. Since the latter effects are not present as a dependency between rules and attributes, the order of evaluation may be wrong. In fact, the root of the tree does not have any attributes defined, so when assuming a on-demand evaluation of the grammar, it is actually expected that none of the rules are evaluated. Hence, we allow the programmer to explicitly encode the dependencies imposed by side effects in the next section. \subsection{RulerCore} \label{side.example.ruler1} We now present a solution using RulerCore in Figure~\ref{side.fig.intro.ruler1} which resembles the code in Figure~\ref{side.fig.intro.ag}. We discuss similarities and differences below. The essential difference is that RulerCore has notation to explicitly describe visits to an AST node during attribute evaluation, and notation to associate side effects with individual visits. \paragraph{Interfaces.} Instead of declaring attributes for a nonterminal, we declare an \emph{interface} for a nonterminal. An interface declaration specifies the visits of a nonterminal and attributes per visit. The following example specifies that the attributes of |Menu| are computed in two visits: \begin{code} itf Menu -- interface for nonterminal |Menu| visit gather -- declaration of first visit syn gathMax -- synthesized attr computed by visit visit layout -- declaration second visit inh finMax count -- two inherited attributes syn count -- synthesized attr computed by visit \end{code} The order of appearance of visit declarations dictates the order of visits to AST nodes with this interface. In order to visit a node, all previous visits must have occurred. Values for inherited attributes must be provided prior to the visit. Values for synthesized attributes are only available after a visit has been performed. \paragraph{Scheduling.} The rules of a semantics-block are automatically scheduled over visits using an as-late-as-possible strategy (Section~\ref{visit.sect.order}). If the rules are cyclic, the scheduling is not possible, and a static error is reported. The scheduling determines which children to visit and in what order. However, since |Root| has no attributes, there is no need to invoke any visits of |root|. Therefore, we specify through an invoke-rule that visit |layout| must be invoked, which requires through attribute dependencies that also visit |gather| must be invoked, and kickstarts the evaluation. \paragraph{Scheduling constraints.} Rules can be constrained to visits. Rules that appear in a visit-block are constraint to that visit or a later visit. The example below illustrates the various possibilities. An attribute definition that is prefixed with the keyword |pin| is restricted to exactly the visit that it appears in, and is executed during that visit even where there are no value dependencies on the attributes that it defines: \begin{code} datasem Menu -- rules for nonterminal |Menu| prod Menu -- rules for production |Menu| cs:count = lhs:count + 1 -- scheduled in visit |gather| or later visit gather pin loc:elem = ... -- precisely in visit |gather| visit layout -- rules for visit |layout| or later pin _ = ... -- precisely in visit |layout| lhs:count = cs:count -- constrained to layout or later \end{code} With an underscore, we bind the value of the RHS of a rule to an anonymous attribute that we cannot refer to anymore. A visit-block may contain rules and optionally either a nested visit-block or a nested clause-block. We use and explain clause-blocks later. A visit-block introduces a subscope. A local attribute defined in a visit-block is not available for a rule defined in a higher scope, even if that rule is scheduled to a subscope. Attributes of children are available to higher scopes. After all these preparations, we finally present the RulerCore solution in Figure~\ref{side.fig.intro.ruler1}. In this example, we express that the side effects that query the widths of the HTML items are constrained to the first visit, and that the side effects that change the locations and dimensions are constrained to the second visit. For the |Menus|-nonterminal, we give default-rules for equality named attributes in its productions. If such an attribute (e.g. |sub k i:a|) does not have an explicit definition, it is implicitly defined by the default rule. Associated with the children in a production is their order of appearance. The default-rule provides a function which receives a list (an array in JavaScript) as argument that contains the values of the attributes |a| of the children in the production and preserving the order of the children. Children without an attribute |a| do not have a value of this list. Also, the value of inherited |lhs:a| is added to the end of the list if it exists. \paragraph{Remarks.} In the above example, we combined both side effects and attribute evaluation. We retain the advantages that AGs offer, such as the ease of adding attributes. As we show later, the description still permits the AG to be analyzed and the rules to be ordered. However, we require the programmer to manually assign attributes to visits, and to constrain side-effectful rules to particular visits, which is not necessary for conventional attribute grammars. In practice, this is only a minimal amount of extra work that has as an additional advantage that it makes attribute evaluation more predictable and thus easier to understand. %% as well as simplifies cycle checks, both in terms of computational complexity %% and understandability of error messages. \begin{figure}[p] \begin{smallcode} \begin{code} function align(root, anchor) { -- uses embedded attribute grammars var sem_Root = -- semantic function with itf |Root| sem prodRoot : Root -- equations for itf |Root| visit perform -- equations for the |perform|, the only visit clause Root -- production named |Root| child root : Menu = sem_Menu -- introduce a child |root| of nonterm |Menu| root:ast = lhs:ast -- use |lhs:ast| as AST root:depth = 0 -- initial |depth| root:count = 0 -- initial |count| root:finMax = root:gathMax -- global max is the gathered max of here invoke layout of root -- demand invocation |layout| of |root| var sem_Menu = -- semantic function with itf |Menu| sem prodMenu : Menu -- equations for itf |Menu| visit gather -- equations for first visit clause Menu -- production named |Menu| child cs : Menus = sem_Menus -- introduce a child |cs| of nonterm |Menus| cs:ast = lhs:ast.cs -- pass submenus as AST for |cs| cs:depth = 1 + lhs:depth -- increase |depth| for submenus pin loc:elem = document.getElementById(loc.name) loc:offset = lhs:depth * 20 -- indentation loc:width = loc:offset + loc:elem.clientWidth lhs:gathMax = Math.max(cs:gathMax, loc:width) cs:finMax = lhs:finMax -- pass down global maximum visit layout -- equations for visit |layout| clause Menu' -- subproduction named |Menu'| cs:count = 1 + lhs:count -- increase |count| lhs:count = cs:count -- provide the updated count to the parent pin _ = (function () { -- side-effectful statements loc:elem.style.left = (anchor.offsetLeft + loc:offset) + "px"; loc:elem.style.top = (anchor.offsetTop + lhs:count * 30) + "px"; loc:elem.style.width = (lhs:finMax - loc:offset) + "px"; loc:elem.style.height = 30 + "px"; }) () -- directly call the anonymous function ... -- See Figure~\ref{side.fig.intro.ruler22} var inhs = new Inh_Root_perform(); -- contains inh attrs for the root inhs.ast = root -- AST as inherited attribute eval_Root_perform(sem_Root, inhs); -- run the attribute evaluator } \end{code} \end{smallcode} \caption{Desugared RulerCore solution to menu alignment (part 1).} \label{side.fig.intro.ruler2} \end{figure} \subsection{Desugared RulerCore} \label{side.example.ruler2} In Figure~\ref{side.fig.intro.ruler2} (explained below), we give a different \emph{desugared} of Figure~\ref{side.fig.intro.ruler1}. Both versions are valid RulerCore programs. This desugared version only uses a subset of RulerCore, which we call \emph{RulerBack}. This representation is more verbose, but more suitable for code generation. RulerBack generalizes over higher-order~\citep{DBLP:conf/pldi/VogtSK89} and conditional~\citep{DBLP:journals/toplas/Boyland96} attribute grammars. In the next section, we introduce RulerBack. The example in Figure~\ref{side.fig.intro.ruler2} serves as preparation. In Figure~\ref{side.fig.intro.ruler2}, we omit the grammar definitions, interface declaration, and |root| variable, which are equal to those in the first half of Figure~\ref{side.fig.intro.ruler1}. In an attribute grammar, there is a fixed association between a node in the AST and a production and a fixed association between a production and a collection of rules. The code to execute for a node in the AST is derived from the associated collection of rules. RulerBack \emph{virtualizes} productions: we define grammars that describe traversals instead of data structures (Section~\rref{sect.intro.hochildrencopyrules}). The rules of a RulerBack production are organized in \emph{clauses} (introduced below), and rules can programmatically determine which clauses to evaluate. The above functionality allows us to define a single production per nonterminal. The nonterminal has an inherited attribute |ast| which contains the AST as an inspectable value. Note that in RulerBack the representation of cons-lists using arrays becomes visible whereas this is hidden in the RulerCore example. In the translation from RulerCore to RulerBack additional RulerBack rules are generated to treat the explicit array representation. We show this in the example. \emph{Semantics blocks}, which are of the form |sem P : N ...|, introduce a production |P| of nonterminal |N|. The visits and attributes of |N| are declared separately with an \emph{interface declaration}. Additionally, the code generated from a sem-block is a constructor-function that produces an AST node with attributes as described by |N|. The AST is provided explicitly as the inherited attribute |ast|. In Figure~\ref{side.fig.intro.ruler2}, we start with a definition of the semantics for the root. The interface |Root| declares one visit, and we give rules for that visit in a visit-block. RulerBack provides clauses as a means to generalize over productions. Each clause provides a way to compute the attribute values of a visit. Moreover, a clause may specify constraints. Clauses are executed in the order of appearance. A clause is selected if its constraints are satisfied. Conventional productions, which specify a constraint on the node of the tree, can thus be represented with clauses. \begin{figure}[t] \begin{smallcode} \begin{code} function align(root, anchor) { -- uses embedded attribute grammars ... -- See Figure~\ref{side.fig.intro.ruler2} var sem_Menus = -- semantic function, also itf |Menu| sem prodMenus : Menu -- equations for itf |Menu| visit gather -- equations for visit |gather| default depth = function(depths) { return depths[depths.length-1]; } default finMax = function(maxs) { return maxs[maxs.length-1]; } default gathMax = function(maxs) { return Math.max.apply(Math, maxs); } default count = function(counts) { return counts[0]; } clause Cons -- production |Cons| as clause match true = lhs:ast.length >= 1 -- clause matches if array has an element child hd : Menu = sem_Menu -- introduce child |hd| using |sem_Menu| hd.ast = lhs:ast[0] -- head of the array child tl : Menu = sem_Menus -- introduce child |tl| using |sem_Menus| tl.ast = lhs:ast.slice(1) -- tail of the array clause Nil -- production |Nil| (matches always) var inhs = new Inh_Root_perform(); -- contains inh attrs for the root inhs.ast = root -- AST as inherited attribute eval_Root_perform(sem_Root, inhs); } -- run the attribute evaluator \end{code} \end{smallcode} \caption{Desugared RulerCore solution to menu alignment (part 2).} \label{side.fig.intro.ruler22} \end{figure} Clauses and visit-blocks may contain rules. Rules given for a visit are in scope of all clauses declared for that visit. Rules for a clause are only visible in that clause. We introduce child-rules. A child-rule introduces child. In the example, we introduce a child |root|, with interface |Menu|, and the semantics defined by the {JavaScript} value |sem_Menu|. This is an example of a higher-order child (Section~\rref{intro.sect.higherorder}) and is used to virtualize the AST. Unlike in later chapters, in this chapter the virtual AST is isomorphic to the actual AST. Also, we assume that a visit |v| to child |x| is only possible if there exists an invoke-rule for it. \begin{figure}[p] \begin{code} e ::= JS [ many b ] -- embedded RulerBack blocks |b| in JavaScript code |JS| b ::= i | s | o -- RulerBack blocks i ::= itf I v -- interface decl, with visit sequence |v| v ::= visit x inh (many x1) syn (many x2) z v -- visit decl, with atributes |x1| and |x2| and strategy |z| | terminator -- terminator visit (optional in the notation) s ::= sem x : I t -- semantics expr, defines production |x| t ::= visit x (many r) (many k) -- visit-block, with rules |r| and clauses |many k| | terminator -- no visit (serves as terminator) k ::= clause x (many r) t -- clause definition, with next visit |t| r ::= p = e -- assert-rule, evaluates |e|, binds to pattern |p| | pin p = e -- pinned assert rule (bound to visit it occurs in) | match p = e -- match-rule, backtracking variant | invoke x of (many c) z -- invoke-rule, invokes visit |x| on |many c| | child c : I = e -- child-rule, introduces a child |c| of itf |I| z ::= partial | total -- behavior in case of rule failure o ::= c:x -- attribute reference in some embedded code p ::= c:x -- attribute reference in pattern | _ -- wildcard | K -- constant |K| x, c p, e -- identifiers, child identifiers, patterns, expressions respectively Gamma,Sigma ::= epsilon -- attr+child environment (used in semantics) | Gamma, scope -- new scope | Gamma, inh c:x -- inh attr |c:x| | Gamma, syn c:x -- syn attr |c:x| | Gamma, c:I v -- child |c| with available visit sequence |v| Phi ::= epsilon -- interface environment (used in semantics) | Phi, I v -- itf |I| with visit decls |v| \end{code} \caption{Syntax of RulerBack} \label{side.fig.syntax.rulercore} \end{figure} The left-hand sides of an evaluation-rule may be a pattern. This is either an attribute reference, an underscore, or a constant. Evaluation of such a rule fails when its execution throws an exception or the left-hand side is a value that is not equal to the value computed for the right-hand side. Such a failing rule causes an exceptional termination of the evaluation, unless the evaluation-rule is prefixed with the match-keyword, and the rule does not throw an exception other than a special fail-exception. In this case, we say that the clause \emph{fails}. Thus, the match-rules allow us to distinguish clauses |Cons| and |Nil| of |ntMenus| by matching on the length of the list. Invoke rules and visit-blocks may be annotated with strategies of various kinds. Chapter~\rref{chapt.rulercore} describes these strategies: in this chapter the strategies |partial| and |total|, which describe backtracking behavior, appear. During attribute evaluation, the clauses of a visit are evaluated in the order of appearance. When evaluation for a clause fails, the evaluation \emph{backtracks} to the next clause\footnote{ Chapter~\tref{chapt.breadthfirst} describes a different strategy where clauses are simultaneously tried. }. A backtrack does not revert potential side effects of the results that are evaluated so far. If the last clause fails, the default behavior is that the evaluation fails exceptionally. However, if both the visit itself and the invoke-rule of the parent are annotated with the strategy |partial|, then the invoke-rule of the parent fails and causes backtracking in the parent. By default, a total strategy is assumed. Unspecified visit-blocks are implicitly defined as an empty visit-block. A visit-block without clauses implicitly has a single clause. This clause matches always unless match-rules are present. Therefore, we neither have to specify the visit-block |layout| nor clauses for it in the semantics of |ntMenus|. Also, because of the automatic ordering of rules, many of the rules defined in visit |layout| of |ntMenu|, could also be defined one level higher, in visit |gather|. Note that this representation is more general than conventional attribute grammars, and that an attribute grammar can easily be mapped to this representation, as shown by the difference between Figure~\ref{side.fig.intro.ruler1} and Figure~\ref{side.fig.intro.ruler2}. %format SumNil2 %format SumCons2 \begin{figure}[t] \begin{code} itf S visit v1 inh l syn emptyset total -- decompose array |l| down visit v2 inh emptyset syn s total -- compute sum |s| up terminator -- end of visit decls var sumArr = sem sum : S visit v1 emptyset -- first visit clause sumNil -- when list is empty match 0 = lhs:l.length -- match empty |l| visit v2 emptyset -- second visit clause sumNil2 -- single clause lhs:s = 0 -- empty list, zero sum terminator -- end of visit blocks clause sumCons -- when list non-empty loc:x = lhs:l[0] -- head of the list loc:xs = lhs:l.slice(1) -- tail of the list child tl : S = sumArr -- recursive call tl:l = loc:xs -- |l| param of call invoke v1 of tl total -- invoke on child visit v2 emptyset -- second visit clause sumCons2 -- single clause invoke v2 of tl total -- invoke on child lhs:s = loc:x + tl:s -- sum of head and tail terminator -- end of visit blocks \end{code} \caption{Example of RulerBack syntax: summing an array of integers.} \label{side.fig.syntax.example} \end{figure} \section{Static Semantics of RulerBack} \label{side.sem.rulercore} In this section, we introduce RulerBack, a small subset of RulerCore. It serves as an intermediate language for RulerCore. Figure~\ref{side.fig.syntax.rulercore} shows the syntax of RulerBack. A RulerBack program |e| is a {JavaScript} program |JS|, with embedded RulerBack blocks |b|. A block |b| is either an interface declaration, semantics-block, or attribute reference. The syntax of visits in interface declarations and semantics-blocks use a cons-list representation which is convenient for the specification of translation schemes later. We explain the individual forms of syntax in more detail below. There are some essential differences with respect to RulerCore that we gradually introduced in the previous section. The order of appearance of rules defines the evaluation order, and each invocation of a visit must explicitly be stated through an invoke rule. Grammar blocks can be desugared and are optional in RulerBack. Instead, with clauses and (match) rules, we provide a general mechanism to traverse arbitrary {JavaScript} data structures. The embedded blocks may occur anywhere in a {JavaScript} program. The programmer is required to position semantics blocks and attribute references at expression positions in the host language, and interface declarations at statement positions. It is the responsibility of the programmer to handle the scoping of embedded blocks. Figure~\ref{side.fig.syntax.example} shows a RulerBack program that computes the sum of an array of integers in two visits. This simple example can also be formulated as a single visit. However, it serves here as a short example of a dual-visit program. The first visit has two clauses: a clause |sumNil| when the array is empty, and |sumCons| when there is at least one element. In the second visit, the actual sum is computed, using the rules that depend on which clause is chosen in the first visit. A semantics-block introduces a visitor-object with an interface |I|. The interface dictates what visits can be made to the object, and what the inputs (inherited attributes) and outputs are (synthesized attributes). The outputs for a visit are produced by executing rules. We write these rules down in a tree of clauses and visits, as illustrated by the indentation in Figure~\ref{side.fig.syntax.example} and the state diagram in Figure~\ref{side.fig.diagram}. \begin{figure}[htp] %format sumNil2 %format sumCons2 \begin{center} \begin{tikzpicture} [ grow=right , visit/.style={circle, minimum size=1mm, node distance=2mm, very thick, draw=black, fill=black,font=\scriptsize} , choice/.style={circle, minimum size=1mm, node distance=2mm, very thick, draw=black, fill=white,font=\scriptsize} ] \node[visit,label=90:|v1|]{} child { node[choice, label={[xshift=-4mm]120:|emptyset|}, label={[xshift=-4mm]60:|sumNil|}, label={[xshift=-6mm]-60:|sumCons|}]{} [level distance=25mm] child { node[visit,label=90:|v2|]{} [node distance=20mm] child { node[choice, label={[xshift=-7mm]120:|emptyset|}, label={[xshift=1mm]60:|sumCons2|}]{} child { node[visit,label=90:|()|]{} } } } child { node[visit,label=90:|v2|]{} child { node[choice, label={[xshift=-7mm]120:|emptyset|}, label={[xshift=1mm]60:|sumNil2|}]{} child { node[visit,label=90:|()|]{} } } } }; \end{tikzpicture} \end{center} \caption{States of nodes with semantics |sum|.} \label{side.fig.diagram} \end{figure} The black nodes represent the state of the AST-node prior to a visit and the white nodes indicate a branch point. Upon creation, an AST node is in the state represented by the root node. With each edge, alternatively the rules of a visit or the rules of a clause are associated. With each visit, an AST node changes state to a next black node by executing the rules on the path to such a node. Execution of all of the rules must succeed. At a branch-point, rules on edges of clauses are tried in order of appearance. Results produced by executing rules are in scope of rules further along the path. \begin{figure}[htp] %if fullVersion \begin{smallcode} \begin{code} Gamma :- s ^^^ ^^^ ^^^ ^^^ v sep Gamma :- t ^^^ ^^^ ^^^ ^^^ Sigma sep Gamma0 :- r ::: Gamma1 ^^^ ^^^ ^^^ ^^^ Gamma :- e -- signatures of the relations Gamma :- o ^^^ ^^^ ^^^ ^^^ v sep (many x) sep Gamma :- c ^^^ ^^^ ^^^ ^^^ Sigma sep Gamma0 :- p ::: Gamma1 ^^^ ^^^ ^^^ ^^^ -- used by judgments below \end{code} \end{smallcode} %endif \small \begin{mathpar} %if fullVersion \inferrule*[right=sem] { |x unique| \\\\ |I v `elem` Phi| \\ |v sep Gamma, scope :- t| \\ } {|Gamma :- sem x : I t|} %endif \inferrule*[right=end] {} {|terminator sep Gamma :- terminator|} \inferrule*[right=visit] { |Gamma0 `union` avail (visit x (many r) (many k)) sep Gamma0 `union` { inh lhs:a bar a `elem` (many i) } :- (many r) ::: Gamma1| \\ |many (v sep (many s) sep Gamma1 :- k)| \\ } {|visit x inh (many i) syn (many s) v sep Gamma0 :- visit x (many r) (many k)|} \inferrule*[right=clause] { |x unique| \\ |Gamma0 `union` avail (clause x (many r) (many k)) sep Gamma0 :- (many r) ::: Gamma1| \\ |v sep Gamma1 :- t| \\ |{ syn lhs:a bar a `elem` (many s)} `subset` Gamma1| \\ } {|v sep (many s) sep Gamma0 :- clause x (many r) t|} \inferrule*[right=assert] { |Sigma sep Gamma0 :- p ::: Gamma1| \\ |Gamma0 :- e| \\ } { |Sigma sep Gamma0 :- opt pin p = e ::: Gamma1| } \inferrule*[right=match] { |Sigma sep Gamma0 :- p ::: Gamma1| \\ |Gamma0 :- e| \\ } { |Sigma sep Gamma0 :- match p = e ::: Gamma1| } \inferrule*[right=invoke] { |Phi((sub I c)) = (many v)| \\ |visit x inh (many i) syn (many s) z2 `elem` (many v)| \\ |z1 `smaller` z2| \\ |c : (sub I c) (many w) `elem` Gamma0| \\ %% check child |next (many w) (many v) = x| \\ %% check invoke |{ inh c:a bar a `elem` (many i) } `subset` Gamma0| \\ %% check inhs |Gamma1 = Gamma0 `union` { syn c:a bar a `elem` (many s) } `union` { c : (sub I c) ((many w), visit x inh (many i) syn (many s)) } | \\ %% add syn + visit } { |Sigma sep Gamma0 :- invoke x of c z1 ::: Gamma1| } \inferrule*[right=child] { |Gamma0 :- e| \\ |Gamma1 = Gamma0 `union` { c : I emptyset }| } { |Sigma sep Gamma0 :- child c : I = e ::: Gamma1| } \inferrule*[right=occ.lhs] { |inh lhs:a `elem` Gamma| } { |Gamma :- lhs : a | } \inferrule*[right=occ.child] { |syn c:a `elem` Gamma| } { |Gamma :- c : a| } \inferrule*[right=pat.lhs] { |syn lhs:a `elem` Sigma| \\ } { |Sigma sep Gamma0 :- lhs : a ::: Gamma0, syn lhs:a| } \inferrule*[right=pat.loc] {} { |Sigma sep Gamma0 :- loc : a ::: Gamma0, syn loc:a| } \inferrule*[right=pat.child] { |inh c:a `elem` Sigma| \\ } { |Sigma sep Gamma0 :- c : a ::: Gamma0, inh c:a| } \inferrule*[right=const] {} { |Sigma sep Gamma :- K ::: Gamma| } \inferrule*[right=any] {} { |Sigma sep Gamma :- _ ::: Gamma| } \end{mathpar} \begin{smallcode} \begin{code} avail (visit x (many r) (many k)) = (sub avail union) ((many r)) `union` (sub avail intersection) ((many k)) `union` { syn lhs:b | visit x inh (many a) syn (many b) `elem` Phi((sub I x)) } avail (clause x (many r) t) = (sub avail union) ((many r)) `union` avail (t) avail (p = e) = emptyset avail (match p = e) = emptyset avail (invoke x of c) = { inh c:a | a `elem` (many a), visit x inh (many a) syn (many b) `elem` Phi((sub I c)) } avail (child c : I = e) = { c : I (Phi I) } \end{code} \end{smallcode} \caption{Static semantics of RulerBack} \label{side.fig.sem.rulercore} \end{figure} There are four types of rules in RulerCore. \begin{itemize} \item \begin{code} match p = e -- match-rule match loc:x = 3 -- example that succeeds match true = false -- example that fails \end{code} The pattern |p| must match the value of the right hand side. If the evaluation of |e| results in an exception, or the match fails, a backtrack is made to the next clause. If |p| represents an attribute, the attribute gets defined. \item \begin{code} opt pin p = e -- eval-rule (optionally pinned) \end{code} Similar to the above, except that the match is expected to succeed. If not, the evaluation itself aborts with an exception. For that reason, we also call these rules assert-rules. Note that these are the conventional rules of attribute grammars. \item \begin{code} child c : I = e -- child-rule child root : Menu = ntMenu -- example that introduces a |Menu| child \end{code} Evaluation of the rule above creates a child |c|, visitable according to the interface |I|, and created by executing the constructor function |e|. \item \begin{code} invoke x of c z -- invoke rule \end{code} Executes visit |x| of child |c|. The inherited attributes of |x| must be defined, and all prior visits to |c| must have been performed. The invocation fails if no clause matches and the strategy |z| is |partial|. Otherwise, the evaluation aborts exceptionally. If successful, the synthesized attributes of |x| become available. If there is an invoke with a partial-annotation, then the visit of the corresponding interface must also have a partial-annotation. \end{itemize} Figure~\ref{side.fig.sem.rulercore} shows RulerBack's static semantics for sem-blocks. We omitted the rules that state the uniqueness of interfaces and attributes of interfaces. A RulerBack program that satisfies these conditions never crashes due to an undefined attribute, invalid rule order, or forgotten invocation to a child. The dynamic or static type checking we leave as responsibility of the host language. We briefly consider some aspects of these rules. Three environments play an important role. The environment |Phi| contains for each interface the sequence of visits. The environment |Gamma| represents the children and attributes defined so far for one node (to test for missing and duplicated definitions). The environment |Sigma| represents the attributes that are allowed to be defined (to test for definitions of unknown attributes). As additional constraint on environments, we consider it a static error when there is a duplicate attribute in the environment within two scope markers. Visit-blocks must be specified in the order as declared on the interface, and none may be omitted. The relation for visits |t| gets a sequence of pending visits |many v| as declared in the interface. In rule \TirName{visit}, we verify that the name of the visit matches the expected visit in the head of |many v|. The next visit must match the head of the tail of this list, until in the end |many v| is empty. We also add the inherited attributes of the visits to the environment. The function |avail| defines which attributes may be defined. Higher-up in the visit-clauses-tree, we may only define those attributes that are common to all lower clauses. In rules \TirName{pat.lhs} and \TirName{pat.child}, we verify that we are indeed defining an attribute belonging to a certain child. The avail-function either generalizes over lists using intersection or union. Rule \TirName{sem} forms the root of derivation trees. Since semantics blocks can occur nested, |Gamma| contains potential bindings in scope from encapsulating semantics blocks. A new scope entry is added to |Gamma|. The rule matches the visits blocks |t| against the declared visits |v|. Rule \TirName{end} matches with the end of a sequence of visits blocks, and requires also to have reached the end of the declared visits. Rule \TirName{visit} requires that the clauses and rules are well-formed. From the rules |many r| it obtains an environment |Gamma1| with additional bindings that are available to the clauses. Both the rules and the clauses may refer to the inherited attributes of the visit. Each clause must define the set |many s| of synthesized attributes of the visit. We assume that the type rule for a rule |r| is lifted to a list of rules |many r| by chaining the environment |Gamma|. Rule \TirName{clause} verifies that the rules it contains and the next visit are well-formed. Furthermore, it verifies that the synthesized attributes of the visit are defined. The type rules \TirName{assert}, \TirName{match}, \TirName{invoke}, and \TirName{child} correspond to RulerBack rules. The \TirName{assert} and \TirName{match} rules verify that the pattern and expression are well-formed. The \TirName{child} rule adds the child in an empty state to the environment. Rule \TirName{invoke} verifies that visit |x| is indeed the next visit in the expected sequence of visits |many v|, given the previous invocations |many w|. It furthermore verifies that the inherited attributes for the visit of |c| are defined, and adds the synthesized attributes to the environment. Finally, the strategy annotation of the invoke-rule must be greater than the strategy annotation of the visit. If the visit is declared as total, then as sanity-check, an invoke may not have |partial| as strategy. \begin{figure}[p] \begin{smallcode} \begin{code} var sumArr = function() { -- semantic function function nt_sum(_inps) { -- visit |v1| var lhsIl = _inps.l; -- extract |lhs:l| try { -- try clause |sumNil| if (lhsIl.length != 0) throw eEval; -- if |lhs:l| is empty var _res = new Object(); -- produce results of |v1| _res._next = function(_inps) { -- cont. for visit |v2| var lhsSs = 0; -- |lhs:s| rule var _res = new Object(); -- produce results of |v2| _res._next = null; -- no next visit _res.s = lhsSs; -- store |lhs:s| return _res; -- return result of |v2| }; return _res; -- return result of |v1| } catch(err) { -- try clause |sumCons| var locLx = lhsIl[0]; -- |loc:x| rule var locLxs = lhsIl.slice(1); -- |loc:xs| rule var vis_tl = sumArr(); -- creation of child |tl| tlIl = locLxs; -- |tl:l| rule var _args = new Object(); -- inputs for |v1| of |tl| _args.l = tlIl; -- store |tl:l| var _res = vis_tl(_args); -- invoke |v1| of |tl| var vis_tl = _res._next; -- extract results var _res = new Object(); -- produce results of |v1| _res._next = function(_inps) { -- cont. for visit |v2| var _args = new Object(); -- inputs for |v2| of |tl| var _res = vis_tl(_args); -- invoke |v2| of |tl| var tlSs = _res.s; -- extract |tl:s| result var lhsSs = locLx + tlSs; -- compute |lhs:s| var _res = new Object(); -- produce results of |v1| _res._next = null; -- no next visit _res.s = lhsSs; -- store |lhs:s| return _res; -- return result of |v2| }; return _res; -- return result of |v1| } }; return nt_sum; }; -- return visitor function \end{code} \end{smallcode} \caption{Example translation} \label{side.fig.sem.example} \end{figure} \section{Translation of RulerBack to {JavaScript}} \label{side.trans.rulercore} In this section, we describe how to translate RulerBack programs to {JavaScript}. We translate each semantics-block to a coroutine, which we implement as one-shot continuations. Each call to the coroutine represents a visit. The parameters of the coroutine are the inherited attributes of the visit. The result of the call is an object containing values for the synthesized attributes, and the continuation to call for the visit. As an example, we show in Figure~\ref{side.fig.sem.example} the translation of the example in the previous section. To deal with backtracking, we use the exception mechanism, and throw an exception to switch to the next clause. Note that this does not rollback any side effects that the partial execution of the rules may have caused. To be able to do so, we can run the rules in a software transaction~\citep{Heidegger10}, which are nowadays supported by many programming languages. Alternatively, when the side effects matter, the programmer can schedule the rule to an earlier or later visit, such that it is not influenced by backtracking. To deal with continuations, we use closures. The function to be used for the next visit is constructed in the previous visit. This function has access to all the results computed in the previous visit. Furthermore, we store values for attributes in local variables. Those values that are not needed anymore are automatically cleaned up by the garbage collector. Figure~\ref{side.fig.sem.denot} shows the general translation scheme, and the naming scheme for attributes. In particular, for each visit, we generate a closure that takes values for inherited attributes as parameter. Clauses are dealt with through exception handling. When a clause successfully executed all statements, it returns an object containing values for synthesized attributes, as well as the continuation function for the next visit. The above translation is relatively straightforward. In practice, the selection of a clause is functionally dependent on the value of an inherited attribute, or a local attribute computed in a previous visit. In those cases, the selection of clauses can be implemented more efficiently using conventional branching mechanisms. Also, instead of using the exception mechanism to implement backtracking, we can use code duplication, which results in more efficient code. Chapter~\tref{chapt.dependent-ags} shows such a translation scheme. We verified that the above implementation runs in time linear in the size of the tree when we use a version of the |slice| operation that does not make a copy of the array. With a throughput of about hundred array elements per microsecond, and about a thousand per microsecond with the exception handling replaced by conventional branching, this is still about one or two orders of magnitude slower than using a hand-written loop. In our experience, however, the traversal performance is rarely an issue. In general, the asymptotic complexity of the traversal is linear in the size of the tree, and the actual time taken by traversing the trees is insignificant compared to the work performed by the right-hand sides of the rules in a real application. \begin{figure}[p] \begin{smallcode} \begin{code} (semb (sem x : I t)) ^^^ ~> ^^^ function() { var (semb (nt x)) = (semt I t); return (semb (nt x)); } (semb (c:x)) ~> (semb (inp c x)) (semt I (epsilon)) ~> null (semt I (visit x (many r) (many k) z)) ~> function(_inps) { (semb (inp lhs (inhs I x))) = _inps.(semb (inhs I x)); (semb (many r)); (semc I (z, syns I x) (many k)); } (semc I (partial, many s) []) ~> throw eEval; (semc I (total, many s) []) ~> throw eAbort; (semc I (z, many s) (k : (many k))) ~> try { (semc I (many s) k); } catch(err) { if (err == eEval) { (semc I (z, many s) (many k)); } else throw err; } (semc I (many s) (clause x (many r) t)) ^^^ ~> (semb (many r)); var _outs = new Object(); _outs._next = (semt I t); _outs.(many s) = (semb (out lhs (many s))); return _outs; (semb (opt pin p = e)) ~> (semz total (var _res = (semb e))); (semp eAbort p) (semb (match p = e)) ~> var _res = (semb e); (semp eEval p); (semb (child c : I = e)) ~> var (semb (vis c)) = ((semb e))(); (semb (invoke x of c z)) ~> var _args = new Object(); _args.(semb (inhs (sub I c) x)) = (semb (out c (inhs (sub I c) x))); (semz (z) (var _res = (semb (vis c))(_args))); var (semb (inp c (syns (sub I c) x))) = _res.(semb (syns (sub I c) x)); var (semb (vis c)) = _res._next; (semz partial e) ~> e (semz total e) ~> try { e; } catch (err) { if (err == eEval) throw eAbort; else throw err; } (semp e (c:a)) ~> var (semb (out c a)) = _res; (semp e _) ~> ; (semp e k) ~> if (_res != k) throw e; \end{code} \begin{code} out "loc" x = "locL" x ^^^^ inp "loc" x = "locI" x out "lhs" x = "lhsS" x ^^^^ inp "lhs" x = "lhsI" x out c x = c "I" x ^^^^ inp c x = c "S" x vis c = "vis_" c ^^^^ nt x = "nt_" x syns I x ^^^^ ^^^^ inhs I x -- respectively, inh and syn attrs of |x| of |I| \end{code} \end{smallcode} \caption{Denotational semantics of RulerBack} \label{side.fig.sem.denot} \end{figure} %if fullVersion \section{Translation of RulerCore to RulerBack} \label{side.trans.rulerfront} In Section~\ref{side.example.ruler2}, we showed in an example how a RulerCore program can be encoded using only syntax of RulerBack. We omit the data-type driven translation from a |datasem| into a |sem|, nor the translation of default-rules. Instead, in this section we assume that RulerCore consists of those programs that after insertion of invoke-rules and reordering of rules are a valid RulerBack program. \subsection{Implicit Invocations} In RulerCore, invoke-rules may be omitted. From a RulerBack program, we derive a number of implicit invocations. We first determine the attributes that are \emph{needed}. From these we determine the maximum needed visit, and thus the sequence of visits that is needed. An invoke-rule needs to be inserted if there is no invoke-rule for any of these visits yet. We start the insertion-process at the root of the tree, and check at each level downwards which invokes need to be inserted. With this process, we insert the invoke-rules at the lowest point, while still being in scope of all rules that need it. Automatic rule ordering then positions the invokes at their appropriate places. A synthesized attribute |a| of child |c| is needed if there exists a rule which has the attribute reference |c:a| in its right-hand side. The needed attributes may differ per clause and visit, which we define in a similar way as |avail| in Section~\ref{side.sem.rulercore}: \begin{code} need (visit x (many r) (many k)) = (sub need union) (many r) `union` (sub need intersection) (many k) need (clause x (many r) t) = (sub need union) (many r) `union` need t need (p = e) = need e \end{code} Note that |need| generalizes to lists by using either intersection or union, which we denoted explicitly with |sub need union| and |sub need intersection|. In our actual implementation, we defined the function |need| in a slightly more subtle way. A default-rule may indirectly express a need on an attribute (and corresponding visit). Furthermore, when a programmer provided an explicit invoke rule for a visit of a child |c|, then the programmer must give explicit invoke rules in all clauses that require attributes of this visit of |c|. This is a policy that we impose, because apparently the programmer had a reason to explicitly specify an invocation of the visit, instead of using the implicit specification. \subsection{Rule Ordering} \label{visit.sect.order} To order the rules, we first create production dependency graphs (PDGs) (Section~\tref{sect.intro.ag.eval}). Note that we do not need nonterminal dependency graphs, because these are fully implied by the interface of a nonterminal. In comparison to conventional PDGs, the PDGs of RulerCore contain also vertices that represent visits, clauses, and invocations of visits. The graph is also slightly less complex because rules do not depend on synthesized attributes of children, but on the visits of the children that produce these attributes. \begin{figure}[p] \begin{center} \small{\begin{tabular}{ll} source node & destination node \\ \hline begin visit & end of previous visit \\ end visit & begin visit, clauses, syn attr def. rules, match and pin rules \\ clause & begin visit \\ any rule & begin visit or clause \\ invoke rule & prev invoke or child, inh attr def. rules \\ any rule w. rhs & begin visit for inh attrs, loc attrs def., invoke of attr \\ \end{tabular}} \end{center} \begin{center} %% \includegraphics[scale=0.25]{ordered-ags/sum.pdf} \includegraphics[scale=0.42]{ordered-ags/menu.pdf} \end{center} \caption{Dependencies between RulerCore entities, and the dependency graph of |Menu|.} \label{side.fig.dependencies} \end{figure} In the PDG of a production (thus, a semantics block), there exists a begin and end vertex for each visit in the interface. There exists also a vertex for each clause and each rule. Figure~\ref{side.fig.dependencies} lists the dependencies between vertices, and gives an impression of the dependency graph of |Menu| in Figure~\ref{side.fig.intro.ruler1}. The dependency graph is acyclic, thus the rules can be ordered. In this sketch, the ovals represent rules, and the square boxes represent the begin and end of visits and clauses. The numbers represent the line numbers\footnote{ See~\url{https://svn.science.uu.nl/repos/project.ruler.systems/ruler-core/examples/JsMenu.rul}. }. The squares immediately following the root are the begin points of the clauses for that visit. Clauses are constrained by begin and end points of visits. Therefore, branches come together again. Some rules are constrained to visits (notably match rules). For other rules, we have more flexibility in their scheduling. For |Menu|, we did not define clauses for the second visits, hence the implicit clause in the graph. %endif %if fullVersion Note that we consider the edges in the direction of their dependency, not in the direction of the value flow. Thus, |p| is a successor of |q| if |p| needs to be defined before |q|. To make the distinction clearer, we refer to the direct and indirect successors as \emph{dependencies} and direct and indirect predecessors as \emph{users}. The acyclic graph represents a partial order between rules, visits and clauses. We turn the partial order in a total order in a number of preprocessing steps. Any total order that satisfies the partial order must result in a semantically equivalent result. However, differences between total orders may affect the performance of the resulting algorithm. In the dependency graph, each rule is associated with a unique vertex. \paragraph{Step 1: order by visit.} First, we determine for each rule to which visit to schedule it. Sometimes, rules can be scheduled to more than one visit. The visit a rule is scheduled to influences the amount of data that has to be transported between visits. The inputs for the rule need to be transported to the visit of the rule, and the result of the rule to the visits where these results are used. The set of visits to which a rule |r| of a production of nonterminal |N| can be scheduled to are all visits of |N|, except the visits associated with end-visit nodes in the dependencies of |r|, and the visits associated with the begin-visit nodes in the users of |r|. Given these sets of visits for rules, we can apply scheduling strategies. For example, in Chapter~\tref{chapt.iterative}, we present notation to specify the iteration of visits. We try to avoid scheduling a rule to such visits, if possible. We may schedule rule |r| to any of these visits, however, by doing so, the scheduling may affect the set of possible visits for dependencies and users of |r|. As our default scheduling strategy, we schedule each rule to its last possible visit\footnote{ Each nonterminal has a terminator visit |epsilon|, which does not have any attributes, and for which no code is generated. Rules that are scheduled to that visit are discarded during code generation. }, and add correspondingly a scheduling dependency from the rule to the beginning of that visit to the graph. This approach has the advantage that the order in which we make decisions for rules does not influence the resulting graph, and also ensures that the graph remains acyclic. \paragraph{Step 2: order by partition.} Secondly, we determine per visit the order of its rules. The order of the rules may affect the amount of overhead in the clause selection. We need to consider the following items. \begin{itemize} \item Rules are preferably executed once per visit. As heuristic, we schedule rules that do not depend on any clause before the clause selection\footnote{ An alternative approach is to keep track of which rules that do not depend on clauses are already applied during the evaluation of a previous clause. However, the savings on overhead are likely small. For visits declared as total, such rules eventually need to be applied. }. \item Match-rules and invoke-rules with the partial-strategy may fail. Such rules are preferably scheduled early, because all computations that are performed for a clause that fails is overhead. For example, the match-rules that test the value of an inherited attribute, as introduced by the translation of |datasem|s, should be scheduled upfront. \item Rules may have expressions that are expensive to compute. An accurate estimation of such costs requires an analysis of host-language terms and is hard in general. \item Rules of a particular form may depend on other rules of a different form. We therefore cannot straightforwardly group all match-rules together. \end{itemize} To take these items into account, we use a customizable strategy. We associate with each rule a \emph{partition} and \emph{class} for a finite sequence of partitions and a finite sequence of classes. Rules of one partition are scheduled before rules of a later partition. Within a partition, we schedule rules iteratively. With each iteration, we schedule the rules of the earliest class that has rules that can be scheduled. With this approach, rules of an earlier class precede rules of a later class, if possible. We describe this approach in more detail below. We define an ordered sequence of partitions. Each rule must be associated with one partition, and a rule may not have a dependency that belongs to a later partition. For a given visit, we schedule the rules of one partition before those of the next partition. The default partitions are: \begin{code} data P = InCommon | InClause ^^^ deriving (Eq,Ord) \end{code} We associate with |InCommon| all rules of a visit that are not users of a begin-clause-node of that visit, and with |InClause| the remaining rules of that visit. \begin{figure}[ht] \begin{center} \begin{tabular}{lll} type of rule & classification & strategy \\ \hline match-rules & |Earliest| & quick clause selection \\ partial invoke-rules & |Earlier| & clause selection \\ pin-rules & |Middle| & early side effect, preferably no backtrack \\ total invoke-rules & |Later| & visit children when needed \\ eval-rules & |Latest| & minimize distance between computation and use \\ \end{tabular} \end{center} \caption{Association of rules with classes.} \label{fig.order.assoc} \end{figure} \paragraph{Step 3: order by classification.} For each partition, we define the order of the rules based on a classification of the rule. Each rule must be associated with a class, which represents a scheduling preference. By default, we distinguish the following classes: \begin{code} data C = Earliest | Earlier | Middle | Later | Latest ^^^ deriving (Eq,Ord) \end{code} Figure~\ref{fig.order.assoc} shows the default association based on a rule's type. To influence the ordering, a programmer can specify more classes, and explicitly specify such a class for some of the rules. \begin{figure}[p] \begin{code} -- gives each vertex of |verts| a unique rank based on |classes| rankVertices :: [C] -> [Node] -> R () rankVertices classes verts = do c <- foreach classes iter $ do vs <- unionM [ ps | v <- verts, hasClass v c, let ps = deps' verts v , allM (\n -> hasClass n c `impliesM` isRanked n) (ps \\ [v]) ] guard (notNull vs) iter $ do vss <- filterM notNull $ mapM (readyNodes verts vs) classes guard (notNull vss) v <- foreach $ sortAsc cmpNodePos $ head vss rank v -- returns the subset of |vs| that can be scheduled readyNodes :: [Node] -> [Node] -> C -> R [Node] readyNodes verts vs c = [ v | v <- vs, isNotRanked v, hasClass v c, allM isRanked (deps verts v) ] -- returns |v| and its indirect dependencies deps' :: [Node] -> Node -> [Node] deps' verts v = [v] `union` map (deps' verts) (deps verts v) \end{code} \begin{code} iter :: R () -> R () -- iterate param until a guard is |False| guard :: Bool -> R () -- check a condition; |fail| if |False| foreach :: [a] -> R a -- repeat cont. for each item in list rank :: Node -> R () -- gives the node the next rank deps :: [Node] -> Node -> [Node] -- direct dependencies hasClass :: Node -> C -> R Bool -- |True| iff the node has the given class isRanked :: Node -> R Bool -- |True| iff the node is ranked sortAsc :: (Node -> Node -> Ordering) -> [Node] -> R [Node] cmpNodePos :: Node -> Node -> Ordering \end{code} \caption{Order algorithm for the visit's rules.} \label{fig.visit.algorithm} \end{figure} The classes specify a scheduling preference: a rule of an earlier class is preferably scheduled before a rule of a higher class. However, this is not possible when a rule of an earlier class depends on a rule of a later class. Moreover, to schedule some rules of an earlier class, there may be different \emph{minimal} subsets of rules of a later class possible, so that it is not obvious which subset to take. In the following example, the first match-rule is scheduled first. None of the other match-rules can be scheduled, unless either the rule for |loc.a| or the rule for |loc.b| is scheduled before. To ensure a deterministic scheduling, we schedule both |loc.a| and |loc.b| rules, and use their order of appearance as final distinguishing factor: \begin{code} match loc.x = 3 -- class: |Earliest| loc.a = e -- class: |Latest| loc.b = loc.x -- class: |Latest| match loc.y = f loc.a -- class: |Earliest| match loc.z = g loc.b -- class: |Earliest| \end{code} However, if the rule for |loc.y| would additionally mention |loc.z| in its RHS, then the rule for |loc.a| is scheduled directly after the rule for |loc.x|, followed by the rule for |loc.z|, and only then the rule for |loc.b|. Figure~\ref{fig.visit.algorithm} shows the scheduling algorithm |rankVertices|. It uses the rank monad |R|, which is a combination of a list, state, and continuation monad with monad comprehensions~\citep{ariem-ext-ordered}, to describe algorithms with the enumeration combinator |foreach| and the iteration combinator |iter|. As parameters, it gets the sequence of classes in the order from earliest to latest, and the set of vertices of the partition to schedule. The result of the algorithm is an association with a unique rank for each of these vertices. With this algorithm, we try to schedule rules in increasing class order by giving the vertices of such rules an increasing rank. The class |c| is the class we currently schedule for. For class |c|, we determine the set |vs|, which is the smallest set that contains all vertices of rules of class |c| that do not depend on an unranked vertex of class |c| or an earlier class. Moreover, the set contains the indirect dependencies of these rules. This set may include vertices of a later class, or already ranked vertices. Thus, |vs| contains the largest set of vertices of class |c| that can be ranked when we only consider already ranked vertices of the same class or earlier. Also, it contains the smallest set of vertices of a later class that need to be ranked to allow the ranking of all the vertices of an earlier class in |vs|. For an acyclic graph, and as long as there are unscheduled rules of class |c|, the set |vs| has at least one rule. We ensure that each rule in |vs| is scheduled. Thus, by repeating this process we rank all rules of class |c|. Consequently, after processing all classes, all vertices in |verts| are ranked. We rank the vertices of |vs| in iterations. With |readyNodes| we determine for each class the set of unranked vertices that have ranked dependencies, and take the nonempty subset of the earliest class. Such a set exists, unless all vertices of |vs| are ranked. We rank these vertices in the order of the appearance of their rules. The algorithm guarantees that an unranked node is ranked if all its dependencies are ranked (soundness property), and that there is per iteration at least one vertex ranked as long as there are unranked vertices left (progress property). Assume the number of vertices to be ranked is $n$, and the number of classes is $c$. Both the worst-case asymptotic time and memory complexity of |deps'| is $O(n)$, where |abs vs <= n|. The functional implementation assumes that vertices are represented as an integer. The corresponding rule can be determined in constant time via a lookup table. For |readyNodes| the time and memory complexities are $O(n^2)$. The time complexity of the algorithm for |vss| is thus $O(c n^2)$, and of the algorithm for |vs| is $O(n^2)$. The memory complexity is $O(n^2)$. Despite the double iter-blocks, each line is at most repeated $n+c$ times, because with every repetition at least one vertex is ranked. The worst-case time complexity of |rankVertices| is thus $O(c n^3)$ and the worst-case memory complexity is $O(n^2)$. Note that in practice the values for $c$ and $n$ are small, and that most rules have only a small number of dependencies. \paragraph{Step 4: cleanup.} We schedule the rules in a partition in the order of their rank. For the purely functional eval-rules, we make an exception. Starting with the highest-ranked eval-rule, we shift it just before the lowest-ranked of its users in the dependency graph. Scheduling such rules later does not affect the overhead of clause selection, and our strategy is to bring such rules close to where their results are used. Since eval-rules have the |Latest|-classification, it is already likely that these rules are already at such a position. In a similar way, we could also move total invoke-rules. However, the visit invoked by total invoke-rules may contain pin-rules, thus for such rules we keep the scheduling via the classification mechanism. \paragraph{Remarks.} The implicit invokes and automatic ordering allow a straightforward transformation from a datasem-block to a sem-block. Essentially, a datasem-block is syntactic sugar for a number of clauses, each with a match-rule, and a number of child-rules. We also allow in RulerCore omitted visits and clauses to be implicitly defined. Combined with implicit invocations, this makes it easy to add additional visits to an interface. Furthermore, the automatic rule ordering allows us to write independent rules separately from each other (possibly in separate files) and use a preprocessing step to merge the rules together. The scheduling also offers opportunities to exploit parallelism. When the |head vss| consists of total invoke-rules, during |rankVertices|, these invocations are candidates to evaluate in parallel because their computations are independent. However, whether or not parallelism is beneficial depends on how expensive the computation of the visit is. With the classification mechanism, a rough static approximation can be expressed by the programmer. For example, we can introduce a class for total invocation rules that take priority over conventional invocation rules. When multiple of these invocation-rules are scheduled together, we can actually generate code that performs the visits in parallel. \section{Discussion} \label{side.eval} RulerCore can be used to express traversals over tree-like data structures. To a limited extend RulerCore may be applicable to graphs traversals that are technically tree traversals (such as a traversal over a depth-first forest). Loops and iteration can be expressed with higher-order attributes. In related work, we expressed these by iterating visits~\citep{Middelkoop10gpce}). However, RulerCore is not suitable to express traversals over drastically changing data structures. In our actual implementation, we also provide a notion of internal visits. A conventional visit is invoked externally by the parent, and can choose a clause. This means that we can only conditionally compute attributes once per visit. In contrast, an internal visit is invoked at the end of the clause, and is not visible externally. An internal visit may again have clauses, and these clauses may again specify an internal visit as next visit, or a conventional visit. With this relatively simple extension, we can arbitrarily often branch inside a visit. %if False Furthermore, we integrated demand-driven attribute evaluation. With demand-driven evaluation, certain circular grammars may still produce values for attributes: those that have no runtime circular dependencies. We either use statically ordered evaluation of a visit, or demand-driven one. A demand-driven visit may not use side effect, nor have multiple clauses. With visits, we can adequately model this combination. %endif In the Haskell version of RulerCore, we require type signatures for attributes. In {JavaScript}, instead of type signatures, the notion of a type signature represents a dynamic check in the form of assertion-functions that validate the values for attributes. To fully enjoy the benefits of attribute grammars, the host language requires support for purely functional data structures~\citep{Okasaki:1998:PFD:280586}. Such data structures can be encoded in {JavaScript}, but efficient versions with copy-on-write semantics are cumbersome to implement manually. %endif \section{Related Work} \label{side.relatedwork} Related to \thiswork{} are various visitor-like approaches and attribute grammar techniques. The purpose of the Visitor design pattern~\citep{DBLP:conf/ecoop/GammaHJV93} is to decouple traversal operations from the specification of the tree to be traversed, in order to make it easier to add new operations without changing the existing specification of the tree. This allows us to write a multi-visit traversal using a separate visitor per traversal. Multi-methods~\citep{DBLP:conf/oopsla/ChambersL94} are supposed to replace the visitor pattern. A multi-method allows overloading of methods on multiple parameters, and makes |accept|-methods superfluous. This, however, is orthogonal to the problems and solutions that we presented in \thiswork{}. In Section~\ref{side.example.vis}, we discussed advantages and disadvantages of modeling traversals with visitors. In particular, side effects are permitted, and used to store results for use in later visits. The side effects make it hard to predict if results needed in a next visit are actually stored by a first visit. This is a fundamental problem of visitors. \citet{DBLP:conf/oopsla/OliveiraWG08}, for example, show many enhancements with respect to the type safety of visitors, but do not address the transfer of results between visits. Attribute grammars~\citep{DBLP:journals/mst/Knuth68,DBLP:conf/waga/Knuth90} are considered to be a promising implementation for compiler construction. Several attribute grammar techniques are important for our work. \citet{DBLP:journals/acta/Kastens80} introduces ordered attribute grammars. In OAGs, the evaluation order of attribute computations as well as attribute lifetime can be determined statically, allowing severe optimizations. %if fullVersion \citet{DBLP:journals/toplas/Boyland96} introduces conditional attribute grammars. In such a grammar, semantic rules may be guarded. A rule may be evaluated if its guard is satisfied. Evaluation of guards may influence the evaluation order, which makes the evaluation less predictable. In comparison, in our clauses-in-visits model, we have to explicitly indicate in which visits guards are evaluated (the match-statements of a clause), which makes it clear what the evaluation order is. Our approach has the additional benefit that children may be conditionally introduced and visited. %endif Recently, many Attribute Grammar systems arose for mainstream languages, such as Silver~\citep{DBLP:journals/entcs/WykBGK08} and JastAdd~\citep{DBLP:conf/oopsla/EkmanH07} for Java, and UUAG~\citep{uuagc} for Haskell. In contrast to the work in \thiswork{}, these systems strictly discourage or disallow the use of side effects. The design of RulerBack is inspired by the language of execution plans of UUAG. In certain languages, AGs can be implemented via meta-programming facilities, which obliviates the need of a preprocessor. \citet{DBLP:conf/icfp/VieraSS09} show how to implement AGs in Haskell through type level programming. The ideas that we presented in \thiswork{} are orthogonal to such approaches, although the necessary dependency analysis may be difficult to express depending on the expressiveness of the meta language. %if False The work for \thiswork{} is carried out for a research project to support the implementation of the type inference component of the Utrecht Haskell compiler. In the workshop version of \thiswork{}~\citep{Middelkoop10vis}, we presented an earlier version of RulerCore's clauses-per-visit model to allow attribute grammars to implement functions that perform case distinction on more than a single AST. In a later paper~\citep{Middelkoop10gpce}, we improved on this model to allow iteration of visits, and dynamic growing of trees, to model fixpoint construction of proof trees. That work was carried out using Haskell as a host language. In \thiswork{}, we made an explicit connection between AGs and RulerCore, and use this connection to express side effects in AGs. %endif \section{Conclusion} \label{side.conclusion} We introduced the language RulerCore, an extension of Attribute Grammars that makes visits to nonterminals explicit. As a consequence, it is possible to use side effects in rules. RulerCore combines the freedom of visitors as described by the visitor design pattern with the convenience of programming with attributes, as shown in Section~\ref{side.example}. Moreover, we presented RulerBack, a subset of RulerCore, which serves as a small core language for visitor-based Attribute Grammars. In RulerBack, the lifetime of attributes is explicit, as well as the evaluation order of rules and visits to children. %if fullVersion We described how RulerCore programs are mapped to RulerBack in Section~\ref{side.trans.rulerfront}. %endif A RulerBack program has a straightforward translation to many languages. In Section~\ref{side.trans.rulercore}, we gave a translation to JavaScript by making use of exceptions in combination with the try-catch mechanism. %if thesis A more sophisticated translation is possible that does not require exceptions, as we show in later chapters. %endif \paragraph{Future work.} A direction of future work is to consider destructive updates on attributed trees. Event-handling traversals over data structures may need to respond to dynamic changes induced by user input or external events. In RulerFront, the visits performed on an attributed tree explicitly specify which attributes are defined. When we apply a destructive update to the tree, we thus know precisely what information is based upon the previous structure of the tree. This knowledge can be exploited to reason about mutations of the attributes tree. Incremental evaluation of attribute grammars~\citep{DBLP:conf/plilp/VogtSK91,DBLP:journals/sigplan/YehK88b} may be used to efficiently recompute attributes after modifications of the AST. \Thiswork{} treated the scheduling of rules in the presence of side effects. Side effects are not visible in value dependencies between attributes. In Chapter~\tref{chapt.dependent-ags} we show how to incorporate dependently-typed programming languages, which have type assumptions resulting from pattern matches as `effect'. %if showExtra \begin{subappendices} \section{The Ranking Monad} \label{section.side.support} %format forall = "\forall" %format return = "\Varid{return}" %format s0 Figure~\ref{fig.visit.algorithm} uses the ranking monad |R|, which we define in this section\footnote{ Code: @https://svn.science.uu.nl/repos/project.ruler.papers/archive/RankMonad.hs@ } as background material. These definitions require the @Control.Monad@ module. |R| is a continuation-based monad with failure. We define similar monads in Section~\rref{sect.outline.gadts}, Section~\rref{sect.outline.stepwise} and Chapter~\tref{chapt.iterative}. Internally, |R| is a wrapper around a function that threads a state |S|. We leave |S| unspecified. This function takes a continuation as parameter, which gets the result of the monadic computation and an updated state. Failure is expressed with a |Maybe| result value: \begin{code} data R a where R :: (forall b . ((a -> S -> (Maybe b, S)) -> S -> (Maybe b, S))) -> R a instance Functor R where fmap f (R h) = R (\c -> h (c . f)) instance Monad R where return x = R (\c -> c x) R g >>= f = R (\c -> g (\a -> case f a of R h -> h c )) fail _ = throwError () instance MonadState S R where get = R (\c s -> c s s) put s = R (\c _ -> c () s) instance MonadError () R where throwError () = R (\ _ s -> (Nothing, s)) catchError m h = R (\ c s -> let f c Nothing s = runR (h ()) s (g c) f c (Just x) s = c x s g c Nothing s = (Nothing, s) g c (Just x) s = c x s in runR m s (f c)) runR :: R a -> S -> (Maybe a -> S -> b) -> b runR (R f) s g = case f (\ x s -> (Just x, s)) s of (o, s') -> g o s' \end{code} The function |foreach| applies the continuation to each element of the list and threads the state through these computations. The result is the outcome of the last succeeding computation: \begin{code} foreach :: [a] -> R a foreach xs = R (\c s -> let f c (o, s0) x = case c x s0 of (Nothing, s') -> (o, s') (o', s') -> (o', s') in foldl (f c) (Nothing, s) xs) \end{code} The remaining API functions can be defined in terms of the above functions. The |iter| function runs the computation |m| that it takes as parameter until a guard of |m| fails: \begin{code} iter :: R () -> R () iter m = (m >> iter m) `orElse` () orElse :: R a -> a -> R a orElse m r = m `catchError` (\ _ -> return r) guard :: Bool -> R () guard g = when (not g) (throwError ()) \end{code} The function |foreachL| is used in the translation of the monadic list comprehensions, which are not to be confused with monad comprehensions: \begin{code} foreachL :: [R a] -> R [a] foreachL = foldM f [] where f acc m = do acc' <- singleR m `orElse` [] return (acc ++ acc') singleR :: R a -> R [a] singleR m = m >>= return . return \end{code} \end{subappendices} %endif