\documentclass{entcs} \usepackage{entcsmacro} %include lhs2TeX.fmt %include polycode.fmt %let fullVersion = False \usepackage{stmaryrd} \usepackage{float} \usepackage{graphicx} \usepackage{pgf} \usepackage{mathpartir} \usepackage{tikz} \usetikzlibrary{arrows,positioning} \floatstyle{boxed} \restylefloat{figure} \renewcommand{\hscodestyle}{\normalsize} \newenvironment{smallcode}[0]{\renewcommand{\hscodestyle}{\small}}{\renewcommand{\hscodestyle}{\normalsize}} \def\lastname{Middelkoop and Dijkstra and Swierstra} %format . = "." %format : = "\!:\!" %format ++ = "+\!+" %format ^^^ = "\;\;\;" %format ^^^^ = "\hspace{1em}" %format epsilon = "\epsilon" %format (sub (x) (y)) = "{" x "}_{" y "}" %format (sup (x) (y)) = "{" x "}^{" y "}" %format (many (x)) = "\overline{" x "}" %format (abs (x)) = "|" x "|" %format c1 %format c2 %format x1 %format x2 %format x3 %format v1 %format v2 %format Gamma = "\Gamma" %format Sigma = "\Sigma" %format Phi = "\Phi" %format :- = "\vdash" %format emptyset = "\emptyset" %format sep = "\:;\:" %format bar = "|" %format ::: = "\::\:" %format `union` = "\cup" %format union = "\cup" %format `intersection` = "\cap" %format intersection = "\cap" %format `subset` = "\subseteq" %format scope = "\circ" %format << = "\llbracket" %format >> = "\rrbracket" %format ~> = "\leadsto" %format (semc (i) (s) (z)) = << z >> "_{" i,s "}" %format (semt (i) (z)) = << z >> "_{" i "}" %format (semb (z)) = << z >> %format (semp (e) (z)) = << z >> "_{" e "}" %format Gamma0 %format Gamma1 %format Gamma2 %format Gamma3 %format Gamma4 %format jvisit = "visit" %% javascript keywords %format function = "\mathbf{function}" %format var = "\mathbf{var}" %format for = "\mathbf{for}" %format if = "\mathbf{if}" %format prototype = "\mathbf{prototype}" %format return = "\mathbf{return}" %format throw = "\mathbf{throw}" %format catch = "\mathbf{catch}" %format try = "\mathbf{try}" %format == = "==" %% AG keywords %format attr = "\mathbf{attr}" %format data = "\mathbf{data}" %format sem = "\mathbf{sem}" %format inh = "\mathbf{inh}" %format syn = "\mathbf{syn}" %format con = "\mathbf{con}" %format clause = "\mathbf{clause}" %format default = "\mathbf{default}" %format itf = "\mathbf{itf}" %format visit = "\mathbf{visit}" %format internal = "\mathbf{internal}" %format match = "\mathbf{match}" %format child = "\mathbf{child}" %format invoke = "\mathbf{invoke}" %format datasem = "\mathbf{datasem}" %format rulerfront = "{\mbox{\scshape ruler-front}}" %format rulercore = "{\mbox{\scshape ruler-core}}" %format Java = "{\mbox{\scshape Java}}" %format Haskell = "{\mbox{\scshape Haskell}}" %format Javascript = "{\mbox{\scshape JavaScript}}" %format MiniJava = "{\mbox{\scshape MiniJava}}" %format JS = "\mathcal{J}" \begin{document} \begin{frontmatter} %if fullVersion \title{Visitor-based Attribute Grammars with Side Effect (Extended Version)} %else \title{Visitor-based Attribute Grammars with Side Effect} %endif \author{Arie Middelkoop\thanksref{email}} \address{Universiteit Utrecht, The Netherlands} \author{Atze Dijkstra\thanksref{email}} \address{Universiteit Utrecht, The Netherlands} \author{S. Doaitse Swierstra\thanksref{email}} \address{Universiteit Utrecht, The Netherlands} \thanks[email]{Email: \href{mailto:ariem@@cs.uu.nl} {\texttt{\normalshape \{ariem,atze,doaitse\}@@cs.uu.nl}}} \begin{abstract} The visitor design pattern is often applied to program traversal algorithms over Abstract Syntax Trees (ASTs). It defines a visitor, an object with a visit method that is executed for each node in the AST. These visitors have the advantage that the order of traversal is explicitly under control of the programmer, which is essential to deal with side-effectful computations. Unfortunately, the exchange of results between traversals is error-prone. Attribute Grammars (AGs) are an alternative way to write multi-traversal algorithms. An attribute evaluator decorates the AST with attributes in one or more traversals. The attributes form a convenient mechanism to exchange results between traversals. Unfortunately, AGs discourage the use of side effect. In this paper, we present |rulerfront|, a language capturing the combination of the above approaches. A |rulerfront| grammar can be translated to traversal algorithms in multiple languages. In this paper, we translate to the imperative, dynamically-typed language |Javascript|. \end{abstract} \begin{keyword} attribute grammar, visitor, design pattern \end{keyword} \end{frontmatter} \section{Introduction} Algorithms for traversing tree-shaped data structures appear in many applications, especially in compilers. A lot of effort has been invested in proper abstractions for tree traversals, for example in the form of Attribute Grammars (AGs)~\cite{DBLP:journals/mst/Knuth68}. In the last years, we applied AGs in many small projects (to teach compiler construction~\cite{CCO}, master projects, etc.), and several large projects, including the Utrecht Haskell Compiler~\cite{1596650}, the Helium~\cite{Leijen:helium} compiler for learning Haskell, and the editor Proxima~\cite{Schrage04proxima}. The use of AGs in these projects is invaluable, for reasons that become clear in Section~\ref{sect.example}. Tree traversals play their role in many other fields, including end-user applications. Web applications, for example, traverse and compute properties of DOM trees. Sadly, the nice abstractions emerging from the compiler field are not used to write such traversals. One of these reasons is that AGs require an additional language to be learned. Also, the AG formalism poses too severe restrictions to be used effectively in these areas, such as prohibition of side effect, or tool support may simply be absent for the programming language in question. The purpose of this paper and associated work is to treat the above two technical challenges. Considering the first challenge, for imperative languages like |Javascript|, a programmer either writes recursive functions, or takes a more structured approach via the visitor design pattern~\cite{DBLP:conf/ecoop/GammaHJV93,674267,DBLP:conf/oopsla/OliveiraWG08}. Tool support for the visitor design pattern is available for many languages. For example, the parser generator SableCC~\cite{DBLP:conf/tools/GagnonH98} generates visitor skeleton code for the ASTs that the parser produces, and we used these once to write a type checker for |MiniJava|~\cite{DBLP:conf/sigcse/Roberts01}. We also used ASM~\cite{ASM}, a library used in many big Java projects that provides visitor skeleton code to traverse Java bytecode, to transactify Java programs~\cite{jtransact}. With visitors, we use side-effect to carry results computed in one visit over to the next. In our experience, scheduling visits and side effect is an error-prone process, due to absence of the define-before-use guarantee. We elaborate on this in Section~\ref{sect.example.vis}. Attribute grammars offer a programming model where each AST node has attributes (named values per node). The programmer writes code that computes attributes in terms of other attributes. The attribute grammar evaluator automatically schedules this code over visits, and define-before-use can be verified with the circularity test of AGs. The implicitness of scheduling is a serious advantage, because it saves us from writing this scheduling manually, and cannot do it wrong. Unfortunately, the implicitness of scheduling comes with a severe restriction: side effect cannot be used reliably and should not be used in attribute computations. In web applications, for example, we typically would like to use a bit of side effect to influence the contents of a webpage. We elaborate on this in Section~\ref{sect.example.ag}. The main contribution of this paper is an extension of attribute grammars that has an explicit notion of visits, which offers a hybrid model between visitors and attribute grammars, while maintaining the best of both worlds. In fact, besides being more expressive, our extension make attribute grammars more intuitive to use. %if fullVersion In the workshop version of this paper~\cite{Middelkoop10vis}, we presented an earlier version of a visitor-extension of attribute grammars to model more complex traversals. We improved this idea to deal with iterative traversals, and trees that change during evaluation. In this paper, we incorporate the notation of side effect, such that these ideas are not limited to ASTs of compilers, but tree traversals in a much wider range of applications. %endif \begin{figure}[htb] \makebox[\textwidth][r] { \begin{tikzpicture} [ nd/.style={rectangle, minimum size=5mm, thick, draw=black!50, top color=white, bottom color=black!20,font=\tiny} ] \node[nd, minimum width=25mm](a) {item a}; \node[nd, below left=0mm of a.south east](b) {very big item b}; \node[nd, below left=0mm of b.south east](c) {not so big c}; \node[nd, minimum width=15.8mm, below left=0mm of c.south east](d) {tiny}; \end{tikzpicture}} \vspace{-25mm} \begin{smallcode} \begin{code} function Menu(name, children) { -- constructor of a |Menu| AST node this.name = name; -- the name of the element to align this.children = children; -- an array of children menus } Menu.prototype.accept = function(visitor) { visitor.visitMenu(this); -- invokes the appropriate visit method } function Visitor() { -- constructor of a |Visitor| object this.depth = 0; -- the depth so far in the menu tree this.maximum = 0; -- the maximum width observed so far this.count = 0; -- the number of menus laid out so far } var root = -- the menu tree and corresponding html nodes new Menu("a", [ -- @
item a
@ new Menu("b", [ -- @
very big item b
@ new Menu("c", []) -- @
not so big c
@ , new Menu("d", []) -- @
tiny
@ ]) ]); -- @
@ function align(root, anchor) { -- aligns the html nodes according to the menu tree var v = new Visitor(); -- creates visitor with empty state v.visitMenu = function(menu) { -- first visit method (gets |menu| node as param) menu.elem = document.getElementById(menu.name); menu.depth = this.depth; -- remember |depth| for the second visit this.maximum = Math.max(this.maximum, this.depth * 20 + menu.elem.clientWidth); for(var i in menu.children) { this.depth = menu.depth + 1; -- reset |this.depth| to one deeper than current menu.children[i].accept(this); -- invokes visitor on children } } root.accept(v); -- invokes the first visit (on the root) v.visitMenu = function(menu) { -- second visit method (gets |menu| node as param) var offset = menu.depth * 20; menu.elem.style.left = (anchor.offsetLeft + offset) + "px"; menu.elem.style.top = (anchor.offsetTop + this.count * 30) + "px"; menu.elem.style.width = (this.maximum - offset) + "px"; menu.elem.style.height = 30 + "px"; this.count ++; -- inorder numbering of nodes for(var i in menu.children) { -- invokes |visitor| on |menu|s children menu.children[i].accept(this); -- |count| should not be reset in this case } } root.accept(v); -- invokes the second visit (on the root) } \end{code} \end{smallcode} \caption{Pseudocode dualvisit menu alignment.} \label{fig.intro.visitor} \end{figure} To accomplish this goal, we also address the second challenge, which is to make our approach available for many target languages. We present |rulerfront|, a small but powerful language for tree traversals. We managed to isolate the language-dependent part into a small subset called |rulercore|, and show the translation from |rulercore| to |Javascript|. In a related paper~\cite{Middelkoop10gpce}, we showed a translation to |Haskell|. With these two languages, we cover the implementation issues regarding the full spectrum of mainstream general purpose programming languages available today. Similar to Yacc, SableCC and UUAG~\cite{uuagc}, the idea is to embed code fragments of the target language for the computations of attributes. This keeps general-purpose programming constructs out of |rulercore|, and allows the programmer to express computations without having to learn a special language. In particular, |rulercore| is suitable as a target language for attribute grammars. In summary, we present two languages |rulerfront| and |rulercore|. We implemented both in a single tool written in Haskell using UUAG\footnote{Downloadable from svn: \url{https://subversion.cs.uu.nl/repos/project.ruler.systems/ruler-core/}}. In Section~\ref{sect.example} we investigate the above challenges in more detail. In Section~\ref{sect.sem.rulercore} we present |rulercore|, with a translation to |Javascript| in Section~\ref{sect.trans.rulercore}. %if fullVersion Finally, the translation from |rulerfront| to |rulercore| in Section~\ref{sect.trans.rulerfront}. %else In the extended version of this paper~\cite{wgt10journalext}, we give a translation from |rulerfront| to |rulercore|. %endif \section{Example} \label{sect.example} In this section, we motivate the claims of the introduction in more detail, and introduce the background information relevant for the remainder of the paper. We take as usecase the alignment of an HTML menu in a web application using |Javascript|, based on a multi-visit tree traversal over an abstract description of the menu. We first show a solution using the visitor-pattern, then a near-solution using attribute grammars, finally followed by two solutions using |rulerfront|. \subsection{Visitor design pattern.} \label{sect.example.vis} In the visitor design pattern, each node of the Abstract Syntax Tree (AST) is modelled as an object, which stores references to the subtrees, and has an |accept| method. The |accept| method takes a visitor as parameter. A visitor is an object with a |jvisit|-method for each type of node. The |accept| method of the AST node calls the appropriate |jvisit|-method on the visitor and passes the node as an argument. This |jvisit| method consists of statements that manipulate the state of the visitor or the AST node, and can visit a subtree by calling the |accept| method on the root of a subtree, with the visitor-object as parameter. Figure~\ref{fig.intro.visitor} shows an example of a visitor that layouts HTML items as a menu in a tree-like fashion, as visualized in the upper-right corner. The menus are aligned to the right, and submenus are slightly indented. Furthermore, we desire the smallest layout, based on the contents of the HTML items. The variable |root| contains an abstract description of the menu as a tree of |Menu| objects (the AST). Associated with each |Menu| object is an HTML item with the same name. We interpret the menu structure to layout the HTML items. In the first visit to the menu tree, we query the widths of the corresponding HTML items. In the second visit, we adjust the positions and sizes of these items. Some information (such as indentation based on the |depth|) is computed in the first visit, and also needed in the second visit. That information we store as additional fields in the menu objects. The order in which the tree is visited is clearly defined by the explicit |accept|-calls in the |jvisit|-methods. This is important to deal with side effect: we need to have queried all the sizes of the HTML items before we start resizing them. However, there are a number of issues with the above solution. In the second visit, we require a number of values computed in the first visit. We store these in the state of the AST nodes during the first visit. However, there is no guarantee that we actually stored them there in the first visit. Furthermore, we never remove any of these values from the state, and thus retain all memory until the AST gets deallocated. This especially becomes a problem when using large AST storing many results. Furthermore, we have to take care of the order of the statements. For example, the |this.depth| needs to be reset at the appropriate place, and requires that the assignment to |menu.depth| is done before. Similarly, the increment to |this.count| needs to be positioned carefully. These are actually separate aspects which we would like to implement in isolation, without having to worry about their composition. Finally, we need to explicitly write visits to children using |accept|. Some tools generate depth-first visitors, which alleviates the need to do so, but these come with restrictions. For example, all statements must be written before the invocations to children. In Figure~\ref{fig.intro.visitor} we reset |this.depth| in between visits to children. To use a depth-first visitor, we would have to move this statement (which may not be easy). Moreover, in the simple example that we showed, the two visits are invoked after each other at the root. In practice, for example in type checking languages with principal types, we actually invoke multiple visits on a subtree before moving on to the next subtree. This rules out depth-first visitors, and is also error-prone to write manually. \begin{figure}[htb] \begin{smallcode} \begin{code} data Root con Root root : Menu -- node with a child named |root| data Menu con Menu name ^^^ cs : Menus -- node with a property |name|, and a child |cs| type Menus : [Menu] -- conceptually a cons-list, physically an array var root = new Root_Root( -- the |Menus| are physically represented new Menu_Menu("a", [ -- as an array. However, conceptually new Menu_Menu("b", [ -- we define its attributes using the new Menu_Menu("c", []) -- above cons-list representation. , new Menu_Menu("d", []) ]) ])); attr Menu Menus inh depth finMax count -- |gathMax|: width of submenu syn gathMax count -- two attributes with the name |count| function align(root, anchor) { -- uses embedded attribute grammars datasem Root clause Root -- equations of production |Root| of nont |Root| root:depth = 0 -- initial |depth| root:count = 0 -- initial |count| root:finMax = root:gathMax -- choose gathered max as global max datasem Menu clause Menu -- production |Menu| of nonterm |Menu| cs:depth = 1 + lhs:depth -- increase |depth| for submenus cs:count = 1 + lhs:count -- increase |count| lhs:count = cs:count -- provide the updated count to the parent loc:elem = document.getElementById(loc:name) loc:offset = lhs:depth * 20 -- indentation loc:width = loc:offset + loc:elem.clientWidth lhs:gathMax = Math.max(cs:gathMax, loc:width) cs:finMax = lhs:finMax -- pass down final maximum loc:dummy = (function () { -- side-effectful statements loc:elem.style.left = (anchor.offsetLeft + loc:offset) + "px"; loc:elem.style.top = (anchor.offsetTop + lhs:count * 30) + "px"; loc:elem.style.width = (lhs:finMax - loc:offset) + "px"; loc:elem.style.height = 30 + "px"; }) () -- directly call the anonymous function datasem Menus -- equations of productions |Cons| and |Nil| clause Cons hd:depth = lhs:depth -- pass |depth| downwards through the menus tl:depth = lhs:depth hd:count = lhs:count -- thread the |count| through the menus, in an tl:count = hd:count -- in-order fashion. First to the head, then to lhs:count = tl:count -- the tail, then back up to the parent. lhs:gathMax = Math.max(hd:gathMax, tl:gathMax) hd:finMax = lhs:finMax -- pass global maximum downwards tl:finMax = lhs:finMax clause Nil lhs:count = lhs:count -- thread |count| through without changing it lhs:gathMax = 0 -- initial maximum var inhs = new Inh_Root(); -- contains inh attrs of the root eval_Root(sem_Root, root, inhs); -- run the attribute evaluator } \end{code} \end{smallcode} \caption{Attribute grammar-based near-solution to menu alignment.} \label{fig.intro.ag} \end{figure} The example in Figure~\ref{fig.intro.visitor} can easily be made more complicated, for example by having menus that share submenus, and form an acyclic graph instead of a tree. With each of such complications, the above mentioned problems grow worse. As a sidenote, in this paper, we treat the AST as a fixed datastructure. For example, we do not consider adding menu entries on the fly. The ideas we propose can deal with the dynamic construction of proof trees~\cite{Middelkoop10gpce}, and we think that this is sufficient to deal with dynamic changes to the AST as well, but leave this topic as future work. Below, we look for a way to generate code similar to the code above, but from a description that does not have the aforementioned problems. \subsection{Attribute grammars} \label{sect.example.ag} Attribute grammars take care of the problems mentioned above related to visitors, but are not flexible enough to take side effect into account. We briefly consider why attribute grammars appear a promising solution, and why side effect is a problem. Before we show the example, we first give some background information on attribute grammars, and their encoding in |Javascript|. As syntax, we take a mixture of UUAG's syntax~\cite{uuagc}, and |rulerfront| (which are closely related). An attribute grammar is an extension of a context-free grammar, where nonterminals are annotated with attributes, and productions specify equations between attributes. The context-free grammar specifies the structure of the AST: each node of the AST is associated with a production. A node is also associated to the nonterminal of the left-hand side of the production, and each child of a node to the corresponding nonterminal in the right-hand side of the production. For example, we can denote a production as well as the structure of a node in the AST using a data-type definition (explained below). \begin{code} data Menus -- nonterminal |Menus| con Cons hd : Menu tl : Menus -- production |Cons|, with two nonts con Nil -- production |Nil|, empty \end{code} This data-type declaration introduces a nonterminal |Menus| with two productions, representing a cons-list. The first production is named |Cons|, and corresponds in BNF to |Menus -> Menu Menus|. The two nonterminals |Menu| and |Menus| in the right-hand side (RHS) have explicitly been given the respective names |hd| and |tl|. Terminals only have a name (shown later in Figure~\ref{fig.intro.ag}). Furthermore, this data-type declaration introduces |Javascript| constructor functions to construct ASTs. Each production is mapped to a constructor function that gets as parameter an object corresponding to the symbols in the RHS of the production. Each nonterminal is mapped to a constructor function that creates a base object that each of the objects corresponding to the productions inherits. Due to the inheritance, we can verify at the point of construction that the AST matches the grammar. \begin{code} function Menus() {} -- nonterminal |Menus|: base class function Menus_Cons(hd, tl) { -- production |Cons|: subclass this.hd = hd; assert(hd instanceof Menu); this.tl = tl; assert(tl instanceof Menus); } Menus_Cons.prototype = new Menus(); Menus_Cons.prototype.constructor = Menus_Cons; function Menus_Nil() {} -- production |Nil|: subclass Menus_Nil.prototype = new Menus(); Menus_Nil.prototype.constructor = Menus_Nil; \end{code} Cons-lists occur often. As a shortcut, we alternatively write the following shorthand for the above instead. \begin{code} type Menus : [Menu] \end{code} As an additional bonus, we can represent a list of menus as a Javascript array. Evaluation of an attribute grammar runs an evaluation algorithm on each node, derived from the equations of its associated production, that decorates each node with attributes. We assume that attributes are physically represented as Javascript properties of the AST objects. Nodes are decorated with two types of attributes: inherited attributes are computed during evaluation of the parent of that node, and synthesized attributes are computed during evaluation of the node itself. We declare the attributes of a nonterminal using an attribute declaration. \begin{code} attr Menu inh depth -- inherited attribute syn gathMax -- synthesized attribute \end{code} These attribute names are mapped to object properties named |_inh_depth| and |_syn_gathMax|. At some point during attribute evaluation, given a participating |Menu| object |m|, the objects properties |m._inh_depth| and |m._syn_gathMax| will be defined. An inherited attribute may have the same name as a synthesized attribute: they are mapped to differently named properties. As an aside, nodes may define a number of local attributes, which can be seen as local variables. To give a semantics to these attributes, we specify equations (rules) per production (explained below - full details of the nonterminal and its semantics in Figure~\ref{fig.intro.ag}). \begin{code} datasem Menu -- nonterminal |Menu| clause Menu -- production |Menu| cs:depth = 1 + lhs:depth -- rule loc:width = 20 * lhs:width -- rule lhs:gathMax = Math.max(loc:width, cs:gathMax) -- rule \end{code} The left-hand side of an equation designates an inherited attribute, using the notation |childname:attrname|, which allows us to distinguish attribute names from properties. The names |loc| and |lhs| are special: |loc| indicates a local attribute, and |lhs| refers to a synthesized attribute of the current node. Thus, the attributes we need to define appear as left-hand side. For example, the above attribute designations are refer to the |Javascript| properties |this.cs._inh_depth|, |this._loc_width|, and |this._syn_gathMax| respectively. Similarly, the right-hand side consists of a |Javascript| expression, with embedded attribute references. In this case, we may refer to the synthesized attributes of children, or with |lhs| to the inherited attributes of the current node. The terminals of a production are available as local attributes. In production |Menu|, there is a terminal called |name|, which is available as attribute |loc:name|. The translation of attribute references is similar as described above. %%% FIGURE WAS HERE The last rule expands to the |Javascript| statement: \begin{smallcode} \begin{code} this._syn_gathMax = Math.max(this._loc_width, this.cs._syn_gathMax); \end{code} \end{smallcode} Evaluation of an attribute grammar corresponds to traversing the AST one or more times, and executing rules, according to an evaluation strategy. In this paper, we restrict ourselves to the class of well-defined attribute grammars, whose attribute dependencies can be statically proved to be acyclic~\cite{DBLP:journals/mst/Knuth68}. For those grammars, a traversal is possible that visits each subtree a bounded number of times. This corresponds precisely with typical uses of the visitor-design pattern. %% With on-demand evaluation of attributes, %% the application requests a value of a synthesized attribute of %% the root, and evaluator proceeds by first transitively %% computing all the attributes the rule depends on, potentially %% visiting subtrees multiple times. In this paper, we consider %% a greedy evaluation strategy: subtrees are visited a statically %% known bounded number of times. Out of the semantic definitions for e.g. |Menu|, a function |sem_Menu| is generated containing the evaluation algorithm. Furthermore, to interface with the decorated tree from |Javascript| code, a function |eval_Menu| is generated that takes the AST, the function |sem_Menu|, and an object containing values for the inherited attributes. It applies the semantic value, and returns an object with the synthesized attributes. \begin{code} var inhs = new Inh_Menu(); inhs.depth = 0; -- provide inh attrs of root syns = eval_Menu(sem_Menu, menu, inhs); -- initiate evaluation window.alert(syns.gathMax); -- access syn attrs of root \end{code} In Figure~\ref{fig.intro.ag}, we show an attribute grammar version of the example presented earlier. It is a non-solution, for reasons explained later, but exhibits various important properties. The keywords written in bold indicate a switch from |Javascript| code to AG code, and layout determines the switch back. The attribute grammar code starts with a number of data type definitions that describe the structure of the menu tree. We then define a number of attributes. In particular, the idea is that we gather a maximum |gathMax| (synthesized), and use its value at the root, to pass down the global maximum |finMax| (inherited). Moreover, we count the menus. The inherited attribute |count| specifies the count for the current menu, and the synthesized |count| is the count incremented with the total number of children. We define the semantics for these attributes in the function |align|. Because |root| and |anchor| are its parameters, we also have access to these in the right-hand sides of rules. To layout the HTML item, we need to execute a number of statements, and encode this as an expression. In |Javascript|, this can be accomplished in a variety of ways. In the example, we choose to use a parameterless anonymous function. In the semantic of |Menus|, rules are given to compute the attributes for lists of menus. These rules follow standard patterns: a topdown passing of |depth| and |finMax|, bottomup computation of gathMax, and an inorder threading of |count|. In the visitor-example, the fields in the visitor combined with side-effect took care of this behavior. With attribute grammars, we have to describe it explicitly. However, there are mechanisms to abstract from these patterns, in the form of copy rules~\cite{uuagc}, collection rules~\cite{10.1109/SCAM.2007.13}, or a generalization called default rules~\cite{Middelkoop10gpce}. With such abstractions, the semantics of |Menus| can be written in a much conciser way (as we see later). \begin{figure}[htbp] \begin{smallcode} \begin{code} data Root con Root root : Menu -- node with a child named |root| data Menu con Menu name cs : Menus -- node with a property |name|, and a child |cs| type Menus : [Menu] -- conceptually a cons-list, physically an array var root = new Root_Root( -- the |Menus| are physically represented new Menu_Menu("a", [ -- as an array. However, conceptually new Menu_Menu("b", [ -- we define its attributes using the new Menu_Menu("c", []) -- above cons-list representation. , new Menu_Menu("d", []) ]) ])); itf Root -- itf for nonterminal |Root| (root node) visit perform -- one visit, named |perform| inh ast -- menu-AST is inherited attribute itf Menu Menus -- itf for nonterminals Menus (menu nodes) visit gather -- first visit: compute maximum inh ast depth -- needs AST and depth syn gathMax -- computes maximum width of the menu visit layout -- second visit: layout the HTML items inh finMax count -- needs global maximum width syn count -- produces updated count function align(root, anchor) { -- uses embedded attribute grammars datasem Root clause Root -- equations of production |Root| of nonterm |Root| root:depth = 0 -- initial |depth| root:count = 0 -- initial |count| root:finMax = root:gathMax -- global max is the gathered max here invoke layout of root -- require that visit |layout| of |root| is invoked datasem Menu clause Menu -- equations scheduled to visits of |Menu| cs:depth = 1 + lhs:depth -- increase |depth| for submenus cs:count = 1 + lhs:count -- increase |count| lhs:count = cs:count -- provide the updated count to the parent match loc:elem = document.getElementById(loc:name) loc:offset = lhs:depth * 20 -- indentation loc:width = loc:offset + loc:elem.clientWidth lhs:gathMax = Math.max(cs:gathMax, loc:width) cs:finMax = lhs:finMax -- pass down final maximum visit layout -- equations for visit |layout| and later match _ = (function () { -- side-effectful statements (wrapped as function) loc:elem.style.left = (anchor.offsetLeft + loc:offset) + "px"; loc:elem.style.top = (anchor.offsetTop + lhs:count * 30) + "px"; loc:elem.style.width = (lhs:finMax - loc:offset) + "px"; loc:elem.style.height = 30 + "px"; }) () -- directly call the anonymous function datasem Menus -- standard patterns for |Menus| default depth = function(depths) { return depths[depths.length-1]; } default finMax = function(maxs) { return maxs[maxs.length-1]; } default gathMax = function(maxs) { return Math.max.apply(Math, maxs); } default count = function(counts) { return counts[0]; } clause Cons -- a clause must be given for each production, clause Nil -- otherwise easy to forget one var inhs = new Inh_Root_perform(); -- contains inh attrs for the root inhs.ast = root -- AST as inherited attribute eval_Root_perform(sem_Root, inhs); -- run the attribute evaluator } \end{code} \end{smallcode} \caption{|rulerfront| solution to menu alignment.} \label{fig.intro.ruler1} \end{figure} The AG code has several nice properties. The order of appearance of the rules is irrelevant. This allows the rules for e.g. |depth| and |count| to be written separately and merged automatically~\cite{uuagc}. In the example, we give all the rules in one go to fit the page, however, for bigger projects the ability to write such rules separately is important to write coherent code. Another nice property is the absence of invocations of visits (the |accept| calls in the visitor-example). The number of visits is totally implicit. From the dependencies between attributes in the rules, the attribute evaluator determines automatically that the attribute |root:gathMax| (in the semantics of |Root|) must be computed first in a visit, before it can be passed as |root:finMax|. Finally, we check statically if there is an evaluation order of statements such that all attributes are defined before their value is accessed. The attribute declarations describe the attributes that must be defined, and those that are available. The rules describe what attributes must be available before computing an attribute, and an evaluation order is possible if the transitive closure of the dependencies is non-cyclic~\cite{DBLP:journals/mst/Knuth68}. However, the above code has a number of problems, because the order of evaluation of rules is determined only by dependencies on attributes. In particular, the side-effect that rearranges the HTML items is not a dependency of any rule. Thus it is not clear when it is evaluated, if it is evaluated at all. Similarly, it is neither clear at what moment the widths of the HTML items are obtained. When there are other rules in play that have side effect that effects these widths, the interleaving of these side effects becomes even harder to predict. Finally, the root of the tree does not have any attributes defined, so there is actually no reason to expect any of the rules to be executed in the first place. %%% FIGURE WAS HERE \begin{figure}[htbp] \begin{smallcode} \begin{code} function align(root, anchor) { -- uses embedded attribute grammars var sem_Root = -- semantic function with itf |Root| sem ntRoot : Root -- equations for itf |Root| visit perform -- equations for the |perform|, the only visit clause Root -- production named |Root| child root : Menu = sem_Menu -- introduce a child |root| of nonterm |Menu| root:ast = lhs:ast -- use |lhs:ast| as AST root:depth = 0 -- initial |depth| root:count = 0 -- initial |count| root:finMax = root:gathMax -- global max is the gathered max of here invoke layout of root -- demand invocation |layout| of |root| var sem_Menu = -- semantic function with itf |Menu| sem ntMenu : Menu -- equations for itf |Menu| visit gather -- equations for first visit clause Menu -- production named |Menu| child cs : Menus = sem_Menus -- introduce a child |cs| of nonterm |Menus| cs.ast = lhs:ast.cs -- pass submenus as AST for |cs| cs:depth = 1 + lhs:depth -- increase |depth| for submenus match loc:elem = document.getElementById(loc.name) loc:offset = lhs:depth * 20 -- indentation loc:width = loc:offset + loc:elem.clientWidth lhs:gathMax = Math.max(cs:gathMax, loc:width) cs:finMax = lhs:finMax -- pass down global maximum visit layout -- equations for visit |layout| clause Menu' -- subproduction named |Menu'| cs:count = 1 + lhs:count -- increase |count| lhs:count = cs:count -- provide the updated count to the parent match _ = (function () { -- side-effectful statements loc:elem.style.left = (anchor.offsetLeft + loc:offset) + "px"; loc:elem.style.top = (anchor.offsetTop + lhs:count * 30) + "px"; loc:elem.style.width = (lhs:finMax - loc:offset) + "px"; loc:elem.style.height = 30 + "px"; }) () -- directly call the anonymous function var sem_Menus = -- semantic function, also itf |Menu| sem ntMenus : Menu -- equations for itf |Menu| visit gather -- equations for visit |gather| default depth = function(depths) { return depths[depths.length-1]; } default finMax = function(maxs) { return maxs[maxs.length-1]; } default gathMax = function(maxs) { return Math.max.apply(Math, maxs); } default count = function(counts) { return counts[0]; } clause Cons -- production |Cons| match true = lhs:ast.length >= 1 -- clause matches if array has an element child hd : Menu = sem_Menu -- introduce child |hd| using |sem_Menu| hd.ast = lhs:ast[0] -- head of the array child tl : Menu = sem_Menus -- introduce child |tl| using |sem_Menus| tl.ast = lhs:ast.slice(1) -- tail of the array clause Nil -- production |Nil| (matches always) var inhs = new Inh_Root_perform(); -- contains inh attrs for the root inhs.ast = root -- AST as inherited attribute eval_Root_perform(sem_Root, inhs); -- run the attribute evaluator } \end{code} \end{smallcode} \caption{Desugared |rulerfront| solution to menu alignment.} \label{fig.intro.ruler2} \end{figure} \subsection{Ruler-front} \label{sect.example.ruler1} We now present a solution using |rulerfront|. The syntax of |rulerfront| resembles the syntax of the AG in Figure~\ref{fig.intro.ag}, but is different. Before we jump into the example, we first discuss some of the differences. The central idea is to make visits to an AST node during attribute evaluation explicit. We then associate side effect with individual visits. \emph{Interfaces.} Instead of declaring attributes for a nonterminal, we declare an \emph{interface} for a nonterminal. An interface declaration specifies the visits of a nonterminal, and attributes per visit. In the following example, we specify that the attributes of |Menu| are computed in two visits. \begin{code} itf Menu -- interface for nonterminal |Menu| visit gather -- declaration of first visit inh ast -- inherited attr defined prior to visit syn gathMax -- synthesized attr computed by visit visit layout -- declaration second visit inh finMax count -- two inherited attributes syn count -- synthesized attr computed by visit \end{code} The order of appearance of visit declarations dictates the order of visits to AST nodes with this interface. In order to visit a node, all previous visits must have occurred: the actual visits on a node must be a prefix of the declared visits. Values for inherited attributes must be provided prior to the visit. Values for synthesized attributes are only available after a visit has been performed. In a conventional AG, the AST to traverse can be seen as hidden inherited attribute. In |rulerfront|, the AST must actually be provided explicitly as inherited attribute |ast| in the first visit. Section~\ref{sect.example.ruler2} motivates this choice. \emph{Scheduling.} The rules of a semantics-block are automatically scheduled over visits using an as-late-as-possible strategy. If the rules are cyclicly defined, the scheduling is not possible, and a static error is reported. Visits to children are automatically inferred based on the attribute requirements of rules. However, since |Root| has no attributes, there is no need to invoke any visits of |root|. Therefore, we specify through an |invoke| rule that visit |layout| must be invoked, which requires through attribute dependencies that also visit |gather| must be invoked, and kickstarts the evaluation. \emph{Scheduling constraints.} Rules can be constrained to visits. With a visit-block, we constrain rules to that visit, or a later visit. The example below illustrates the various possibilities. An attribute definition prefixed with the keyword |match| is an exception. It is constrained to the visit it appears in, and is executed even if the attribute it defines is never needed. We explain its precise meaning later. \begin{code} datasem Menu -- rules for nonterminal |Menu| clause Menu -- rules for production |Menu| cs:count = lhs:count + 1 -- scheduled in visit |gather| or later match loc:elem = ... -- precisely in visit |gather| visit layout -- rules for visit |layout| or later match _ = ... -- precisely in visit |layout| lhs:count = cs:count -- constrained to layout or later \end{code} With an underscore, we bind the value of the RHS of a rule to an anonymous attribute that we cannot refer to. A visit-block also introduces a subscope. A local attribute defined in a visit-block is not available for a rule defined in a higher scope, even if that rule is scheduled to a subscope. After all these preparations, we can finally present the |rulercore| solution in Figure~\ref{fig.intro.ruler1}. In this example, we express that the side effect that queries the widths of the HTML items, is constrained to the first visit, and the side effect that changes the location and dimensions is constrained to the second. For the |Menus|-nonterminal, we give default-rules for equality named attributes in its productions. If such an attribute does not have an explicit definition, these are implicitly defined by the default rule. The idea is that the default-rule provides a function that gets an area with all attribute values of the same name of previously visited children (or |lhs|). Formally, given a default-rule for attribute |a|, suppose that a child |(sub k i) `elem` (sub k 1),...,(sub k n),lhs| has an attribute |k.a| (synthesized if |k = lhs|, inherited otherwise), but lacks an explicit definition for it. The default-rule gives an implicit definition, by invoking the RHS of the default-rule with an array defined as follows. For each child |(sub c j) `elem` (sub k (i-1)),...,(sub k 1),lhs| that has an attribute |a| (inherited if |k = lhs|, synthesized otherwise), in this order, the array has a value |(sub c j).a|. In particular, the first entry is the value of the closest child, and the last entry is that of |lhs| (if such attributes exist). In the above example, we combined both side effect and attribute evaluation. We retain the advantages that AGs offer, such as the ease of adding attributes. Furthermore, the extension is orthogonal to various optimizations for attribute grammars, including incremental evaluation and multi-core parallel evaluation. However, we require the programmer to manually assign attributes to visits, and constrain side-effectful rules to particular visits, which is not necessary for conventional attribute grammars. In practice, this is only a minimal amount of extra work that has as additional advantages that it makes attribute evaluation more predictable and thus easier to understand. %% as well as simplifies cycle checks, both in terms of computational complexity %% and understandability of error messages. %%% FIGURE WAS HERE \subsection{Desugared Ruler-Front} \label{sect.example.ruler2} In Figure~\ref{fig.intro.ruler2} (explained below), we give another way to write the same example in |rulerfront|. Both Figure~\ref{fig.intro.ruler2} and Figure~\ref{fig.intro.ruler1} are valid |rulerfront| programs. The former is, however, a desugared version of the latter. This desugared version only uses a subset of |rulerfront| that we call |rulercore|. It naturally generalizes over Higher-Order~\cite{DBLP:conf/pldi/VogtSK89} and Conditional~\cite{DBLP:journals/toplas/Boyland96} Attribute Grammars. We use this example as preparation for |rulercore| in the next section. To save space, we omitted the data-type declarations, interface declaration, and |root| variable, which are equal to those in the first half of Figure~\ref{fig.intro.ruler1}. We present sem-blocks of the form |sem nonterm : Interface|, which introduces a nonterminal |nonterm|, with visits and attributes described by |Interface|. The productions are not defined by a data-type definition, but through clauses and rules per visit, as we explain below. Additionally, the code generated from a semantics-block is a constructor-function that produces an AST node described by |Interface|, which we can store in a variable, and may use in rules. In Figure~\ref{fig.intro.ruler2}, we start with a definition of the semantics for the root. The interface |Root| declares one visit. We generalize over productions for a nonterminal by having clauses for each visit. Each clause provides an alternative way to compute the attribute values. We thus give clauses for the visit |perform|, in this case only one clause. Clauses and visits may contain rules. Rules given for a visit are in scope of all clauses declared for that visit. Rules for a clause are only visible in that clause. We also see another type of rule, called a child-rule, which introduces a child. For example, we introduce a child |root|, with interface |Menu|, and the semantics defined by the |Javascript| value |sem_Menu|. The left-hand sides of an evaluation-rule may be a pattern. This is either an attribute reference, an underscore or a constant. Evaluation of such a rule fails when its execution throws an exception, or the left-hand side is a value that is not equal to the value computed for the right-hand side. During attribute evaluation, the clauses of a visit are tried at runtime in the order of appearance. The next clause is tried when either a match-rule fails, or when there is no succeeding clause for a visit to a child. Failure of any other form of rule simply aborts the entire evaluation. This way, the match-rules allow us to distinguish clauses |Cons| and |Nil| of |ntMenus| by matching on the length of the list. Missing visits are implicitly defined with a single empty clause. A visit without clauses implicitly has a single clause. Therefore, we neither have to specify the visit |layout| nor clauses for it in the semantics of |ntMenus|. Also, due to the automatic ordering of rules, many of the rules defined in visit |layout| of |ntMenu|, could also be defined one level higher, in visit |gather|. Note that this representation is more general than conventional attribute grammars, and that an attribute grammar can easily be mapped to this representation, as shown by the difference between Figure~\ref{fig.intro.ruler1} and Figure~\ref{fig.intro.ruler2}. \begin{figure}[htb] \begin{smallcode} \begin{code} e ::= JS [ many b ] -- embedded |rulerfront| blocks |b| in |JS| b ::= i | s | o -- |rulerfront| blocks i ::= itf I (many v) -- interface decl, with visits |v| v ::= visit x inh (many x1) syn (many x2) -- visit decl, with atributes |x1| and |x2| s ::= sem x : I t -- semantics expr, defines nonterm |x| t ::= visit x (many r) (many c) -- visit def, with common rules |r| | epsilon -- no visit (serves as terminator) c ::= clause x (many r) t -- clause definition, with next visit |t| r ::= p = e -- assert-rule, evaluates |e|, bind to pattern |p| | match p = e -- match-rule, backtracking variant | invoke x of c -- invoke-rule, invokes visit |x| on |c| | child c : I = e -- child-rule, introduces a child |c| of itf |I| o ::= c:x -- attribute reference in some embedded code p ::= c:x -- attribute reference in pattern | _ -- underscore | k -- constant x, c I, p, e -- identifiers, child identifiers, patterns, expressions respectively Gamma,Sigma ::= epsilon -- attr+child environment (used in semantics) | Gamma, scope -- new scope | Gamma, inh c:x -- inh attr |c:x| | Gamma, syn c:x -- syn attr |c:x| | Gamma, c:I (many v) -- child |c| with available visits |many v| Phi ::= epsilon -- interface environment (used in semantics) | Phi, I (many v) -- itf |I| with visit decls |v| \end{code} \end{smallcode} \caption{Syntax of |rulercore|} \label{fig.syntax.rulercore} \end{figure} \section{Static Semantics of |rulercore|} \label{sect.sem.rulercore} In this section, we introduce |rulercore|, a small subset of |rulerfront|, but sufficiently rich to serve as intermediate language for |rulerfront|. Figure~\ref{fig.syntax.rulercore} lists the syntax of |rulercore|. A |rulerfront| program |e| is a |JavaScript| program |J|, with embedded |rulercore| blocks |b|. A block |b| is either an interface declaration, semantics-block, or attribute reference. We explain the individual forms of syntax in more detail below. There are some essential differences in contrast to |rulerfront| that we gradually introduced by example in the previous section. The order of appearance of rules the evaluation order, and each invocation of a visit must explicitly be stated through an invoke rule. Special syntax for data-types is not part of |rulercore|. Through clauses and (match) rules, we have a general mechanism to inspect and perform case distinction on arbitrary |Javascript| datastructures. We make no assumptions about the syntax of |J|. The embedded blocks may occur anywhere in a |Javascript| program. It is up to the programmer to ensure that semantic-blocks and attribute references occur at expression-positions, and that interface-declarations occur at statement positions. Neither do we make any assumptions about scopes of |J|; instead, we assume that all embedded blocks are in the same scope. %format SumNil2 %format SumCons2 \begin{figure}[htp] \begin{smallcode} \begin{code} itf S visit v1 inh l syn emptyset -- decompose array |l| down visit v2 inh emptyset syn s -- compute sum |s| up var sumArr = sem sum : S visit v1 emptyset -- first visit clause sumNil -- when list is empty match 0 = lhs:l.length -- match empty |l| visit v2 emptyset -- second visit clause sumNil2 -- single clause lhs:s = 0 -- empty list, zero sum () -- no next visit clause sumCons -- when list non-empty loc:x = lhs:l[0] -- head of the list loc:xs = lhs:l.slice(1) -- tail of the list child tl : S = sumArr -- recursive call tl:l = loc:xs -- |l| param of call invoke v1 of tl -- invoke on child visit v2 emptyset -- second visit clause sumCons2 -- single clause invoke v2 of tl -- invoke on child lhs:s = loc:x + tl:s -- sum of head and tail () -- no next visit \end{code} \end{smallcode} \caption{Example of |rulercore| syntax: summing an array of integers.} \label{fig.syntax.example} \end{figure} Figure~\ref{fig.syntax.example} shows an example |rulercore| program to sum an array of integers in two visits. The first visit has two clauses: a clause |sumNil| when the array is empty, and |sumCons| when there is at least one element. In the second visit, we compute the actual sum, depending on the clause chosen in the first visit. A semantics-block introduces a visitor-object with an interface |I|. The interface dictates what visits can be made to the object, and what the inputs (inherited attributes) and outputs are (synthesized attributes). The outputs for a visit are produced by executing rules. We write these rules down in a tree of clauses and visits, as illustrated by the indentation in Figure~\ref{fig.syntax.example} and the state diagram: \begin{tikzpicture} [ grow=right , visit/.style={circle, minimum size=1mm, node distance=2mm, very thick, draw=black, fill=black,font=\scriptsize} , choice/.style={circle, minimum size=1mm, node distance=2mm, very thick, draw=black, fill=white,font=\scriptsize} ] \node[visit,label=90:|v1|]{} child { node[choice, label={[xshift=-4mm]120:|emptyset|}, label={[xshift=-4mm]60:|SumNil|}, label={[xshift=-6mm]-60:|SumCons|}]{} [level distance=25mm] child { node[visit,label=90:|v2|]{} [node distance=20mm] child { node[choice, label={[xshift=-7mm]120:|emptyset|}, label={[xshift=1mm]60:|SumCons2|}]{} child { node[visit,label=90:|()|]{} } } } child { node[visit,label=90:|v2|]{} child { node[choice, label={[xshift=-7mm]120:|emptyset|}, label={[xshift=1mm]60:|SumNil2|}]{} child { node[visit,label=90:|()|]{} } } } }; \end{tikzpicture} The black nodes represent the state of the AST-node prior to a visit, and the white nodes indicate a branch point. Upon creation, an AST node is in the state represented by the root node. With each edge are alternately associated the rules of a visit or of a clause. With each visit, an AST node tries to switch state to a next black node by executing the rules on the path to such a node. Execution of all of the rules must succeed. At a branch-point, rules on edges of clauses are tried in order of appearance. Results produced by executing rules are in scope of rules further along the path. \begin{figure}[htp] \begin{smallcode} %if fullVersion \begin{code} Gamma :- s ^^^ ^^^ ^^^ ^^^ (many v) sep Gamma :- t ^^^ ^^^ ^^^ ^^^ Sigma sep Gamma0 :- r ::: Gamma1 ^^^ ^^^ ^^^ ^^^ :- i ^^^ :- v Gamma :- o ^^^ ^^^ ^^^ ^^^ (many v) sep (many x) sep Gamma :- c ^^^ ^^^ ^^^ ^^^ Sigma sep Gamma0 :- p ::: Gamma1 ^^^ ^^^ ^^^ ^^^ Gamma :- e \end{code} %endif \begin{mathpar} %if fullVersion \inferrule*[right=itf] { |I unique| \\\\ |I (many v) `elem` Phi| \\ } {|:- itf I (many v)|} \inferrule*[right=attr] { |inh (sub a i) unique| \\\\ |syn (sub b j) unique| \\ } {|:- visit x inh (many a) syn (many b)|} \inferrule*[right=sem] { |x unique| \\\\ |I (many v) `elem` Phi| \\ |(many v) sep Gamma, scope :- t| \\ } {|Gamma :- sem x : I t|} %endif \inferrule*[right=end] {} {|[] sep Gamma :- epsilon|} \inferrule*[right=visit] { |Gamma0 `union` avail (visit x (many r) (many c)) sep Gamma0 :- (many r) ::: Gamma1| \\ |(many v) sep (many s) sep Gamma1 `union` { inh lhs:a bar a `elem` (many i) } :- (sub c i)| \\ } {|visit x inh (many i) syn (many s), (many v) sep Gamma0 :- visit x (many r) (many c)|} \inferrule*[right=clause] { |x unique| \\ |Gamma0 `union` avail (clause x (many r) (many c)) sep Gamma0 :- (many r) ::: Gamma1| \\ |(many v) sep Gamma1 :- t| \\ |{ (syn lhs:a) bar a `elem` (many s)} `subset` Gamma1| \\ } {|(many v) sep (many s) sep Gamma0 :- clause x (many r) t|} \inferrule*[right=assert] { |Sigma sep Gamma0 :- p ::: Gamma1| \\ |Gamma0 :- e| \\ } { |Sigma sep Gamma0 :- p = e ::: Gamma1| } \inferrule*[right=match] { |Sigma sep Gamma0 :- p ::: Gamma1| \\ |Gamma0 :- e| \\ } { |Sigma sep Gamma0 :- p = e ::: Gamma1| } \inferrule*[right=invoke] { |Phi((sub I c)) = (many v)| \\ |visit x inh (many i) syn (many s) `elem` (many v)| \\ |c : (sub I c) (many w) `elem` Gamma0| \\ %% check child |next (many w) (many v) = x| \\ %% check invoke |{ inh c:a bar a `elem` (many i) } `subset` Gamma0| \\ %% check inhs |Gamma1 = Gamma0 `union` { syn c:a bar a `elem` (many s) } `union` { c : (sub I c) ((many w), visit x inh (many i) syn (many s)) } | \\ %% add syn + visit } { |Sigma sep Gamma0 :- invoke x of c ::: Gamma1| } \inferrule*[right=child] { |Gamma0 :- e| \\ |Gamma1 = Gamma0 `union` { c : I emptyset }| } { |Sigma sep Gamma0 :- child c : I = e ::: Gamma1| } \inferrule*[right=occ.lhs] { |inh lhs:a `elem` Gamma| } { |Gamma :- lhs : a | } \inferrule*[right=occ.child] { |syn c:a `elem` Gamma| } { |Gamma :- c : a| } \inferrule*[right=pat.lhs] { |syn lhs:a `elem` Sigma| \\ } { |Sigma sep Gamma0 :- lhs : a ::: Gamma0, syn lhs:a| } \inferrule*[right=pat.loc] {} { |Sigma sep Gamma0 :- loc : a ::: Gamma0, syn loc:a| } \inferrule*[right=pat.child] { |inh c:a `elem` Sigma| \\ } { |Sigma sep Gamma0 :- c : a ::: Gamma0, inh c:a| } \inferrule*[right=const] {} { |Sigma sep Gamma :- k ::: Gamma| } \inferrule*[right=any] {} { |Sigma sep Gamma :- _ ::: Gamma| } \end{mathpar} \begin{code} avail (visit x (many r) (many c)) = (sub avail union) ((many r)) `union` (sub avail intersection) ((many c)) `union` { syn lhs:b | visit x inh (many a) syn (many b) `elem` Phi((sub I x)) } avail (clause x (many r) t) = (sub avail union) ((many r)) `union` avail (t) avail (p = e) = emptyset avail (match p = e) = emptyset avail (invoke x of c) = { inh c:a | a `elem` (many a), visit x inh (many a) syn (many b) `elem` Phi((sub I c)) } avail (child c : I = e) = { c : I (Phi I) } \end{code} \end{smallcode} \caption{Static semantics of |rulercore|} \label{fig.sem.rulercore} \end{figure} There are four types of rules. \begin{itemize} \item \begin{code} match p = e -- match-rule match loc:x = 3 -- example that succeeds match true = false -- example that fails \end{code} The pattern |p| must match the value of the right hand side. If the evaluation of |e| results in an exception, or the match fails, a backtrack is made to the next clause. If |p| represents an attribute, the attribute gets defined. \item \begin{code} p = e -- assert-rule (not prefixed with a keyword) \end{code} Similar to the above, except that the match is expected to succeed. If not, the evaluation itself aborts with an exception. \item \begin{code} child c : I = e -- child-rule child root : Menu = ntMenu -- example that introduces a |Menu| child \end{code} Evaluation of the rule above creates a child |c|, visitable according to the interface |I|, and created by executing the constructor function |e|. \item \begin{code} invoke x of c -- invoke rule \end{code} Executes visit |x| of child |c|. The inherited attributes of |x| must be defined, and all prior visits to |c| must have been performed. The invocation may fail if no clause matches. In that case, it causes the current AST node to backtrack to the next clause. If successful, the synthesized attributes of |x| become available. \end{itemize} Figure~\ref{fig.sem.rulercore} shows a static semantics for |rulercore|. A |rulercore| program that satisfies these conditions never crashes due to an undefined attribute, invalid rule order, or forgotten invocation to a child. Dynamic or static type checking we leave as responsibility of the host language. We briefly consider some aspect of these rules. Two environments play an important role: |Gamma| represents the children and attributes defined so far (to test for missing and duplicated definitions), and |Sigma| the attributes that are allowed to be defined (to test for definitions of unknown attributes). As additional constraint on environments, we consider it a static error when there is a duplicate attribute in the environment within two scope markers. Visits must be specified in the proper order, and none may be omitted. The relation for visits |t| gets a sequence of pending visits |many v| as declared in the interface. In rule \TirName{visit}, we verify that the name of the visit matches the expected visit in the head of |many v|. The next visit must match the head of the tail of this list, until in the end |many v| is empty. We also add the inherited attributes of the visits to the environment. The function |avail| defines which attributes may be defined. Higher-up in the visit-clauses-tree, we may only define those attributes that are common to all lower clauses. In rules \TirName{pat.lhs} and \TirName{pat.child}, we verify that we are indeed defining an attribute belonging to a certain child. In rule \TirName{invoke}, we verify that |x| is indeed the next visit in the expected sequence of visits |many v|, given the previous invocations |many w|. We furthermore verify that the inherited attributes for the visit of |c| are defined, and add the synthesized attributes to the environment. %% Mention that the rules are a bit crafted to keep them consice, and that this is much nicer %% with the AG version. \begin{figure}[htp] \begin{smallcode} \begin{code} var sumArr = function() { -- semantic function function nt_sum(_inps) { -- visit |v1| var lhsIl = _inps.l; -- extract |lhs:l| try { -- try clause |sumNil| if (lhsIl.length != 0) throw eEval; -- if |lhs:l| is empty var _res = new Object(); -- produce results of |v1| _res._next = function(_inps) { -- cont. for visit |v2| var lhsSs = 0; -- |lhs:s| rule var _res = new Object(); -- produce results of |v2| _res._next = null; -- no next visit _res.s = lhsSs; -- store |lhs:s| return _res; -- return result of |v2| }; return _res; -- return result of |v1| } catch(err) { -- try clause |sumCons| var locLx = lhsIl[0]; -- |loc:x| rule var locLxs = lhsIl.slice(1); -- |loc:xs| rule var vis_tl = sumArr(); -- creation of child |tl| tlIl = locLxs; -- |tl:l| rule var _args = new Object(); -- inputs for |v1| of |tl| _args.l = tlIl; -- store |tl:l| var _res = vis_tl(_args); -- invoke |v1| of |tl| var vis_tl = _res._next; -- extract results var _res = new Object(); -- produce results of |v1| _res._next = function(_inps) { -- cont. for visit |v2| var _args = new Object(); -- inputs for |v2| of |tl| var _res = vis_tl(_args); -- invoke |v2| of |tl| var tlSs = _res.s; -- extract |tl:s| result var lhsSs = locLx + tlSs; -- compute |lhs:s| var _res = new Object(); -- produce results of |v1| _res._next = null; -- no next visit _res.s = lhsSs; -- store |lhs:s| return _res; -- return result of |v2| }; return _res; -- return result of |v1| } }; return nt_sum; }; -- return visitor function \end{code} \end{smallcode} \caption{Example translation} \label{fig.sem.example} \end{figure} \section{Translation of |rulercore| to |Javascript|} \label{sect.trans.rulercore} In this section, we describe how to translate |rulercore| programs to |Javascript|. We translate each semantics-block to a coroutine, implemented as one-shot continuations. Each call to the coroutine corresponds with a visit. The parameters of the coroutine are the inherited attributes of the visit. The result of the call is an object containing values for the synthesized attributes, and the continuation to call for the visit. \begin{figure}[htp] \begin{smallcode} \begin{code} (semb (sem x : I t)) ~> function() { var (semb (nt x)) = (semt I t); return (semb (nt x)); } (semb (c:x)) ~> (semb (inp c x)) (semt I (())) ~> null (semt I (visit x (many r) (many c))) ~> function(_inps) { (semb (inp lhs (inhs I x))) = _inps.(semb (inhs I x)); (semb (many r)); (semc I (syns I x) (many c)); } (semc I (many s) []) ~> throw eEval; (semc I (many s) (c : (many c))) ~> try { (semc I (many s) c); } catch(err) { if (err == eEval) { (semc I (many s) (many c)); } else throw err; } (semc I (many s) (clause x (many r) t)) ~> (semb (many r)); var _outs = new Object(); _outs._next = (semt I t); _outs.(many s) = (semb (out lhs (many s))); return _outs; (semb (p = e)) ~> var _res; try { _res = (semb e); } catch (err) { if (err == eEval) throw eAbort; else throw err; } (semp eAbort p) (semb (match p = e)) ~> var _res = (semb e); (semp eEval p); (semb (child c : I = e)) ~> var (semb (vis c)) = ((semb e))(); (semb (invoke x of c)) ~> var _args = new Object(); _args.(semb (inhs (sub I c) x)) = (semb (out c (inhs (sub I c) x))); var _res = (semb (vis c))(_args); var (semb (inp c (syns (sub I c) x))) = _res.(semb (syns (sub I c) x)); var (semb (vis c)) = _res._next; (semp e (c:a)) ~> var (semb (out c a)) = _res; (semp e _) ~> ; (semp e k) ~> if (_res != k) throw e; \end{code} \begin{code} out "loc" x = "locL" x ^^^^ inp "loc" x = "locI" x out "lhs" x = "lhsS" x ^^^^ inp "lhs" x = "lhsI" x out c x = c "I" x ^^^^ inp c x = c "S" x vis c = "vis_" c ^^^^ nt x = "nt_" x syns I x ^^^^ ^^^^ inhs I x -- respectively, inh and syn attrs of |x| of |I| \end{code} \end{smallcode} \caption{Denotational semantics of |rulercore|} \label{fig.sem.denot} \end{figure} As an example, we show in Figure~\ref{fig.sem.example} the translation of the example in the previous section. To deal with backtracking, we use the exception mechanism, and throw an exception to switch to the next clause. Note that this does not rollback any side effect that the partial execution of the rules may have caused. To be able to do so, we can run the rules in a software transaction~\cite{Heidegger10}, for which many programming languages have tool support nowadays. Alternatively, when the side effect matters, the programmer can schedule it to an earlier or later visit, such that it is not influenced by backtracking. To deal with continuations, we use closures. The function to be used for the next visit, we build in the previous visit. This function has access to all the results computed in the previous visit. Furthermore, we store values for attributes in local variables. Those values that are not needed anymore, are automatically cleaned up by the garbage collector. Figure~\ref{fig.sem.denot} shows the general tranlation scheme, and naming scheme for attributes. In particular, for each visit, we generate a closure that takes values for inherited attributes as parameter. Clauses are dealt with through exception handling. When a clause successfully executed all statements, it returns an object containing values for synthesized attributes, as well as the continuation function for the next visit. The above translation is relatively straightforward. In practice, the selection of a clause is functionally dependent on the value of an inherited attribute, or a local attribute computed in a previous visit. In those cases, the selection of clauses can be implemented more efficiently using conventional branching mechanisms. We verified that the above implementation runs in time linear to the size of the tree, when we use version of the |slice| operation that does not make a copy of the array. With a throughput of about hundred array elements per microsecond, and about a thousand per microsecond with the exception handling replaced by conventional branching, this is still about one or two orders of magnitude slower than using a hand-written loop. In our experience, however, performance is rarely an issue. In general, the asymptotic complexity of the traversal is linear in the size of the tree, and the actual time taken by traversing the trees is insignificant compared to the work performed by the right-hand sides of the rules in a real application. %if fullVersion \section{Translation of |rulerfront| to |rulercore|} \label{sect.trans.rulerfront} In Section~\ref{sect.example.ruler2}, we showed by example how a |rulerfront| program can be encoded using only syntax in |rulercore|. For space reasons, we omit the data-type driven translation from a |datasem| into a |sem|, nor the translation of default-rules. Instead, in this section we assume that |rulerfront| consists of those programs that after insertion of invoke-rules and reordering of rules are a valid |rulercore| program. \subsection{Implicit Invocations} In |rulerfront|, invoke-rules may be omitted. From a |rulerfront| program, we derive a number of implicit invocatations. We first determine the needed attributes. From these we determine the maximum needed visit, and thus the sequence of visits needed. An invoke-rule needs to be inserted if there is no invoke-rule for any of these visits yet. We start the insertion-process at the root of the tree, and check at each level downwards which invokes need to be inserted. With this process, we insert the invoke-rules at the lowest point, while still being in scope of all rules that need it. Automatic rule ordering then positions the invokes at their appropriate places. A synthesized attribute |a| of child |c| is needed if there exists a rule which has the attribute reference |c:a| in its right-hand side. The needed attributes may differ per clause and visit, which we define in a similar way as |avail| in Section~\ref{sect.sem.rulercore}: \begin{code} need (visit x (many r) (many c)) = (sub need union) (many r) `union` (sub need intersection) (many c) need (clause x (many r) t) = (sub need union) (many r) `union` need t need (p = e) = need e \end{code} The actual implementation is a bit more complicated. A default-rule may indirectly express a need on an attribute (and corresponding visit). Furthermore, when a programmer provided an explicit invoke, we disable the generation of implicit invokes for that visit. Apparently, the programmer had a reason to explicitly invoke the visit, and we rather warn the programmer when he did not provide this invoke in, for example, another branch where it is also needed. \begin{figure}[htp] \begin{minipage}{0.58 \linewidth} \small{\begin{tabular}{ll} source node & target node \\ \hline begin visit & end prev visit \\ end visit & begin visit, clauses, syn attr def. rules, match rules \\ clause & begin visit \\ any rule & begin visit or clause \\ invoke rule & prev invoke or child, inh attr def. rules \\ any rule w. rhs & begin visit for inh attrs, loc attrs def., invoke of attr \\ \end{tabular}} \end{minipage} \hfill \begin{minipage}{0.38 \linewidth} \includegraphics[scale=0.2]{sum.pdf} \end{minipage} \caption{Dependencies between |rulerfront| entities.} \label{fig.dependencies} \end{figure} \subsection{Rule Ordering} To order the rules, we first create a dependency graph. Per semantics-block, in this graph, we have a begin and end node for each visit in the interface. For each clause and each rule we also have a node. Figure~\ref{fig.dependencies} lists what directed edges there are between these nodes, and gives a rough impresses what the dependency graph for the sum example in Figure~\ref{fig.syntax.example} looks like. The ovals represent attribute definitions, the square boxes represent begin and end of visits, clauses, and invocations. If this dependency graph is cycle-free, the rules can be ordered. The graph represents a partial order. We turns this order into a total order by giving attribute definitions precedence over invoke rules, invoke rules precedence over clauses, and otherwise take the lexical order. This is important, as it keeps the child-order stable, orders rules common to all clauses to be evaluated once (if possible) before selecting clauses, and orders match-rules up front in clauses. A topological sort over the graph gives a rank for each node. We move rules down the clauses-tree to the highest-ranked visit or clause with a rank less than the rule. To move a rule down a visit, we duplicate it to all branches. Sometimes, rules can be scheduled to more than one visit. The exact visit a rule ends up influences the amount of data that has to be transported between visits. The inputs for the rule need to be transported to the visit of the rule, and the result of the rule to the visits where these results are used. With the above algorithm, rules are scheduled to the earliest possible visit. Scheduling to the last possible visit seems more intuitive: we only compute attributes at the moment it is really going to be used. With a small preprocessing step, we can implement the as-late-as-possible scheduling strategy. Starting from the sinks of the graph, we preform a reversed depth-first search, marking each node with the last end-visit-node encountered (without overwriting a mark once assigned). This gives us for each rule the latest possible visit, and we add a dependency of this rule on the begin of that visit to the graph. If the graph is acyclic, it stays acyclic. The implicit invokes and automatic ordering allow a straightforward transformation from a datasem-block to a sem-block. Essentially, a datasem-block is syntactic sugar for a number of clauses, each with a match-rule, and a number of child-rules. We also allow in |rulerfront| omitted visits and clauses to be implicitly defined. Combined with implicit invocations, this makes it easy to add additional visits to an interface. Furthermore, the automatic rule ordering allows us to write independent rules separately from each other (possibly in separate files) and use a preprocessing step to merge the rules together. %% This particular %% feature is supported by UUAG \section{Discussion} \label{sect.eval} The expressiveness of |rulerfront| depends on the host language. However, given a host language that supports only global storage, and the two boolean constants, then we can still express general recursion and branching. In principle, |rulerfront| is turing complete. In practice, we use |rulerfront| as an alternative for attribute grammars, and in particular for traversals over tree-like data structures. To a limited extend it may be applicable to graphs traversals that are technically tree traversals (such as a traversal over a depth-first forest). Loops and iteration can be expressed in some extend via recursion. In related work, we expressed these by iterating visits~\cite{Middelkoop10gpce}). |rulerfront| is not suitable to express traversals over drastically changing data structures. We implemented several extensions to make |rulerfront| more expressive. One extension is the notion of internal visits. A conventional visit is invoked externally by the parent, and can choose a clause. This means that we can only conditionally compute attributes once per visit. In contrast, an internal visit is invoked at the end of the clause, and is not visible externally. An internal visit may again have clauses, and these clauses may again specify an internal visit as next visit, or a conventional visit. With this relatively simple extension, we can arbitrarily often branch inside a visit. Furthermore, we integrated demand-driven attribute evaluation. With demand-driven evaluation, certain circular grammars may still produce values for attributes: those that have no runtime circular dependencies. We either use statically ordered evaluation of a visit, or demand-driven one. A demand-driven visit may not use side effect, nor have multiple clauses. With visits, we can adequately model this combination. In the Haskell-version of |rulerfront|, we require type signatures for attributes. In |Javascript|, instead of type signatures, the notion of a type signature is a dynamic check in the form of assertion-functions that validate the values for attributes. %endif \section{Related Work} \label{sect.relatedwork} Related to this paper are various visitor-like approaches and attribute grammar techniques. The purpose of the Visitor design pattern~\cite{DBLP:conf/ecoop/GammaHJV93} is to decouple traversal operations from the specification of the tree to be traversed, in order to make it easier to add new operations without changing the existing specification of the tree. This allows us to write a multi-visit traversal using a separate visitor per traversal. In Section~\ref{sect.example.vis}, we discussed advantages and disadvantages of modeling traversals with this pattern. In particular, side effect is permitted, and used to store results for use in later visits. The side effect makes it hard to predict if results needed in a next visit are actually stored by a first visit. This is a fundamental problem of visitors. Oliveira, et al.~\cite{DBLP:conf/oopsla/OliveiraWG08}, for example, show many enhancements with respect to the type safety of visitors, but do not address the transfer of results between visits. %% Multi-methods~\cite{DBLP:conf/oopsla/ChambersL94} are supposed to replace %% the visitor pattern. A multi-method allows overloading of methods on %% multiple parameters, and makes |accept|-methods superfluous. %% This, however, is orthogonal to the problems and solutions presented in this %% paper. Attribute grammars~\cite{DBLP:journals/mst/Knuth68,DBLP:conf/waga/Knuth90} were considered to be a promising implementation for compiler construction, but several success stories aside, did not meet these expectations~\cite{DBLP:conf/waga/Waite90}. The bets may be turning again. %if fullVersion We are not worried about every byte of memory consumption anymore. Instead, multi-core processor utilization becomes an issue, and parallel evaluation of AGs was well studied in the past~\cite{629079,DBLP:conf/waga/KuiperS90}, but not yet applied in the presence of multi-core processors. The work in this paper is orthogonal to results in those areas. %endif Recently, many Attribute Grammar systems arose for mainstream languages, such as Silver~\cite{DBLP:journals/entcs/WykBGK08} and JastAdd~\cite{DBLP:conf/oopsla/EkmanH07} for Java, and UUAG~\cite{uuagc} for Haskell. In contrast to the work in this paper, these systems strictly discourage or even forbit the use of side effect. The design of |rulercore| is inspired by the language of execution plans of UUAG. In certain languages it is possible to implement AGs via meta-programming facilities, which obliviates the need of a preprocessor. Viera, et al.~\cite{DBLP:conf/icfp/VieraSS09} show how to implement AGs into Haskell through type level programming. The ideas presented in this paper are orthogonal to such approaches, although the necessary dependency analysis may be difficult to express depending on the expressiveness of the meta language. Several attribute grammar techniques are important to our work. Kastens~\cite{DBLP:journals/acta/Kastens80} introduces ordered attribute grammars. In OAGs, the evaluation order of attribute computations as well as attribute lifetime can be determined statically, allowing severe optimizations. %if fullVersion Boyland~\cite{DBLP:journals/toplas/Boyland96} introduces conditional attribute grammars. In such a grammar, semantic rules may be guarded. A rule may be evaluated if its guard holds. Evaluation of guards may influence the evaluation order, which makes the evaluation less predictable. In comparison, in our clauses-in-visits model, we have to explicitly indicate in what visits guards are evaluated (the match-statements of a clause), which makes evaluation clear. Our approach has the additional benefit that children may be conditionally introduced and visited. The work for this paper is carried out for a research project to support the implementation of the type inference component of the Utrecht Haskell compiler. In the workshop version of this paper~\cite{Middelkoop10vis}, we presented an earlier version of |rulerfront|'s clauses-per-visit model to allow attribute grammars to implement functions that perform case distinction on more than a single AST. In a later paper~\cite{Middelkoop10gpce}, we improved on this model to allow iteration of visits, and dynamic growing of trees, to model fixpoint construction of proof trees. That work was carried out using Haskell as target language. In this paper, we made an explicit connection between AGs and |rulerfront|, and use this connection to express side effect in AGs. %endif \section{Conclusion} \label{sect.conclusion} We introduced the language |rulerfront|, an extension of Attribute Grammars that makes visits to nonterminals explicit. As a consequence, it is possible to use side effects in rules. It combines the freedom of visitors as described by the Visitor Design Pattern with the convenience of programming with attributes, as shown in Section~\ref{sect.example}. Moreover, we presented |rulercore|, a subset of |rulerfront|, which serves as a small core language for visitor-based Attribute Grammars. In |rulercore|, the lifetime of attributes is explicit, as well as the evaluation order of rules and visits to children. A |rulercore| program has a straightforward translation to many languages. In Section~\ref{sect.trans.rulercore}, we showed a translation to |Javascript| . %if fullVersion Furthermore, we described how |rulerfront| programs are mapped to |rulercore| in Section~\ref{sect.trans.rulerfront}. %endif There are many directions for future work. The parallel evaluation of Attribute Grammars received a lot of interest in the past, but during a time that multi-core processors were not commonly available. The small |rulercore| language is suitable for experimentation with different evaluation strategies. Another direction of research is to allow destructive updates on attributed trees. For example, to support event-handling traversals over data structures that are dynamically changed based on user input or external events. In |rulerfront|, the visits performed on an attributed tree explicitly specify which attributes are defined. When we apply a destructive update to the tree, we thus know precisely what information is based upon the previous structure of the tree, which is beneficial when reasoning about mutations to the tree. Incremental evaluation of Attribute Grammars received attention in the past, and may be used to efficiently recompute attributes after an AST change. %% AVL trees. More fundamentally, the idea of this paper is to deal with the scheduling of rules in the presence of side effect. This is not possible with conventional attribute grammars, because the effects are not visible in attribute dependencies. In the Haskell version of |rulerfront|, the left-hand side of a rule can be a match against a data constructor. If this data constructor is a GADT, the match brings type assumptions in scope, to be used to coerce types in rules that follow. Similarly to side effect, these type assumptions are implicit. However, with |rulerfront|, we can explicitly schedule rules to be after such a match. This allows us to combine GADT features with Attribute Grammars. This may be sufficient to target dependently-typed programming languages, and a direction towards verified compilers using AGs. \ack This work was supported by Microsoft Research through its European PhD Scholarship Programme. \bibliographystyle{entcs} \bibliography{references} \end{document}