\documentclass[preprint,authoryear]{sigplanconf} %let appendices = False %let acknowledgements = False %include lhs2TeX.fmt %include polycode.fmt \usepackage{amsmath} \usepackage{natbib} \usepackage{listings} \usepackage{float} \usepackage{pgf} \usepackage{tikz} \usepackage{color} \usepackage{xcolor} \usepackage[english]{babel} \usepackage{mathtools} \usepackage{stmaryrd} \floatstyle{ruled} \restylefloat{figure} \setlength{\fboxsep}{1.4pt} \lstset{ language=Java, basicstyle=\small\sf, showstringspaces=false, numbers=none, morekeywords={var,sealed,dynamic,function}, tabsize=2 } \usepackage[ breaklinks, colorlinks , pdftitle={Functional Instrumentation of ActionScript Programs} , pdfsubject={Instrumentation} , pdfkeywords={instrumentation,actionscript,flash,EDSL} , pdfcreator={LaTeX with Lhs2TeX} , pdfproducer={pdflatex} , pdfauthor={Arie Middelkoop} , linkcolor=black , citecolor=black , filecolor=black , urlcolor=black ]{hyperref} \usetikzlibrary{shapes,arrows,fit,decorations,positioning} %include instr-icfp.fmt \renewcommand{\hscodestyle}{\normalsize} \newenvironment{smallcode}[0]{\renewcommand{\hscodestyle}{\small}}{\renewcommand{\hscodestyle}{\normalsize}} \begin{document} \conferenceinfo{ICFP '11}{September 19-21, Tokyo.} \copyrightyear{2011} \copyrightdata{[to be supplied]} \titlebanner{Submitted to ICFP 2011} % These are ignored unless \preprintfooter{A Functional Language for Action Script Instrumentation} % 'preprint' option specified. \title{Experience Report: Functional Instrumentation of ActionScript Programs} \authorinfo{Arie Middelkoop \and Alexander B. Elyasov \and Jurriaan Hage \and Wishnu Prasetya} {Universiteit Utrecht} {\{ariem,elyasov,jur,wishnu\}@@cs.uu.nl} \maketitle \begin{abstract} In log-based testing, the system under test contains code to log aspects of the system's execution. Such code can be injected automatically with program transformations. Still, manual intervention is needed to identify execution steps of interest, as well as projections of the system's state. With aspect-oriented programming (AOP), we can describe the semi-automatic transformation of the system. However, AOP languages typically lack abstraction mechanisms. In this paper, we show that well-known techniques from functional programming facilitate the design of an expressive AOP EDSL for ActionScript, called Asil. \end{abstract} \category{D.2.5}{Testing and Debugging}{Tracing} \terms measurement \keywords instrumentation, execution traces \section{Introduction} \label{sect.introduction} The \emph{future internet}~\citep{DBLP:journals/fi/Hudson-Smith09} challenges traditional testing approaches, as future internet applications are a dynamic composition of ever-changing third party components and services. In addition to conventional unit and regression tests during development, continuous testing is required even after deployment, which can be accomplished through log-based testing. In log-based testing, the execution of the system under test (SUT) is logged and later analyzed to discover anomalies that require investigation by software testers. In this paper, we use functional programming techniques for the design and implementation of a tool suite that facilitates the injection of logging code into client-side web applications. Our use case is Habbo Hotel~\citep{habbo}, which is a large Flash application deployed on the Internet. Therefore, we restrict ourselves to programs written in ActionScript 3, the programming language of Flash. From the execution of this Flash program, we wish to obtain a log, which is an execution trace that records a projection of the program's state for each execution step. To manually add logging to the source code of the SUT is undesirable, as it is a crosscutting concern that easily clutters up the original code. Moreover, modifications to the run-time environment are generally not possible due to security restrictions on the clients that run the application. %% Moreover, it is quite likely that at different times, we want to log different %% information from different locations for the same program. Fortunately, other techniques exist that help us deal with this issue more effectively. In particular, logging is a prime example of a functionality that can be implemented non-intrusively through aspect-oriented programming (AOP)~\citep{DBLP:conf/ecoop/KiczalesLMMLLI97}. This allows us to describe the instrumentation separately from the actual source code. In order to change the actual logging, we only need to change the instrumentation, and not the code to which instrumentation was applied. In AOP, a \emph{join point} is a location in the program's code to which we may transfer control. A join point has a static and dynamic context, which is defined as the reachable program state at that point. The \emph{advice} is code that can inspect and alter this state when executed at a join point. Typical AOP languages, such as AspectJ~\citep{Kiczales:2001:OA:646158.680006}, offer facilities for specifying \emph{point cut}s: a particular set of join points, where a given piece of advice should be executed. An \emph{aspect weaver} integrates the advice into the actual program at such join points via static and/or dynamic program transformation. Unfortunately, typical AOP languages offer limited means of abstraction in the join point model. For example, in AspectJ, point cuts can only be parameterized over \emph{symbolic values}. A symbolic value is a statically unknown value that corresponds to a \emph{run-time value} in the state of the SUT during its execution. Due to our strongly functional background, where the use of higher-order functions is second nature, we would like point cuts to be first class in the AOP specification. From a functional perspective, an AOP description can be viewed as a partial function that provides advice for some of the join points. In the case of logging, advice is a state transformer that additionally yields a sequence of events. Such an event typically contains some projection of the program's state, and an execution trace is then the sequence of such events as they arose during the SUT's execution. \begin{code} Spec = Instrumentation Instrumentation = JoinPoint -> Maybe Advice Advice = World -> (World `times` [Event]) \end{code} The advice functions are typically written in the programming language of the SUT. The instrumentation functions are point-cut descriptions in the AOP language, which is typically some pattern language. For coverage analyses (which exploit the information contained in logs) we typically want to specify a more fine-grained instrumentation depending on, e.g., the nesting depth of branches, or annotations provided by the user. To log the behavior of external services, we often want to specify the instrumentation of the invocation of the service as a function of (part of) the specification of the external service. Therefore, in this paper, we deviate from standard usage by parameterizing AOP specifications over values |alpha|. These may include symbolic values and even AOP specifications themselves. \begin{code} Spec alpha = alpha -> Instrumentation Coverage = Spec ControlFlowGraph External = Spec ServiceContract \end{code} Consequently, such abstractions allow us to write more modular specifications. From the point of view of the programmer this leads to a specification of the following type: \begin{code} Spec alpha = alpha `times` JoinPoint `times` World -> Maybe (World `times` [Event]) \end{code} In other words, in an instrumentation specification a programmer may employ values of an arbitrary type Haskell |alpha|, a point cut, and information that will be only be available at run-time. For an example of the former, consider a control flow graph that we have previously computed for the SUT, and that we use to \emph{generate} specifications of point cuts, and of advice to be weaved in. In other words, we can generate the AOP specification based on values and (higher-order) functions living in the Haskell world. Although the specification by the programmer may refer to information that will be available at run-time, this information is not yet available during the early stages that transform Asil code (a DSL deeply embedded in Haskell) into AsilCore (our core instrumentation language), or that weave in the advice. From this point of view, the specifications are actually a bit more restrictive than the type suggests. The following type is a closer match. \begin{code} Spec alpha = alpha -> JoinPoint -> Maybe (World -> (World `times` [Event])) \end{code} This type makes explicit that run-time information, contained in values of type |World|, cannot be used during the first two stages of translation. Indeed, run-time information may only be used in ways that can be correctly translated into AsilCore or ActionScript. For example, in the specification given by the programmer, a value that will only be known at run-time may not be scrutinized in a case statement, because AsilCore does not have the concept of a case statement, and therefore it must be translated away. Of course, this restriction does not hold the other way around: instead of weaving in advice at join points in the point cut, one may decide to weave in advice at every join point that dynamically decides whether the joint point is an element of the point cut. For reasons of efficiency, however, this is not a very wise thing to do. We point out that the ``arrows'' in the above type correspond exactly to code transformations: the first arrow transforms Asil into AsilCore, the second arrow is the process of weaving the advice into the ActionScript source (the tool \emph{Asic}), and the final arrow (inside the |Maybe|) refers to the run-time execution of the advice. % As noted before, each function arrow in the above type corresponds to a % distinct language. % For simplicity, advice is a simple method call to conventional ActionScript code, where we % assume the existence of a support library that contains methods to, e.g., serialize % events. \begin{figure}[htb] \centering \begin{tikzpicture} [ item/.style={rectangle, draw=black, fill=none, thick, font=\normalsize, node distance=15mm} , inv/.style={draw=none,fill=none,font=\scriptsize} , annot/.style={inv, node distance=4mm} ] \node[item](sut){SUT}; \node[item, right of=sut, xshift=15mm](sut2){SUT'}; \node[item, right of=sut2, xshift=15mm](trace){Trace}; \node[item, below of=sut, yshift=6mm](sym){Refl}; \node[item, below of=sut2,yshift=6mm](core){Core}; \node[item, below of=core, xshift=-15mm, yshift=7mm](spec){Spec}; \node[item, right of=core, xshift=5mm](support){Support}; \node[annot, above of=sut]{@.swf@}; \node[annot, above of=sut2]{@.swf@}; \node[annot, above of=trace]{@.log@}; \node[annot, below of=sym]{@.hs@}; \node[annot, below of=spec]{@.hs@}; \node[annot, below of=support]{@.as@}; \draw[->] (sut.east) to node[midway,above,inv]{weave} (sut2.west); \draw[->] (sut.south) to node[midway,right,inv]{generate} (sym.north); \draw[->] (sym.south) to node[near end,above,inv]{import} (spec.west); \draw[->] (core.north) to node[midway,left,inv]{weave} (sut2.south); \draw[->] (support.north) to node[near start,above,inv]{import} (sut2.south); \draw[->] (sut2.east) to node[midway,above,inv]{exec} (trace.west); \draw[->] (spec.east) to node[near start,above,inv]{compile} (core.south); \end{tikzpicture} \caption{Overview of the specification artifacts.} \label{fig.overview} \end{figure} Figure~\ref{fig.overview} provides a somewhat more detailed view of the artifacts involved in the logging specification. From the SUT, we generate some reflection information Refl (symbol information, control flow graphs), which is imported by our logging specification Spec. From Spec, we generate AsilCore, which is weaved into the SUT together with the support library that provides functions for, e.g., serializing events. Finally, execution of the instrumented SUT generates the trace. Zooming in on the languages we employ, our instrumentation language is a combinator language called \emph{AsilCore}, that provides two primitive operations: pattern matching (on join points) and the invocation of advice. Such a language is not very easy to program in, which is why we have chosen to extend it to a deeply embedded Haskell DSL, called \emph{Asil}. Asil uses the well-known concepts of \emph{monadic}~\citep{DBLP:conf/afp/Wadler95} and \emph{alternative}~\citep{Mcbride:2008:APE:1348940.1348941} interfaces to compose instrumentations. The implementation of Asil is ongoing work within the Fittest project\footnote{\url{http://www.pros.upv.es/fittest/}}. The library @asil@\footnote{\url{http://hackage.haskell.org/package/asil}} contains Asil as a library, including parsers, pretty printers, and other tools for the transformation of ActionScript byte code. The tool @asic@\footnote{\url{http://hackage.haskell.org/package/asic}} generates reflection information and performs aspect weaving. The remainder of this paper is organised as follows. After introducing the running example in Section~\ref{sect.example}, we consider the Asil language in more detail in Section~\ref{sect.asil}. In Section~\ref{sect.asilcore} we show AsilCore and some aspects of its implementation; the aspect weaver asic is shortly discussed in Section~\ref{sect.asic}. Finally, in Section~\ref{sect.reflection} we reflect on the advantages and disadvantages offered by functional programming with respect to the implementation of Asil and Asic, and Section~\ref{sect.conclusion} concludes. \section{Example} \label{sect.example} In log-based testing, we log aspects of the program's execution, so that we can inspect the log to discover atypical execution patterns that may be worth an investigation by a software tester. This logging may take place in a special test environment, or after deployment, when the program is in active use. %% In the latter case, privacy is an important additional concern. In this paper, we consider only the instrumentation of Flash programs: the representation of the logs and the analysis of the logs are out of the scope of this paper. To set the scene, we describe a snippet of chess-like internet game as a simple running example (see Figure~\ref{fig:snippet}). The game proceeds in turns. In each turn, the user clicks on a square to select a pawn, then clicks on an unoccupied square to move the pawn there. Note that logging code has been explicitly inserted, logging each call to the @MyGame.clicked@ method (with position information) and also the actual moves that are performed. The calls to @Log.clicked@ and @Log.move@ take care of generating events that are logged and thereby become part of the trace. \begin{figure}[ht] \begin{small} \begin{verbatim} package code { public class MyGame extends Sprite { private var selSquare : Square; public function MyGame() : void { addEventListener("click", clicked); } function clicked(event:MouseEvent) : void { var x : int = event.localX; var y : int = event.localY; Log.clicked(x,y); var target : Square = getSquare(x,y); var taken : Boolean = occupied(target); if (!this.selSquare && taken) { this.selSquare = target; } else if (this.selSquare && !taken) { Log.move(this.selSquare, target); this.move(this.selSquare, target); this.selSquare = null; } } } } \end{verbatim} \end{small} \caption{ActionScript snippet with logging code.} \label{fig:snippet} \end{figure} An \emph{execution trace} is a sequence of events |e1 ... (sub e n)|, where an event |e = (t,i)| is a tuple consisting of a discrete timestamp |t| and information |i| about the state of the program at |t|. In case of the example, this information consists either of a |clicked| record or a |move| record. \begin{code} i ::= clicked(int,int) -- clicked record | move(code.Square,code.Square) -- move record \end{code} The utility methods @Log.clicked@ and @Log.move@ take care of raising the events and serializing a projection of the object-graph of their parameters to a log. The actual implementation of these utility methods is out of the scope of this paper. The calls to the logging methods obscure the source code. This gets worse when we make the logging code conditional for reasons of, e.g., privacy or performance: \begin{verbatim} if (Level.privacy <= 2 && Level.verbose >= 1) { Log.move(this.selSquare, target) } \end{verbatim} Logging and privacy are typical examples of crosscutting concerns, so that we can apply AOP techniques to specify these concerns separately from the code. In the next section we discuss the instrumentation language Asil, a functional AOP language. \section{The Asil Instrumentation Language} \label{sect.asil} The language |Asil| is a combinator language for writing down specifications |Spec alpha| as described in Section~\ref{sect.introduction}. It is a functional, strongly-typed, monadic Haskell EDSL for expressing \emph{advice} that is conditionally applied at \emph{join points}. Join points are predefined locations in the code where control can be transferred to. Figure~\ref{fig.joinpoints} lists the join points in our \emph{join point model}~\citep{Cazzola06semanticjoin}. Our main join points are the conventional method entry and exit from the caller and callee side as in AspectJ~\citep{Kiczales:2001:OA:646158.680006}, as well as sequential blocks of instructions~\citep{Juarez-Martinez:2008:JPM:1562423.1562447}. Operationally, an Asil program is a value of the type |I ()| (explained below) that is applied to each encountered join point during execution. \begin{figure}[htb] \centering \begin{tabular}{l||l} Join point & Context captured \\ \hline Method entry & name, parameters, and param types \\ Method exit & name, return value, and return type \\ Method abort & nane, and exception \\ \hline Method call & name, parameters, and param types \\ Method returned & name, return value, and return type \\ Method failed & name, and exception \\ \hline Block entry & id, cycle root, and preceding join points \\ Block exit & id, cycle root, and preceding join points \\ \hline Coercion call & input, input type, and intended type \\ Coercion returned & input, output, input type, and output type \\ Coercion failed & input, exception, and intended type \\ \end{tabular} \caption{Join points and their context.} \label{fig.joinpoints} \end{figure} We use combinators to construct monadic \emph{instrumentations} of type |I a| and (symbolic) \emph{expressions} of type |E a|. A typed product |a| of expressions is represented by the type |many E a|. Figure~\ref{fig.operations} lists the Asil combinators that are used in the example. Instrumentations have a notion of success: a successfully applied instrumentation of type |I a| returns a value of type |a|, otherwise the instrumentation failed to apply. The operations |matchEnter|, |matchCall|, and |matchEnter'| express matches against join points. A successful match returns join-point information. For example, after |matchEnter| we can access the values of parameters to the call (symbolically). The operation |call| invokes an ActionScript method, and returns the (symbolic) result value that is returned by that method. Method calls succeed by default, unless the method uses a special API routine to indicate an abort. It is important to realize that the values that will only be available at run-time will only be represented symbolically: we can manipulate them only to a limited extent. Such values are the atomic elements of our \emph{expressions} |E a| that may also involve built-in operations on these values, such asinteger arithmetic, comparisons, boolean arithmetic, field dereference, and array indexing. Essentially, a value of type |E a| represents a (symbolic) ActionScript value of type |a|. We use Haskell to build such expressions. Not all Haskell expressions (lambdas in particular) can masquerade as |E|-expressions, nor can |E|-expressions be scrutinized with Haskell's case expressions. The reason is that AsilCore does not support these types and expressions, and we chose not to provide translations for them at this time. We may do so in the future, but in this first attempt we prefer to introduce only those elements that are essential and to decide later what else we might need. Only a few Haskell \emph{values}, such as integers and strings, are also |E|-expressions. %% We purposefully impose this limitation (Section~\ref{sect.obstacles}). \begin{figure}[htb] \begin{smallcode} \begin{code} -- instrumentations (I) matchEnter :: Prop c (Method a b) -> I (Enter a) matchCall :: Prop c (Method a b) -> I (Enter a) matchEnter' :: I (Enter Any) call :: E (Method a b) -> (many (E)) a -> I (E b) guard :: E Bool -> I () onPrevious :: I a -> I a param :: Enter Any -> Int -> I (E Any) -- monadic combinators (>>=) :: I a -> (a -> I b) -> I b return :: a -> I a fail :: String -> I a -- alternative combinators (<#>) :: I a -> I a -> I a (<<#>) :: I a -> I a -> I a (<##>) :: I a -> I a -> I a -- expression combinators (E) (#) :: E c -> Prop c t -> E t embed :: I a -> E a null :: E (Object t) static :: E Static param1 :: Enter (a, r) -> E a param2 :: Enter (a, (b, r)) -> E b -- match context data Enter a = ME { name :: E String, params :: (many (E)) a } \end{code} \end{smallcode} \caption{Excerpt from the Asil combinators.} \label{fig.operations} \end{figure} The primitive operations are composed with \emph{monadic} and \emph{alternative} combinators, shown at the bottom of Figure~\ref{fig.operations}. The monadic bind |m >>= f| composes instrumentations sequentially, and parameterizes the continuation |f| with the values returned by |m|, if |m| succeeds. The monadic return |return x| is an always succeeding instrumentation that is effectively a no-op, but returns |x| as value. We use do-notation to conveniently write monadic sequences. In the following example, the operation |matchEnter| matches against the method entry (join) point of the @clicked@ method of @MyGame@, and then calls the @clicked@ method @Log@ with information that is extracted from the event-parameter (using |#|). %% using the |#| property dereference %% operator. \begin{code} instrLogClick = do m <- matchEnter kk~code~MyGame~clicked let evt = param1 m eX = evt # kk~flash~events~MouseEvent~localX eY = evt # kk~flash~events~MouseEvent~localY call (static # kk~code~Log~clicked) (eX, eY) return () \end{code} If the match succeeds, then the variable |m| contains information about the joint point. Different types of join points typically provide different information. In this case, we can use |param1| to obtain a typed reference |evt| to the first parameter of the method from |m|. From |evt| we can obtain the coordinates of the mouse click and pass these to the |Log~clicked| method. For static properties, we use the special name |static| as object reference. On the other hand, |matchEnter'| can match against method-entry join points that do not have a statically fixed name nor a statically fixed signature. For example, with the following code, we can match on any method-entry join point with a name that contains |"clicked"| as substring, and has a first parameter of the @MouseEvent@ type. \begin{code} do m <- matchEnter' guard (count (params m) == 1) guard (substr "clicked" (name m)) p1 <- param m 1 evt <- p1 `cast` tt~flash~events~MouseEvent ... \end{code} In this example, |params| obtains a reference to the parameters of the method from |m|. The |guard| instrumentation succeeds if and only if its Boolean parameter is |True|. Remember that these operators work on symbolic values |E t| instead of conventional Haskell values of type |t|. % Also, anonymous functions do not have a name, and are represented by a @null@ % reference. Similarly, we can instrument the call to @move@ in the method @clicked@. The operation |onPrevious| takes an instrumentation |p| and applies it to the joint points that (directly) precede in the control flow graph, the matched join point (ignoring back edges). The operation succeeds if |p| is successfully applied to such a preceding join point (at runtime). With this operation, we declaratively specify contextual pattern matches. \begin{code} instrLogMove = do onPrevious $ matchEnter kk~code~MyGame~clicked m <- matchCall kk~code~MyGame~move let from = param1 m to = param2 m call (nmclass # kk~code~Log~move) (from, to) return () \end{code} In the context of having matched against @clicked@, we match against a call to @MyGame.move@, and insert a call to @Log.move@. The instrumentations |instrLogMove| and |instrLogClick| are defined separately. Instrumentations can be combined with various \emph{alternative} operators. Given some continuation |f|, the meaning of |(p <##> q) >>= f| is that branch |p >>= f| and branch |q >>= f| are applied to the join point (in that order). The meaning of |(p <#> q) >>= f| is that |p| and |q| (in that order) are both applied, and the continuation |f| is parametrized with the result of the last succeeding one of the two. The strict alternative |p <<#> q| applies |q| only if |p| fails. The ability to fail raises the question how we deal with the fact that our support library contains conventional ActionScript methods that may exhibit side effects. At this time, we do not attempt to revert these effects. %% Note that this is possible when we would use software transactions. %% With parallel composition, we can combine a list of individual instrumentations %% into a single instrumentation. With conventional higher-order functions and parallel composition we combine the individual instrumentation. \begin{code} myInstr :: I () myInstr = foldr (<##>) (fail "initial") [ instrLogClick, instrLogMove ] \end{code} The |fail| operation is a unit of parallel composition. %% The |fail| operation explicitly aborts the application of an instrumentation. %% It is a unit to parallel composition. Having specified the complete instrumentation |myInstr| we can then interpret it for each join point in the Flash program according to the semantics that we specified informally in this section. However, we actually implemented an instrumentation monad such that its execution yields an AsilCore specification (Section~\ref{sect.asilcore}) as an intermediate step. How and why we do this is the subject of the next section. %% Todo: insert reason why we would want to take this partial evaluation step \section{The AsilCore Language} \label{sect.asilcore} %{ %% Format instructions restricted to this section only %% Might otherwise conflict with variable names in the Haskell blocks %format match = "\mathbf{match}" %format call = "\mathbf{call}" %format on = "\mathbf{on}" %format previous = "\mathbf{previous}" %format abort = "\mathbf{abort}" %format entry = "\mathbf{entry}" %format exit = "\mathbf{exit}" %format call = "\mathbf{call}" %format prim = "\mathbf{prim}" %format static = "\mathbf{static}" %format prop = "\mathbf{prop}" %format name = "\mathbf{name}" %format inputs = "\mathbf{inputs}" %format res = "\mathbf{res}" %format args = "\mathbf{args}" %format dynamic = "\mathbf{dynamic}" %format nop = "\mathbf{nop}" %format block = "\mathbf{block}" %format guid = "\mathbf{guid}" %format coerce = "\mathbf{coerce}" The monadic bind is an obstacle in the translation of Asil to ActionScript. The right-hand side of a bind is a function, thus it would require a nontrivial translation of Haskell functions to ActionScript. Therefore, we partially evaluate the monadic code to AsilCore, which is sequential code. This step eliminates both functions and monads. %% For more details... next section? \begin{figure}[htb] \begin{smallcode} \begin{code} i ::= match m -- match against join point |m| | call o (many v) (x :: ty) -- call with args |many v|, binds result |x| | on previous i -- lift to preceeding join points | abort -- aborts instrumentation | nop -- no-op | coerce v1 (x2 :: ty) -- coerce type of |v1| to |ty| | i1 ; i2 -- sequential composition | i1 <#> i2 -- parallel composition | i1 <<#> i2 -- alternative composition | static i -- error if |i| not statically resolved | dynamic i -- defers eval of prims in |i| m ::= entry (x1 :: String) (many (x2 :: ty)) -- name |x1|, args |many x2| | exit (x1 :: String) (x2 :: ty) -- name |x1|, result |x2| | call (x1 :: String) (many (x2 :: ty)) -- name |x1|, args |many x2| | ... -- etc. o ::= prim x -- primitive with the name x | static s -- static property named s | dynamic v1 v2 -- call closure |v1| with |v2| as @this@ v ::= x | c -- identifiers and constants \end{code} \end{smallcode} \caption{AsilCore abstract syntax.} \label{fig.asilcore} \end{figure} The AsilCore language (Figure~\ref{fig.asilcore}) is an intermediate language that can be mapped strightforwardly to ActionScript advice (to be precise, we map directly to AVM2 byte code). An AsilCore term has a statically fixed control flow graph, which simplifies the injection into compiled ActionScript, since it can be represented with conventional branching instructions of ActionScript. \begin{figure}[htb] \begin{smallcode} \begin{code} type I a = Uid -> -- unique id (Uid -> a -> Core -> (Core, Uid)) -> -- continuation (Core, Uid)) -- res. core run f = fst $ f 1 (\n _ i -> (i, n)) -- |I a -> Core| instance Functor I where fmap f g = \n k -> g n (\m v -> k m (f v)) instance Monad I where return x = \n k -> k n x nop m1 >>= f = \n1 k -> m1 n1 $ \n2 v1 i1 -> (f v1) n2 $ \n3 v2 i2 -> k n3 v2 (i1 ; i2) fail s = \n k -> k n (error s) abort instance Alternative I where empty = \n1 k -> k n1 undefined abort m1 <#> m2 = \n1 k -> m1 n1 $ \n2 v1 i1 -> m2 n2 $ \n3 v2 i2 -> k n3 v2 (i1 <#> i2) m1 <##> m2 = \n1 k -> let (i1,n2) = m1 n1 k (i2,n3) = m2 n2 k in (i1 <#> i2, n3) embed i = \n k -> k n () i -- |Core -> I ()| fresh = \n k -> k (n+1) (ESym n) nop -- |I (E a)| matchBlockEntry = do -- |I Block| vId <- fresh embed (match block guid (toCoreIdent vId)) return $ Block { blockGuid = vId } \end{code} \end{smallcode} \caption{Asil to Core execution.} \label{fig.execution} \end{figure} In the translation to AsilCore (Figure~\ref{fig.execution}), we implement the instrumentation monad in continuation passing style~\cite{Sussman75scheme} to deal with the |<##>| operator. The monadic binds are replaced with static sequencing. Even if Asil programs cannot use Haskell code to scrutinize E-values, we can parametrize a function |f| in the right-hand side of a bind with the (typically) symbolic value |v1| computed in the left-hand side, and continue partial evaluation. As a result, such a value |v1| will be restricted in how it can be manipulated at run-time. We desugar Asil programs a bit further to simplify the AsilCore representation. We express all branching with the local combinators |<#>| and |<<#>|. Values in AsilCore are either identifiers that symbolically represent ActionScript objects, or constants that represent concrete ActionScript objects. Matches introduce identifiers for the contextual values of the match, and calls introduce identifiers for the result value. Calls take values as argument, which are either constants or identifiers. An identifier may only be introduced once, and must be introduced before being used. This property holds for AsilCore programs that are derived from well-typed Asil programs. The following fragment shows the result of the translation of the running (Asil) example to AsilCore. The abstractions offered by Asil are seen to be expanded away. \begin{code} { match entry name x1 :: String args x2 :: flash.events.MouseEvent ; call prim equals args x1, "code.MyGame:clicked" res x3 :: Boolean ; call prim guard args x3 res x4 :: void ; call prim deref args x2, "flash.events.MouseEvent:localX" res x5 :: int ; call prim deref args x2, "flash.events.MouseEvent:localY" res x6 :: int ; call static "code.Log:clicked" args x5, x6 res x7 :: void } <#> -- composes the two instrumentations in parallel { on previous { match entry name x8 :: String inputs x9 :: flash.events.MouseEvent ; call prim equals args x8, "code.MyGame:clicked" res x10 :: Boolean ; call prim guard args x10 res x11 :: void } ; match call name x12 :: String inputs x13 :: code.Square, x14 :: code.Square ; call prim equals args x12, "code.MyGame:move" res x15 :: Boolean ; call prim guard args x15 res x16 :: void ; call static "code.Log:move" args x13, x14 res x17 :: void } <#> abort -- may be optimized away \end{code} The primitive |deref| takes an object and a fully qualified name as string, and returns the associated property if it exists, otherwise it fails. %% \footnote{ %% Private and protected fields of such an object may not be accessible. %% As part of the code transformation, we inject accessor functions that can %% do so. However, we refrain from transforming the standard library. %% }. To call a property of an object, |deref| can be used to obtain a closure of type |Function|, followed by a |call dynamic|. The primitive |guard| succeeds if and only if its argument is true, and the |equals| primitive has the same semantics as the ActionScript @==@. The declarative |on previous| instruction lifts its block to all preceding join points in the static control flow graph of the method. Operationally, it also introduces an additional local boolean that keeps track whether the instrumentation was applied. If it was applied, we can access its values via the local variables that the lifted instrumentation introduced. %{ %format acall = "\Varid{call}" \begin{figure}[htb] \begin{smallcode} \begin{code} <# abort #> ~> thisJoinPoint.aborted = true <# nop #> ~> ; <# call o (many v) (x :: ty) ~> <# x #> = (ty) (<# o #> (many (<# v #>))); <# match exit (x1 :: String) (x2 :: ty2) #> ~> thisJoinPoint.aborted |= thisJoinPoint.isMethodExit; if (not thisJoinPoint.aborted) { <# x1 #> = thisJoinPoint.nm; <# x2 #> = thisJoinPoint.ret } <# i1 ; i2 #> ^^^ ~> <# i1 #>; if (not thisJoinPoint.aborted) { <# i2 #> } <# i1 <#> i2 #> ^^^ ~> <# i1 #>; thisJoinPoint.aborted = false; <# i2 #> <# i1 <<#> i2 #> ^^^ ~> <# i1 #>; if (thisJoinPoint.aborted) { thisJoinPoint.aborted = false; <# i2 #> } <# prim x #> as ^^^ ~> Primitives.<# x #> (as) <# static C:x #> as ^^^ ~> <# C #> . <# x #> (as) <# dynamic v1 v2 #> as ^^^ ~> v1.acall(v2, as) \end{code} \end{smallcode} \caption{AsilCore to ActionScript translation.} \label{fig.translation} \end{figure} %} The obtained AsilCore instrumentation can be mapped to ActionScript code as defined in Figure~\ref{fig.translation}, and can then be weaved in at join points in the program. This requires an ActionScript method for each primitive operation, and we need to construct a reflexive @thisJoinPoint@ value~\citep{Hilsdale:2004:AWA:976270.976276} at each join point. For a typical join point, only a small part of the instrumentation is applicable (or even none at all). Therefore, our instrumentation weaver Asic (Section~\ref{sect.asic}) partially evaluates the AsilCore term, and only weaves in the residual term. %% We considered a variation %% that applies the block to all join points in order to encode AspectJ's %% dynamic |cflow| pattern. However, we can already achieve such inter-procedural %% effects by maintaining stacks with information in the methods of the support %% library. %} \section{The Instrumentation Weaver Asic} \label{sect.asic} %if False The Asic tool takes an AsilCore term and applies it to all join points of the SUT. This involves yet another partial evaluation step. The following pipeline summarizes the instrumentation process. Each box represents a partial evaluation phase, and corresponds to the function arrows as mention in Section~\ref{sect.introduction}. \begin{center} \begin{tikzpicture} [ tool/.style={rectangle, draw=black, fill=none, very thick, font=\normalsize, node distance=18mm} , inv/.style={draw=none,fill=none,font=\scriptsize, node distance=17mm} ] \node[inv](before){}; \node[tool, right of=before](hs){|hs|}; \node[tool, right of=hs](asic){|asic|}; \node[tool, right of=asic](flash){|flash|}; \node[inv, right of=flash](after){}; \node[inv, below of=asic, node distance=10mm](under2){}; \node[inv, below of=hs, node distance=10mm](under1){}; \draw[->] (before.east) to node[midway,above,inv]{|asil|} (hs.west); \draw[->] (hs.east) to node[midway,above,inv]{|core|} (asic.west); \draw[->] (asic.east) to node[midway,above,inv]{|SUT'|} (flash.west); \draw[->] (flash.east) to node[midway,above,inv]{|trace|} (after.west); \draw[->] (under1.north) to node[near start,inv,xshift=3mm]{|sym|} (hs.south); \draw[->] (under2.east) to node[near start,inv,xshift=5.5mm]{|support|} (asic.south); \draw[->] (under2.west) to node[near start,inv,xshift=-3mm]{|SUT|} (asic.south); \end{tikzpicture} \end{center} We partially evaluate an Asil expression via conventional Haskell execution using GHC to a AsilCore term, the AsilCore code is partially evaluated by AsilCore, and final evaluation is performed by the Flash environment. The JIT compiler of Flash takes care of optimizing the additional bytecode that is introduced by Asic. %endif The Asic tool takes an AsilCore term and applies it to all join points of the SUT. This involves yet another partial evaluation step. The success of a match is always resolved statically, and results in static bindings for identifiers related to the name of a method, unique id of a block, and possibly also the types and number of arguments. Calls to primitive methods may be performed statically, depending on what is known statically about their parameters. Reachable calls to non-primitive methods always result in a residual call in order to cater for possible side effect. AsilCore terms can be large when the instrumentation is \emph{generated} from, e.g., the control-flow graph. For example, there may be a dedicated AsilCore branch for each method in the SUT. Asic applies several optimizations to the AsilCore term before and after the evaluation process. For example, for each AsilCore branch, we collect the static constraints on the join points in a trie structure, such that we can quickly obtain the branches that potentially match. % This is a typical example of unification. %%%% well... it's similar to typical matching problems %% The declarative |dynamic| instruction can be used to force evaluation to %% the actual execution time of the PUI. This is needed in combination with the %% |typeof| method that can always be done statically (because |core| values are %% typed), but yield more specific information during the PUI's execution. %% Non-primitive calls are never performed statically. Aspect-weaving is an instance of syntax-directed translation. Attribute Grammars (AGs)~\citep{uuagc} provide a declarative language to write composable implementations of such translations. Hence, we implemented Asic with AGs. An advantage of the approach, compared to a monadic one, is that aspects of the translation can be described seperately, e.g., the threading of environments, location information and substitutions. %% Comparison to monads? \section{Reflection} \label{sect.reflection} \paragraph{Embedding.} Haskell provides Asil's abstraction facilities for free. Also, we piggy back on Haskell's type system to type check Asil specifications. For the syntactic sugar related to E-expressions and function calls we used GADTs and common type class extensions. The use of type classes was not always intuitive due to confusing conflicts between overlapped instances and functional dependencies. %% To deal with (rather confusing) conflicts between overlapped instances and functional dependencies %% we needed type-class hackery. %%% ~\citep{Kiselyov:2004:STH:1017472.1017488}. In our embedding, join-point structures and the SUT's runtime values are represented by E-expressions. The syntax of E-expressions is purposefully limited so that we can use Haskell's abstraction facilities, but refrain from embedding arbitrary Haskell expressions in AsilCore and ActionScript. \paragraph{Executable specifications.} Specifications of Flash and the ActionScript bytecode format are publicly available. However, we discovered that these specifications are incomplete with respect to new Flash versions, and also contain errors that are hard to track down. Therefore, we initially implemented a monadic binary parser with @uuparsing-lib@~\citep{Swierstra:2009:CPS:1614478.1614484}, which provides results lazily, in combination with error correction. This provided us with sufficient context to find bugs in our own implementation and the specification itself. Our parser can actually serve as a formal specification, in an executable, testable form. \paragraph{Performance.} Our use case, the Habbo application, is a large SWF file that consists of 3,000 classes with 25,000 methods and 800,000 instructions. Simply parsing the application takes several seconds, which is surprisingly long despite the size of the application. We reimplemented our parser as a binary deserializer using the @Data.Binary.Get@ to no avail. Profiling showed us that garbage collection takes about 60\% of the time. Our hypothesis is that this is an interplay between the large and relatively long lived AST that we construct, and the short-lived closure-garbage produced by the parsers. If it would be possible to allocate the AST (via an annotation on the data constructors) directly in an older memory generation, we expect it to improve our parser's performance. A modification of the parser that only verified the structure of the SWF files but does not construct the AST takes about half the time and seems to confirm our hypothesis. A more pressing problem is that GHC crashes on the huge symbol file that we generate from the Habbo SWF. We consider generating a separate Haskell module for each ActionScript class. \paragraph{Proofs.} The structure of Asil is based on the functor, monad, applicative and alternative interfaces~\citep{Mcbride:2008:APE:1348940.1348941}. This allows us to immediately use many of the convenient functions that are written in terms of these interfaces. Also, these interfaces have to satisfy satisfy a number of laws. Proving that these laws hold gives us more confidence and insight in our implementation. For example, our monad (Figure~\ref{fig.execution}) satisfies the monad laws semantically, but not structurally, since the monadic |return| introduces no-ops. In our actual implementation, we directly eliminate superfluous no-ops and trivially-dead branches as optimization. %if False With typed abstract syntax, we can represent functions in Haskell as data type. These can be mapped on top of ActionScript objects. With an approach like \emph{Type Safe Observable Sharing in Haskell}, we can lift Haskell functions to ActionScript. However, general case distinction is still not possible. Alternatively, we could use UHC to generate ActionScript code, as Atze showed on the UHC blog. Or template haskell with reification. %%%%% Idea: Log Based Testing of Asynchronous Systems %%%%% http://www.replicator.org/content/idea-log-based-testing-of-asynchronous-systems %endif %if False \section{Related Work} \label{sect.relatedwork} With respect to language design, a discussion with AspectJ would be nice. AspectJ lacks abstraction: you can write parametrized point cuts, but these parameters are symbolic program values where no case distinction can be performed on. Neither are "higher-order" pointcuts possible. This is where functional programming can help us. %endif \section{Conclusion} \label{sect.conclusion} We presented Asil, an expressive AOP EDSL for the instrumentation of ActionScript programs. Its design and implementation is based on well-known functional programming techniques, such as monads. The implementation exploits Haskell's facilities for abstraction, syntactic sugar and static typing. Like AspectJ, Asil allows point cuts to be specified with constrained patterns. In addition, Asil allows these constraints to be specified as runtime ActionScript methods and possibly instrumentation-time primitive functions. Finally, (higher-order) functions can be used to abstract over instrumentations and to derive instrumentations from others. The possibility to write higher-order functions is an essential feature that we sorely miss in many DSLs. Fortunately, this comes for free for EDSLs in functional languages. %if acknowledgements %% Can possibly also be moved as footnote to the first page of the paper \acks This work was carried out in terms of the Fittest project. %% Todo: insert more appropriate acknowledgements %endif \bibliographystyle{abbrvnat} \bibliography{references} %if appendices \appendix \section{Implementation} \label{sect.implementation} A bit more insight in the structure of the tools. Some specifics here, such as the use of Haskell, Attribute Grammars. Perhaps mention that the parser written with parser combinators is a more complete specification of the ActionScript bytecode grammar than the actual document. The actual partial evaluation done by |asic| proceeds in two steps. As a first step, we pre-evaluate and optimize the |core| term. We associate a symbolic value with each identifier and execute the primitive calls that can run already, which leads to concrete bindings for some of the identifiers. We substitute primitive calls that failed with |abort|. We reorder some instructions to the front, under the restriction that no instruction that can potentially fail may shift over a non-primitive call. This ensures that the observable order of side effect is retained. Also, an instruction can only be shifted over another instruction if it does not break def-use dependencies. Under these restrictions, we try to order match instructions to the front, followed by |on previous| instructions, and finally the calls. We also replace a sequences of the form |abort s; m| with |abort s|, and |m <#> abort s| and |m <<#> abort s| with |m| (and the symmetric cases as well). Finally, we left-factor the result. The |core| term in this form allows us to a trie structure to quickly determine what top-level branches of a |core| term match a join point. This is important as the PUI may have hundreds of thousands of join points, of which only a small portion have an applicable instrumentation, which is only a small Instead of manually adding the calls to the logging methods in the source code, we develop on a tool |asic| that injects the logging code in the compiled Flash program. We call this Flash program the Program Under Instrumentation (PUI). The code to inject is determined by a program of the domain specific, functional programming language |asil| (an EDSL in Haskell) that describes what code is to be inserted at which locations. %% %% Should not give too much argumentation here (save that for introduction) %% An |asil| expression describes an \emph{instrumentation} to be performed at \emph{join points} of the PUI. The |asic| tool generates a Haskell library from the PUI that contains symbol information of types, properties and methods that occur in the PUI. This symbol information allows the Haskell compiler to type check the instrumentation. For example, the @clicked@ method lives in class @MyGame@ of package @code@, hence the fully qualified name for this method is @code.MyGame.clicked@, and the transcribed Haskell identifier |k~code~MyGame~clicked|. This symbol actually stores the name of the property, and has a type that describes the method's signature. \begin{code} type TT~code~MyGame~clicked = TT~flash~events~MouseEvent -> TT~void kk~code~MyGame~clicked :: TT~code~MyGame `Key` TT~code~MyGame~clicked kk~code~MyGame~clicked = K "code.MyGame:clicked" \end{code} %endif \end{document}