\section{Specification} \label{sect.typesystem} In this section, we present our specification for GADT inference. We start with the language \SystemFAI{} in Section~\ref{sect.typesystem.language}, present \SystemFAI's typing rules in Section~\ref{sect.typesystem.rules}, and show how to deal with equality proofs in Section~\ref{sect.typeconversions}. \begin{CenterFigure}{h}{Syntax of \SystemFAI}{fig.systemfa} \rule{\linewidth}{0.1pt} \begin{code} e, f, g, a `elem` Expr ^^^^^ ty, rho, varrho, sigma `elem` Type ::= x ^^^^ EVAR ^^^^^ ::= alpha ^^^^ TVAR ||| tau =*= sigma ^^^^ EEQ ^^^^^ ||| sigma -> ty ^^^^ TARR ||| \x . e ^^^^ ELAMABS ^^^^^ ||| forall alpha . ty ^^^^ TFORALL ||| \alpha . e ^^^^ EUNIVABS ^^^^^ ||| tau =*= sigma ^^^^ TEQS ||| \(tau =*= sigma) . e ^^^^ EEQABS ^^^^^ ||| f e ^^^^ EAPPEXPR ^^^^^ env `elem` Env ||| f ty ^^^^ EAPPUNIV ^^^^^ ::= env, x :: ty, ^^^^ ENVTY ^^^^^ ||| env, alpha ^^^^ ENVVAR x, y `elem` Ident ^^^^^ alpha, beta `elem` TyIdent \end{code} \rule{\linewidth}{0.1pt} \end{CenterFigure} \subsection{\SystemFAI} \label{sect.typesystem.language} \Fig{fig.systemfa} lists the syntax of \SystemFAI, which consists of System F with the addition of equality proofs. These additions consist of: \begin{itemize} \item The abstraction |\(ty =*= sigma) . e| matches against an equality proof for |ty =*= sigma|, and brings it in scope by adding it to the environment. The equalities in scope are called equality assumptions, and can be used in equality proofs. The process of constructing a proof is fully automatic. \item The proof expression |(ty =*= sigma)| constructs an equality |ty =*= sigma|. \item The type language contains a type for equality proofs. Furthermore, we assume a number of primitive types to be present in the type language, such as |Int|. \end{itemize} Environments record Skolem constants introduced by type abstraction, and types for identifiers. Since we present equality proofs as conventional types, these can be stored in the environment as well, when we give them a name. As illustration, we encode the following fragment of a program that generically increments integers in data types: \begin{code} data Rep alpha where RInt :: Rep Int inc :: Rep alpha -> alpha -> Int inc = \r . \x . case r of RInt -> x + 1 z :: Int z = inc RInt 3 \end{code} As an intermediate step, we write: \begin{code} type Rep' alpha = forall beta . ((alpha =*= Int) -> beta) -> beta mkRInt :: alpha =*= Int -> Rep' alpha mkRInt = \eq . \f . f eq inc :: Rep' alpha -> alpha -> Int inc = \r . \x . r (\(alpha =*= Int) . x + 1)) z :: Int z = inc (mkRInt (Int =*= Int)) 3 \end{code} Finally, expressed in \SystemFAI{}: \begin{code} -- |mkRInt| BigLam alpha . \(eq :: alpha =*= Int) . \(f :: forall beta . ((alpha =*= Int) -> beta) -> beta) . f alpha eq -- |inc| BigLam alpha . \(r :: forall beta . ((alpha =*= Int) -> beta) -> beta) . \(x :: alpha) . r Int (\(alpha =*= Int) . (+) x 1) -- |z| inc Int (mkRInt Int (Int =*= Int)) 3 \end{code} Note that the type of |x| in |inc| is |alpha|, and is required to have type |Int|. In System F, this expression is not correctly typed. It has a valid type in \SystemFAI's type system, which we discuss below. \subsection{Type Rules} \label{sect.typesystem.rules} The typing relation (rules in \Fig{fig.type.expr}) states that in environment |env|, the expression |e| has type |ty|. These rules are syntax directed, except for rule \TirName{coerce}. The idea is that the shape of the expression determines which rules to apply, and we resort to rule \TirName{coerce} when there is a mismatch between the types. We illustrate this process briefly with inference for the |inc| example of the previous section, then explain the rules in more detail. \begin{CenterFigure}{h}{Expression type rules}{fig.type.expr} \rule{\linewidth}{0.1pt} \begin{mathpar} \fbox{|env :- e : ty|}\\ \inferrule*[right=var] { |x :: tau `elem` env| \\ } { |env :- x : tau| } \inferrule*[right=coerce] { |env :- e : sigma|\\\\ |env ::- sigma =*= tau|\\\\ |ftv tau `subset` ftv env| } { |env :- e : tau| } \inferrule*[right=eq] { |env ::- tau =*= sigma| \\\\ |ftv tau `subset` ftv env| \\\\ |ftv sigma `subset` ftv env| } { |env :- tau =*= sigma : tau =*= sigma| } \inferrule*[right=app.expr] { |env :- f : sigma -> tau| \\\\ |env :- e : sigma| } { |env :- f e : tau| } \inferrule*[right=app.ty] { |env :- f : sigma -> tau| \\\\ |ftv sigma `subset` ftv env| } { |env :- f sigma : tau| } \inferrule*[right=lam.expr] { |env, x :: sigma :- e : ty| } { |env :- \(x :: sigma) . e : sigma -> ty| } \inferrule*[right=lam.ty] { |env, alpha :- e : ty| \\\\ |alpha notelem ftv env | } { |env :- BigLam alpha . e : forall alpha . ty| } \inferrule*[right=lam.eq] { |env, x :: rho =*= sigma :- e : tau|\\\\ |x notelem env| } { |env :- \(rho =*= sigma) . e : (rho =*= sigma) -> tau| } \end{mathpar} \rule{\linewidth}{0.1pt} \end{CenterFigure} \paragraph{Example.} To type the |inc| expression (see Figure~\ref{fig.deriv.example}), we apply first rule \TirName{lam.ty}, then rule \TirName{lam.expr} twice. The former administers the Skolem constant |alpha| in the environment, the second two the types of |r| and |x|. The type of |r|, we extract again using the rule \TirName{var}, then use rule \TirName{app.ty} to instantiate the universally quantified |beta| of the type of |r| to the result type |Int|. \begin{CenterFigure}{h}{Typing Derivation of |inc|}{fig.deriv.example} \rule{\linewidth}{0.1pt} \begin{mathpar} \inferrule* { \inferrule* { \inferrule* { \inferrule* { \inferrule* { \inferrule* {} {|Gamma1 :- r : forall beta . (((alpha =*= beta) -> beta) -> beta)|} } {|Gamma1 :- r Int : ((alpha =*= Int) -> Int) -> Int|} \\ \inferrule* {} {|Gamma1 :- \(alpha =*= Int) . (+) x 1 : ...|} } {|Gamma1 :- r Int (\(alpha =*= Int) . (+) x 1) : Int|} } {|... :- \x . r Int (\(alpha =*= Int) . (+) x 1) : ... -> Int|} } {|... :- \r . \x . r Int (\(alpha =*= Int) . (+) x 1) : ... -> ... -> Int|} } {|emptyset :- BigLam alpha . \r . \x . r Int (\(alpha =*= Int) . (+) x 1) : forall alpha . ... -> ... -> Int|} \inferrule* { \inferrule* { \inferrule* { \inferrule* {} {|Gamma2 :- (+) : ...|} \\ \inferrule* { |Gamma2 :- x : alpha| \\ |Gamma2 ::- alpha ~ Int| } {|Gamma2 :- x : Int|} } {|Gamma2 :- (+) x : Int -> Int|} \\ \inferrule* {} {|Gamma2 :- 1 : Int|} } {|Gamma2 :- (+) x 1 : Int|} } {|Gamma1 :- \(alpha =*= Int) . (+) x 1 : (alpha =*= Int) -> Int|} \end{mathpar} \begin{code} Gamma0 = emptyset, 1 :: Int, (+) :: Int -> Int -> Int Gamma1 = Gamma0, alpha, r :: forall beta . (((alpha =*= beta) -> beta) -> beta, x :: alpha Gamma2 = Gamma1, e :: alpha =*= beta \end{code} \rule{\linewidth}{0.1pt} \end{CenterFigure} To type the subexpression |(\(alpha =*= Int) . (+) x 1)|, we use rule \TirName{lam.eq} to introduce the equality into the environment, bound to a fresh name (not important for now). The subexpression |(+) x 1| must have type |Int|. Assuming that |(+) :: Int -> Int -> Int| is in the environment, this means that |x| must have type |Int|. However, the type we get for |x| by applying the \TirName{var}-rule is |alpha|. So, we apply the \TirName{coerce}-rule to convert the type |alpha| to |Int|. This requires us to prove that |env ::- alpha =*= Int|, for which we give the rules later. This is in this case not difficult, because exactly this equality we added just before into the environment. \paragraph{Type rules overview.} Most of the rules are vanilla System F rules. Via the |ftv| relation, we state that the types chosen in rules \TirName{coerce}, \TirName{eq}, and \TirName{app.ty} are closed (and well-formed). The |ftv| relation is defined as: \begin{code} ftv alpha = {alpha} ftv (tau -> sigma) = ftv tau `union` ftv sigma ftv (forall alpha . tau) = ftv tau - {alpha} ftv (tau =*= sigma) = ftv tau `union` ftv sigma \end{code} We overload |ftv| to work on environments as well: \begin{code} ftv emptyset = emptyset ftv (env, alpha) = ftv env `union` {alpha} ftv (env, x :: ty) = ftv env `union` ftv ty \end{code} Rules \TirName{coerce} and \TirName{eq} specify the construction of equality proofs. In the former, the equality proof is used to coerce a type, in the latter to pass it on as a first-class value. In rules \TirName{lam.expr}, \TirName{lam.ty}, and \TirName{lam.eq}, we extend the environment. Extension of an environment with a binding shadows a possible previous binding with the same name. In rule \TirName{lam.eq}, we introduce an equality in the environment, using a fresh name |x|. This name is of consequence for the next section, but has no meaning in this section. \begin{CenterFigure}{h}{Equality proof inference rules}{fig.type.equal} \rule{\linewidth}{0.1pt} \begin{mathpar} \fbox{|env ::- ty1 =*= ty2|}\\ \inferrule*[right=refl] {} {|env ::- ty =*= ty| } \inferrule*[right=sym] { |env ::- sigma =*= tau| } { |env ::- tau =*= sigma| } \inferrule*[right=trans] { |env ::- tau =*= rho| \\\\ |env ::- rho =*= sigma| } { |env ::- tau =*= sigma| } \inferrule*[right=assum] { |x :: tau =*= sigma `elem` env| } { |env ::- tau =*= sigma| } \inferrule*[right=con.arr] { |env ::- sigma =*= tau| \\\\ |env ::- rho =*= varrho| } { |env ::- tau -> rho =*= sigma -> varrho| } \inferrule*[right=con.univ.left] { |env, beta ::- tau [alpha := beta] =*= sigma|\\\\ |beta notelem ftv tau +++ ftv env| } { |env ::- forall alpha . tau =*= sigma| } \inferrule*[right=con.univ.right] { |env ::- tau =*= sigma [beta := alpha]| \\\\ |beta `elem` ftv env|\\\\ |alpha notelem ftv sigma| } { |env ::- tau =*= forall alpha . sigma| } \inferrule*[right=sub.univ] { |x :: (tau3 =*= (forall alpha . tau4)) `elem` env| \\\\ |y :: (tau5 =*= (forall alpha . tau6)) `elem` env| \\\\ |env ::- tau3 =*= tau5| \\\\ |x' :: tau4 =*= tau6 ::- tau1 =*= tau2| \\\\ |x' notelem env| } { |env ::- tau1 =*= tau2| } \inferrule*[right=sub.arr] { |x :: (tau3 =*= (tau4 -> tau5)) `elem` env| \\\\ |y :: (tau6 =*= (tau7 -> tau8)) `elem` env| \\\\ |env ::- tau3 =*= tau6| \\\\ |x' :: tau4 =*= tau7, y' :: tau5 =*= tau8 ::- tau1 =*= tau2| \\\\ |x',y' notelem env| } { |env ::- tau1 =*= tau2| } \end{mathpar} \rule{\linewidth}{0.1pt} \end{CenterFigure} \subsection{Type Conversions} \label{sect.typeconversions} Figure~\ref{fig.type.equal} lists the inference rules for equality proofs. The rules \TirName{refl}, \TirName{sym} and \TirName{trans} correspond to the conventional rules of an equality theory. An equality assumption can be applied through rule \TirName{assum}. In the example of the beginning of this section, we need to prove |env ::- Int =*= Int| (by means of the \TirName{refl}-rule), and |env ::- a =*= Int| (with the \TirName{assum}-rule). In Section~\ref{sect.typed.abstract} we showed some examples using more involved equality proofs. Simply put, proving an equality is a matter of exhaustively applying all the rules. To be able to apply coercions deeper into types, we have congruence and subsumption rules for each member of the type language that has a substructure. In our case, for arrows and universal quantification. We do not need nor allow coercions on equality proofs. With congruence rules, if two types share a common structure, we only need to prove the equality between the components that differ. The other way around, with subsumption rules, we lift a proof on smaller type into a proof on bigger types. In other specifications~\citet{DBLP:conf/icfp/SchrijversJSV09,sulzmann07}, the subsumption rules e.g. on arrows are typically specified as: \begin{mathpar} \inferrule*[right=sub.l] { |env ::- tau -> rho =*= sigma -> varrho| } { |env ::- tau =*= sigma| } \inferrule*[right=sub.r] { |env ::- tau -> rho =*= sigma -> varrho| } { |env ::- tau =*= sigma| } \end{mathpar} These rules appear simpler than the rule in our specification. However, we have objections against these rules: \begin{itemize} \item For any environment |env|, the other rules have a finite number of unique instantiations. However, when we include the above subsumption rule, arbitrary big types can be constructed. Thus the search space can be infinite. In practice, we only need to consider a finite portion of the search space to prove or disprove the equality of two types, but this rule does not force us to. \item The subsumption rule does not give us an intuition when to actually apply this rule. To construct an equality proof, one first decomposes the equality to prove using the congruence rules, then apply subsumption rules to work up to assumptions in the environment. We thus restrict the subsumption rule to assumptions in the environment. Our rule simply expresses that if there are two equalities in the environment, and these equalities can be shown equal on one side, then we may assume that each pairwise subtype on the other side is equal as well. \end{itemize} \subsection{Multiple Derivations} There may be more than one possible way to complete an inference. If there are multiple derivations possible, the equality relation has the nice property that any of them satisfies. Also, if one can be completed, then all of them can be completed. % Consequence: can immediately stop after one branch is found. % Consequence: if one way of completing the proof is to make a substitution, % and another way of completing the proof is to make another substitution, % then either both substitutions are needed or both are equivalent. \subsection{Algorithm} \label{sect.algorithm} The specification of this section has a straightforward implementation, even combined with type inference. We first infer types of a program in the conventional way, but for each type conflict, we generate a coercion constraint. At the end of type inference for e.g. a binding group, we try to solve these coercion constraints, using an exhaustive application of the above equality rules. For those constraints that cannot be solved we generate a type error. For those we can, we generate a coercion. In the next section, we show how to generate these coercions. % In later work, we show how to do this in a breadth-first way