\section{Introduction} \label{sect.introduction} Generalized Algebraic Data Types (\GADTs) allow for additional equalities to hold between types of a data constructor. These equalities must hold when constructing a value with this data constructor. When a match against this constructor succeeds in a pattern match, these equalities hold between the types of the constructor, and may be used to coerce the type of an expression to an equivalent type. This particular feature found many application areas. Many Haskell-related projects make use of \GADTs, such as a diversity of generic programming approaches~\citep{jeuj09:AFP_2008}, and transformations on typed abstract syntax for the implementation of domain-specific languages~\citep{DBLP:conf/tldi/BaarsSV09}. Therefore, we added support for \GADTs{} in UHC~\citep{uhc}, the Haskell compiler that we develop at Universiteit Utrecht. \citet{dijkstra05phd} describes the implementation of UHC as a combination of many specifications. Similarly, we wanted to add GADTs to UHC by taking a specification as guidance. However, it was not straightforward to match existing specifications to UHC's infrastructure, for the following reasons: \begin{itemize} \item GADT pattern matching is defined in terms of case-expressions. In a programming language such as Haskell, there are several other ways to pattern match, such as in let-bindings and list-comprehensions. Furthermore, Haskell has a concept of lazy pattern matching (irrefutable patterns). It is not directly clear if GADT matching is allowed in these situations, and what the side conditions are. \item For many approaches, it turns out that the effectiveness of inference for GADTs depends on what type information can be derived a priori from type annotations given by the programmer (Section~\ref{sect.relatedwork}). Specifications for inference of GADTs use a variety of techniques such as shape inference to describe that process. The implementation of such techniques crosscut the type inference implementation of the compiler, thus we are reluctant to incorporate them, unless absolutely necessary. Additionally, UHC already analyzes a program to derive type information from type signatures. Since the above techniques are intertwined with the specifications, it is not directly clear how to compare our infrastructure with the infrastructure used in the aforementioned specifications. \end{itemize} In this paper, we present a specification for GADT inference that addresses the above points, and has the following distinguishing features: \begin{itemize} \item We introduce a variation of System F, named \SystemFAI{}, extended with equality assumptions. In contrast to other GADT specifications, \SystemFAI{} does not have syntax for data types or case expressions. These are redundant in our specification, because we exploit the folklore Church encoding of data types as conventional lambda expressions. The result is an abstract and concise description without the redundant syntax, while it is on the other hand more general, because it (necessarily) answers how to treat arbitrary pattern matching, including let-bindings and irrefutable patterns. \item Our specification is orthogonal to type inference infrastructure. Like System F, \SystemFAI{} is explicitly typed. These explicit types represent the type information that can be derived using analysis on type signatures and conventional type inference. Our specification thus focusses in isolation on how to infer type coercions. \end{itemize} We thus claim that our specification is \emph{lean}: it is concise because it does not require the notion of data types and case expressions, and abstract because it is orthogonal to infrastructure (in particular regarding type annotation analysis). However, our specification only specifies when explicit type information is needed, not when type annotations may be omitted, which depends largely on the effectiveness of the algorithms we abstracted from. %We briefly discuss implementation In an earlier version of this paper~\citep{mypaper}, we presented this work in terms of an extension of \SystemFA~\citep{sulzmann07}, using explicit syntax for data types and case expressions. The encoding in \SystemFAI{} is significantly less complex. \paragraph{Roadmap.} We introduce \GADTs{} in \Sect{sect.motivation.introduction}, and motivate the design choices for our specification in Section~\ref{sect.designdecisions}. We formally define \SystemFAI{} and its type system in \Sect{sect.typesystem}. In \Sect{sect.translation}, we define the semantics of \SystemFAI{} via a translation to a subset of \SystemFC, which is a \SystemF-like language with explicit type coercions, defined by \citet{sulzmann07}. The relation to various other approaches, we discuss in Section~\ref{sect.relatedwork}. Finally, in Section~\ref{sect.conclusion}, we briefly mention our experiences with the integration of GADTs in the UHC. %% Finally, we give an inference algorithm for \SystemFAI{} in Section~\ref{sect.algorithm}.