\section{Introduction}
\label{sect.introduction}


Generalized Algebraic Data Types (\GADTs) allow for additional equalities to hold between types of a data constructor.
These equalities must hold when constructing a value with this data constructor. When a match against this constructor succeeds
in a pattern match, these equalities hold between the types of the constructor, and may be used to coerce the type of an
expression to an equivalent type.

This particular feature found many application areas.
Many Haskell-related projects make use of \GADTs, such as a diversity of generic programming approaches~\citep{jeuj09:AFP_2008}, and transformations on typed abstract syntax for the implementation of domain-specific languages~\citep{DBLP:conf/tldi/BaarsSV09}.
Therefore, we added support for \GADTs{} in UHC~\citep{uhc}, the Haskell compiler that we develop at Universiteit Utrecht.

\citet{dijkstra05phd} describes the implementation of UHC as a combination of many specifications. Similarly, we wanted to add GADTs
to UHC by taking a specification as guidance. However, it was not straightforward to match existing specifications
to UHC's infrastructure, for the following reasons:
\begin{itemize}
\item GADT pattern matching is defined in terms of case-expressions.
  In a programming language such as Haskell, there are several other ways to pattern match, such as in
  let-bindings and list-comprehensions. Furthermore, Haskell has a concept of lazy pattern matching
  (irrefutable patterns). It is not directly clear if GADT matching is allowed in these situations,
  and what the side conditions are.

\item For many approaches, it turns out
  that the effectiveness of inference for GADTs depends on what type information
  can be derived a priori from type annotations given by the programmer (Section~\ref{sect.relatedwork}).
  Specifications for inference of GADTs use a variety of techniques such as shape inference to describe that process.
  The implementation of such techniques crosscut the type inference implementation of the
  compiler, thus we are reluctant to incorporate them, unless absolutely necessary.
  Additionally, UHC already analyzes a program to derive type information from type signatures.
  Since the above techniques are intertwined with the specifications, it is not directly
  clear how to compare our infrastructure with the infrastructure used in the aforementioned specifications.
\end{itemize}

In this paper, we present a specification for GADT inference that addresses the above points, and
has the following distinguishing features:
\begin{itemize}
\item We introduce a variation of System F, named \SystemFAI{}, extended with equality assumptions.
  In contrast to other GADT specifications, \SystemFAI{} does not have syntax for data types or
  case expressions. These are redundant in our specification, because we exploit the folklore
  Church encoding of data types as conventional lambda expressions.
  The result is an abstract and concise description without the redundant syntax,
  while it is on the other hand more general, because it (necessarily) answers how to treat arbitrary
  pattern matching, including let-bindings and irrefutable patterns.
\item Our specification is orthogonal to type inference infrastructure.
  Like System F, \SystemFAI{} is explicitly typed. These explicit types
  represent the type information that can be derived using analysis on type signatures and
  conventional type inference.
  Our specification thus focusses in isolation on how to infer type coercions.
\end{itemize}
We thus claim that our specification is \emph{lean}: it is concise because it
does not require the notion of data types and case expressions, and abstract because
it is orthogonal to infrastructure (in particular regarding type annotation analysis).
However, our specification only specifies when explicit type information
is needed, not when type annotations may be omitted, which depends largely on the
effectiveness of the algorithms we abstracted from.

%We briefly discuss implementation

In an earlier version of this paper~\citep{mypaper}, we presented this work in terms of
an extension of \SystemFA~\citep{sulzmann07}, using explicit syntax for data types and
case expressions. The encoding in \SystemFAI{} is significantly less complex.

\paragraph{Roadmap.}
We introduce \GADTs{} in \Sect{sect.motivation.introduction}, and motivate
the design choices for our specification in Section~\ref{sect.designdecisions}.
We formally define \SystemFAI{} and its type system in \Sect{sect.typesystem}.
In \Sect{sect.translation}, we define the semantics of \SystemFAI{} via a translation to a subset of \SystemFC, which is a \SystemF-like language with explicit type coercions, defined by \citet{sulzmann07}.
The relation to various other approaches, we discuss in Section~\ref{sect.relatedwork}.
Finally, in Section~\ref{sect.conclusion}, we briefly mention our experiences with the
integration of GADTs in the UHC.

%% Finally, we give an inference algorithm for \SystemFAI{} in Section~\ref{sect.algorithm}.