Generalized Algebraic Data Types

Jake McArthur

GADTs

Generalized Algebraic Data Types extend algebraic data types with two new features:

existential types via GADTs

type 'a t =
| Apply : ('e -> 'a) t * 'e -> 'a t

GADT constructor definitions are meant to resemble function type signatures

'e is an existential type because the constructor hides it

let exp : int t =
  Apply (Value Int.of_string, "1337")

The 'e is actually string in exp

existential types via GADTs

type 'a t =
| Apply : ('e -> 'a) t * 'e -> 'a t
let rec run = function
| Apply (fun_t, arg) -> fun_t arg

Even though the compiler doesn't know what 'e is, it does know that it's the same in fun_t and val_t

The 'e is only in scope to the right of the Apply pattern

existentials may not escape

type 'a t =
| Apply : ('e -> 'a) t * 'e -> 'a t
let borked = function
| Apply (_, arg) -> arg
Error: This expression has type ex#0 but an
       expression was expected of type
         ex#0
       The type constructor ex#0 would escape
       its scope

how to let one "escape"

type 'a t =
| Apply : ('e -> 'a) t * 'e -> 'a t

type useless = Useless : _ -> useless

let unborked : _ t -> useless = function
| Apply (_, arg) -> Useless arg

Not really escaping since it's still a hidden type

What you do intend to do with the result?

universal vs. existential

universal types existential types
polymorphic values/functions, polymorphic fields first class modules, GADTs, somewhat complex encoding with nested polymorphic fields
unify with any other type unify with no other type
more restricted implementation less restricted implementation
less restricted interface more restricted interface

two extremes

type absurd  = { absurd : 'a. 'a }
type useless = Useless : _ -> useless

let absurd  a = a.absurd
let useless e = Useless e
val absurd  : absurd -> _
val useless : _      -> useless

If I have an absurd I can convert it to anything, but I can never create an absurd in the first place.

I can convert anything to a useless, but I can never inspect what I put inside.

type equality

Constrain the type's parameters at construction

The constraints are visible to the compiler during pattern matching

type equality

type _ t =
| Int    : int    -> int    t
| String : string -> string t

The "return type" of the constructor constrains the type parameter. We don't even have to name it on the left hand side.

(* fake syntax *)
type 'a t =
| Int    of {'a ~ int}    * int
| String of {'a ~ string} * string

My {'a ~ int} syntax is intended to represent a witness that 'a is the same type as int

The constraints are checked at construction

pattern matching with type equality

type _ t =
| Int    : int    -> int    t
| String : string -> string t
let extract (type a) (t : a t) : a =
  match t with
  | Int    n   -> n
  | String str -> str
(* fake syntax *)
let extract (type a) (t : a t) : a =
  match t with
  | Int    ({a ~ int},    n)   -> n
  | String ({a ~ string}, str) -> str

The unifier uses the constraints during pattern matching. My '{a ~ int}' syntax is meant to represent the hidden information available to the compiler informing it that in this context a is equal to int.

If we don't bind the type we are constraining then the type checker will think it should be the same type in all cases, which would lead to a type error because string doesn't match int.

exhaustiveness

type _ t =
| Int    : int    -> int    t
| String : string -> string t
let extract_string (t : string t) : string =
  match t with
  | String str -> str
(* fake syntax *)
let extract_string (t : string t) : string =
  match t with
  | String ({string ~ string}, str) -> str

The String case is accepted because the types match.

If we had attempted to match the Int constructor, it would have been a compiler error because it would have tried to unify int with string.

The compiler tries patterns that thinks might be missing, but if it finds that they would cause type errors, it does not complain.

both new features at once

type _ t =
| Int    : int         -> int       t
| String : string      -> string    t
| Pair   : 'b t * 'c t -> ('b * 'c) t
(* fake syntax *)
type 'a t =
| Int    of {'a ~ int}       * int
| String of {'a ~ string}    * string
| Pair   :  {'a ~ ('b * 'c)} * 'b t * 'c t -> 'a t

Type equality allows us to constrain existential types at construction.

When pattern matching, this gives the compiler extra information about the hidden types.

GADTs with recursion

type _ t =
| Int    : int         -> int       t
| String : string      -> string    t
| Pair   : 'a t * 'b t -> ('a * 'b) t
let rec extract (type a) (t : a t) : a =
  match t with
  | Int    n        -> n
  | String str      -> str
  | Pair   (t1, t2) -> extract t1, extract t2
Error: This expression has type ex#0 t but an
       expression was expected of type
         ex#0 t
       The type constructor ex#0 would escape
       its scope

An unfortunate error message when compiling this

The problem is that we need polymorphic recursion.

GADTs with recursion

type _ t =
| Int    : int         -> int       t
| String : string      -> string    t
| Pair   : 'a t * 'b t -> ('a * 'b) t
let rec extract : 'a. 'a t -> 'a =
  fun (type a) (x : a t) ->
    ((match x with
      | Int    n        -> n
      | String str      -> str
      | Pair   (t1, t2) -> extract t1, extract t2) : a)

This works, but it's very awkward. We have to use an explicit type annotation for the polymorphic recursion, and we have to use the newtype a, and we have to explicitly tell the compiler where it goes.

GADTs with recursion

type _ t =
| Int    : int         -> int       t
| String : string      -> string    t
| Pair   : 'a t * 'b t -> ('a * 'b) t
let rec extract : type a. a t -> a = function
| Int    n        -> n
| String str      -> str
| Pair   (t1, t2) -> extract t1, extract t2

A new syntax that combines polymorphic recursion and newtypes. Much nicer!

"essence" of type equality

type (_, _) equal =
| Refl : ('a, 'a) equal

We can sort of reify type equality with this type, giving us a value we can pass around that represents that two types are actually the same type.

coercion

type (_, _) equal =
| Refl : ('a, 'a) equal
let cast (type a) (type b)
         (Refl : (a, b) equal) (x : a) : b = x

All we have to do is pattern match to expose that the types are the same.

equivalent types

type (_, _) equal =
| Refl : ('a, 'a) equal
type _ t =
| Int    : int    -> int    t
| String : string -> string t
type 'a t =
| Int    of ('a, int)    equal * int
| String of ('a, string) equal * string

These two definitions of t are roughly equivalent.

This is a form of the fake syntax I used earlier, but the compiler actually accepts it.

Just note that there is a runtime cost this time since there is an actual value representing the equality.

pattern matching with equal

type (_, _) equal =
| Refl : ('a, 'a) equal
type 'a t =
| Int    of ('a, int)    equal * int
| String of ('a, string) equal * string

let extract (type a) (t : a t) : a =
  match t with
  | Int    (Refl, n)   -> n
  | String (Refl, str) -> str

Note that t is a normal ADT this time. All the GADTness is in equal.

Same principle as before. We just pattern match on Refl to expose its type information.

when GADTs are useful

type variables on only the right hand side

internal data structure invariants

smarter exhaustiveness check, fewer uses of assert false

data type focused, so best for libraries and DSLs

example: safer linked lists

type empty
type nonempty

type (_, _) safe_list =
| Nil  : ( _, empty) safe_list
| Cons : 'a * ('a, _) safe_list -> ('a, nonempty) safe_list

We can use a phantom type to capture the "emptiness" of a list in the types.

We could have done this already, but we would have had to make the representation abstract with a carefully written implementation to preserve the invariants.

example: safer linked lists

type (_, _) safe_list =
| Nil  : ( _, empty) safe_list
| Cons : 'a * ('a, _) safe_list -> ('a, nonempty) safe_list
let hd : ('a, nonempty) safe_list -> 'a = function
| Cons (x, _) -> x

No hd_exn or option return type necessary!

example: safer linked lists

val hd : ('a, nonempty) safe_list -> 'a
let print_head : 'a. (int, 'a) safe_list -> unit
  = fun xs -> printf "%d\n" (hd xs)
Error: This definition has type
         (int, nonempty) safe_list -> unit
       which is less general than
         'a. (int, 'a) safe_list -> unit

We get a type error if the list could be empty.

example: safer linked lists

val hd : ('a, nonempty) safe_list -> 'a
let print_head xs = printf "%d\n" (hd xs)
val print_head : (int, nonempty) safe_list -> unit = <fun>

infers a reasonable type on its own

example: safer linked lists

let nonempty
  :  type e.
     ('a, e) safe_list
  -> ('a, nonempty) safe_list option =
  function
  | Nil               -> None
  | Cons (_, _) as xs -> Some xs

If you have a list whose emptiness is not statically known, you can perform the same runtime check as you would with List.is_empty, but this time you can just return the same list with a more precise type.

example: safer linked lists

val hd : ('a, nonempty) safe_list -> 'a
val nonempty :  'e.
                ('a, 'e) safe_list
             -> ('a, nonempty) safe_list option
let print_head xs =
  match nonempty xs with
  | None    -> printf "list is empty\n"
  | Some xs -> printf "%d\n" (hd xs)
val print_head : (int, 'a) safe_list -> unit = <fun>

infers a reasonable type on its own

example: safer linked lists

type (_, _) safe_list =
| Nil  : ( _, empty) safe_list
| Cons : 'a * ('a, _) safe_list -> ('a, nonempty) safe_list
let is_nonempty
  :  type e.
     (_, e) safe_list
  -> (e, nonempty) equal option
  = function
  | Nil         -> None
  | Cons (_, _) -> Some Refl

Instead of returning the list with a different type, we can return a proof that the original list is nonempty. Now we don't have to rebind the list!

example: safer linked lists

val hd : ('a, nonempty) safe_list -> 'a
val is_nonempty : (_, 'e) safe_list -> ('e, nonempty) equal option
let print_head : type e. (int, e) safe_list -> unit = fun xs ->
  match is_nonempty xs with
  | None      -> printf "list is empty\n"
  | Some Refl -> printf "%d\n" (hd xs)
val print_head : (int, 'a) safe_list -> unit = <fun>

Requires us to specify the type signature. If we don't, it still compiles, but it will infer that the list must be nonempty, which is less useful. This is because it expects the type to be the same in both cases of the match statement by default.

example: typerep-mini

type _ rep =
| Int    :                    int       rep
| String :                    string    rep
| Tuple2 : 'a rep * 'b rep -> ('a * 'b) rep

Typerep is a library for reifying OCaml types as values so you can do some kinds of generic programming over them.

(This type only scratches the surface of what typerep really is.)

Every distinct value has a unique type (this is an example of a "singleton type").

example: typerep-mini

let rec generic_sexp_of : type t. t rep -> (t -> Sexp.t) =
function
| Int    -> fun n   -> Sexp.Atom (Int.to_string n)
| String -> fun str -> Sexp.Atom str
| Tuple2 (rep1, rep2) ->
  let f = generic_sexp_of rep1 in
  let g = generic_sexp_of rep2 in
  fun (x, y) -> Sexp.List [f x; g y]

(* let sexp_of_int_string_pair = <:sexp_of<int*string>> *)
let sexp_of_int_string_pair = generic_sexp_of_t (Tuple2 (Int, String))
val sexp_of_int_string_pair : int * string -> Sexp.t = <fun>

Has some expressivity that was previously only available via Camlp4.

example: typerep-mini

let rec generic_of_sexp : type t. t rep -> (Sexp.t -> t) = function
| Int ->
  (function
   | Sexp.Atom str -> Int.of_string str
   | Sexp.List _   -> failwith "failed to parse int from sexp")
| String ->
  (function
   | Sexp.Atom str -> str
   | Sexp.List _   -> failwith "failed to parse string from sexp")
| Tuple2 (rep1, rep2) ->
  let f = generic_of_sexp rep1 in
  let g = generic_of_sexp rep2 in
  (function
   | Sexp.List [sexp1; sexp2] -> f sexp1, g sexp2
   | _ -> failwith "failed to parse tuple from sexp")

Yes, we can go the other direction, too.

interpreters

Interpreters are probably the best possible fit for GADTs of all applications I can think of.

Structuring a library as an interpreter can make your life much easier.

statically type check some properties that you might have to dynamically check otherwise

alternative interpreters for testing purposes

more conveniently optimize the program before interpreting it (although it may not outweigh the interpretive overhead)

example: simple DSL

type _ t =
| Read_line  : string Reader.Read_result.t t
| Write_line : string -> unit t
| Return     : 'a -> 'a t
| Bind       : 'a t * ('a -> 'b t) -> 'b t

let read_line      = Read_line
let write_line str = Write_line str
let return x       = Return x
let bind t f       = Bind (t, f)

a little DSL for reading lines from one file and writing them to another

example: simple DSL

type _ t =
| Read_line  : string Reader.Read_result.t t
| Write_line : string -> unit t
| Return     : 'a -> 'a t
| Bind       : 'a t * ('a -> 'b t) -> 'b t
let rec run t read_file write_file =
  let open Async.Std in
  Reader.open_file read_file  >>= fun reader ->
  Writer.open_file write_file >>= fun writer ->
  let rec loop : type a. a t -> a Deferred.t = function
    | Read_line      -> Reader.read_line reader
    | Write_line str -> Writer.write_line str
    | Return x       -> return x
    | Bind (t, f)    -> loop t >>= fun x -> loop (f x)
  in
  loop t

example: simple DSL

type 'a t
include Monad.S with type t := t
val read_line  : string Reader.Read_result.t t
val write_line : string -> unit t
let number_lines =
  let rec loop n =
    read_line >>= function
    | `Eof -> return ()
    | `Ok ln ->
      write_line (sprintf "%d: %s" n ln) >>= fun () ->
      loop (n+1)
  in
  loop 1

use it like a normal library

example: simple DSL

let optimize t =
  let writes_rev = ref [] in
  let rec do_pass : type a. a t -> a t = function
    | Bind (t, k) ->
      begin
        match do_pass t with
        | Return x -> do_pass (k x)
        | Write_line str -> writes_rev := str :: !writes_rev; do_pass (k ())
        | t -> Bind (t, k)
      end
    | Write_line str -> writes_rev := str :: !writes_rev; Return ()
    | t -> t
  in
  let t = do_pass t in
  match !writes_rev with
  | [] -> t
  | _ ->
    let first_write = StringLabels.concat (List.rev !writes_rev) ~sep:"\n" in
    Bind (Write_line first_write, fun () -> t)
# optimize (write_line "foo" >>= fun () ->
            write_line "bar" >>= fun () ->
            number_lines);;
- : unit t = Bind (Write_line "foo\nbar", <fun>)

Of course, the expression you are optimizing should be purely functional, since this optimization does partial evaluation.

example: simple DSL

let run_on_input t ins =
  let rec loop
    :  type a.
       string list
    -> a t * string list
    -> a * string list * string list
    = fun outs -> function
      | Return x      , ins       -> x     , outs       , ins
      | Read_line     , []        -> `Eof  , outs       , []
      | Read_line     , (ln::ins) -> `Ok ln, outs       , ins
      | Write_line out, ins       -> ()    , (out::outs), ins
      | Bind (t, k)   , ins       ->
        let x, outs, ins = loop outs (t, ins) in
        loop outs (k x, ins)
  in
  let r, outs, ins = loop [] (t, ins) in
  r, List.rev outs, ins

run it on a static input without involving I/O

example: simple DSL

TEST "number_lines" =
  run_on_input number_lines ["foo";"bar";"baz"]
  = ((), ["1: foo"; "2: bar"; "3: baz"], [])

unit test code intended for Async without using Async (or any I/O at all) in the tests

verifies the return value, the output, and any remaining input

The Done

Questions?

Exercise 1: Safe tail

Come up with an appropriate type signature and implement a safe tl function for safe_list.

Exercise 2: Type equality

Implement a module with the following signature. All operations should take constant time.

type (_, _) t =
| Refl : ('a, 'a) t

val symmetric  : ('a, 'b) t -> ('b, 'a) t
val transitive : ('a, 'b) t -> ('b, 'c) t -> ('a, 'c) t

val cast      : ('a, 'b) t -> 'a      -> 'b
val cast_list : ('a, 'b) t -> 'a list -> 'b list

Exercise 3: List to safe list

Come up with a way to convert a normal list to a safe_list.

Exercise 4: Lambda calculus interpreter

Implement a module with the following signature:

type 'a t =
| Primitive of 'a
| Lambda : ('a t -> 'b  t) -> ('a   -> 'b) t
| Apply  : ('a   -> 'b) t  *   'a t -> 'b  t

val eval : 'a t -> 'a

Exercise 5: Impossibility

Remember that absurd type from earlier (reproduced below)? We can sometimes use it to prove that it is impossible to create values of certain types. Among other things, this gives us a way to prove that a GADT's design preserves some of the invariants we expect.

type absurd = { absurd : 'a. 'a }

(* In the absense of side effects like exceptions or nontermination,
   a value of type ['a impossible] is a proof that there are no values
   of type ['a]. *)
type 'a impossible = 'a -> absurd

Construct a proof that it is impossible convert a list whose type claims it is empty into a list whose type claims it is nonempty. Also, write a function that, given a proof that a safe_list is "not empty" (as opposed to "nonempty") converts the safe_list with unknown emptiness into a nonempty safe_list.

type (_, _) equal =
| Refl : ('a, 'a) equal

type empty
type nonempty

type (_, _) safe_list =
| Nil  : ( _, empty) safe_list
| Cons : 'a * ('a, _) safe_list -> ('a, nonempty) safe_list

val hd : ('a, nonempty) safe_list -> 'a

type coerce =
  { coerce : 'a. ('a, empty) safe_list -> ('a, nonempty) safe_list }
val empty_list_stays_empty : coerce impossible

val not_empty_list
  :  ('e, empty) equal impossible
  -> ('a, 'e) safe_list
  -> ('a, nonempty) safe_list

Aside: If you manage to create a value of type unit impossible without relying on any side effects, you have successfully broken OCaml's type system, since that would allow you to implement Obj.magic.

Exercise 6: Static-length lists

There are two GADTs involved in this exercise. There is a bit of extra knowledge that may be required to make this work that I did not cover in the talk. When pattern matching on two GADTs at once, the type constraints from the first can constrain the types of the second, but not the other way around. That means that if you tuple two GADTs into a single pattern, the ordering of the tuple might matter.

Implement a module with the following signature. The definitions for t and index are ommitted since their design is part of the exercise, but their representations must be safe to reveal in the signature.

(* [zero] and [succ] are intended for constructing type-level natural
   numbers. For example, [zero succ succ succ succ succ] represents
   the number 5. *)
type zero
type 'a succ

(* A [('a, 'n) t] is a linked list containing elements of type ['a] and
   having a length of exactly ['n] *)
type ('a, n) t (* = <definition here> *)

val hd : ('a, 'n succ) t -> 'a
val tl : ('a, 'n succ) t -> ('a, 'n) t

val zip : ('a, n) t -> ('b, 'n) t -> ('a * 'b, 'n) t

(* A ['n index] is a valid index for any list of length ['n]. *)
type 'n index (* = <definition here> *)

val nth : ('a, 'n) t -> 'n index -> 'a

Hint: [index]'s representation will probably take linear space. If you manage to come up with a smaller representation that is still type safe by construction, you should write a paper.