ML programmer's guide to ATS

In this guide, we don't aim to teach you programming in ATS (Applied Type System), but we need to tell you enough so that a seasoned ML programmer can quickly adjust to the terminology used in ATS and start reading existing ATS code.

Wonderful world of ATS

In ATS, it is useful to keep in mind that you will be programming in three worlds.

    • Dynamics, the part that evaluates when you run the program. This is what programmers are most familiar with dealing with programming languages.
    • Proofs, relates dynamic property with static property. Proofs are considered a dynamic part of the program but is stripped away after the compilation, so they are absent in run-time. The compiler checks your proof and then erases them before emitting code.
    • Statics, the part evaluated by the compiler as part of type checking. The static part is a restricted form of programming so that evaluation of static expressions will always terminate, so type checking is decidable.

In ATS predecessor Dependent ML, the static world only sees a very limited presence of the dynamic world through dependent types. Each dynamic expression has a type that can be indexed with a static expression. Static constraints are collected from static expressions, and type checking involves solving these constraints to decide satisfiability.

In addition to dependent types, ATS also has proofs, which allow the static world to become more involved in dynamics by allowing programmers to write a dynamic expression and a proof term side-by-side to establish correspondence. Proofs can either be classical propositions, called props, or linear propositions (in the sense of linear logic—see margin note) called views.

To those familiar with Prolog, defining props is very much like defining a Prolog predicate. However, the type checker doesn't solve these predicates automatically, and you have to manually manipulate proofs like how you manipulate program data. Proof manipulation allows ATS to be used as a theorem prover.Finally, you can draw the dichotomy of dynamics and statics as follows, in the form of "some kind of dynamic terms are characterized by a particular static sort."

    • Program terms are characterized by types.
    • Linear proof terms are characterized by views.
    • Propositional proof terms are characterized by props.

Sufficient suffixes

In ATS, files can have several suffixes.

    • .dats contains dynamic and static declarations, analogous to a module structure in SML, or .ml in O'Caml.
    • .sats contains static declarations, analogous to a module signature in SML, or .mli in O'Caml.
    • .cats contains C code used by a particular .sats file.
    • .hats contains either dynamic or static ATS code that is included by either .sats or .dats.

One notable difference between .dats and .sats is the declaration of function and value signatures.

About Linear Logic

Linear logic distincts itself from classical logic in the sense that weakening and contraction structural rules are disallowed. Weakening says you cannot be given a fact and leave it unused, and contraction says you cannot designate a fact for multiple uses. Therefore, a fact must be used exactly once.

Views are linear in that sense, and view proofs are typically used to track resource ownership or to update states of a resource.

.dats

extern val c: int
extern fun foo (x:int): int
fun bar (x: int): int = ...

.sats

val c: int
fun foo (x:int): int
(* not possible *)

However, more typically, .sats would define the type of a function to be implemented in .dats. In such case, the implementation in .dats does not need type annotation.

.dats

implement baz (x) = ...

.sats

fun baz (x:int): int

It is important to understand that, unlike ML, ATS type checker only performs very limited type inference. You have to explicitly annotate enough types (mostly arguments to a function) so ATS can reconstruct typing derivation.

Ecstatic statics

ATS is a bit like ML. You can define type and datatype.

ATS

typedef t = int
typedef pair (a:type, b:type) = '(a, b)
datatype option (a:type) =
  | Some(a) of a
  | None(a)

SML

type t = int
type ('a, 'b) pair = 'a * 'b
datatype 'a option =
    Some of 'a
  | None

From the SML code, there is a predefined type int, and here we define a new type name t aliased to int. We can define aliases to types parameterized by some type variables. We also define an option datatype that is parameterized by a type variable a. These identifiers are all static identifiers, in contrast to dynamic identifiers which are run-time variables.

In ATS, notice that type variables for datatype constructors need to be explicitly applied. For more information about datatype, see Datatypes in the ATS tutorial, but you may want to read the rest of this section first.

Type definition assigns a static identifier to a static expression with the sort "type", so typedef is a special case for stadef. The following two static declarations are almost the same.

ATS (1)

(* sta t:type *)
typedef t = int
typedef pair (a:type, b:type) =
  '(a, b)

ATS (2)

(* sta t:_ *)
stadef t = int
stadef pair (a:type, b:type) =
  '(a, b)

The only difference is that typedef ensures that t will end up having the "type" sort, and stadef doesn't particularly care.

A type is one of the many sorts that a static expression or static identifier can be. Other sorts are:

    • Ground sorts: int, bool, char, addr. They form the basis of dependent type indices.
    • View, which are linear (in the sense of linear logic) propositions.
    • Prop, which are classical propositions.
    • Viewtype, which are linear types. It can be seen as a type combined with a view.

You can define a new sort using datasort. The first few lines of sllist.dats demonstrate how to define a sort that resembles a list. You can also alias sorts using sortdef, which also lets you piggy-back constraints to a sort called subset sort. See prelude/sortdef.sats for examples. A constraint is simply a static expression of a boolean sort.

For each of the type, view, prop, it is possible to alias, declare abstract quantities, and form algebraic constructors.

Type, view, prop, and viewtype are special cases of statics, so whenever you see typedef, viewdef, propdef and viewtypedef, you can just use stadef. Similarly, for abstract declarations, you can use sta for abstype, absview, absprop, and absviewtype. However, datasort, datatype, dataview, dataprop and dataviewtype are not interchangeable. They define algebraic constructors for a new "thing" (e.g. datatype gives you constructors for a new type, dataview gives you constructors for a new view, and so on).

View, prop, and viewtype will not be discussed now. Here we focus on types and simple static expressions.Static expression is a first-order language that is sort checked. Here are some examples.

    • When numbers are written as part of a static expression, it is a static integer of the sort "int".
    • There are built-in integer comparison predicates <, <=, >, >=, ==, <> of the static signature "(int, int) -> bool".
    • There are built-in boolean conjunction and disjunction &&, || of the static signature "(bool, bool) -> bool".
    • This is a static expression of the "bool" sort: 0 <= 4 && 4 < 10.
    • In general, we can define a static function nat_lt as follows:

FYI

More information about...

... can be found in the ATS tutorial.

ATS

stadef nat_lt (i: int, limit: int) =
  0 <= i && i < limit

The static function nat_lt can be used whenever a bool sort is required. For example, if we want to constraint the range of a run-time integer argument:

    • The built-in static identifier int is overloaded. It has two definitions, one of the sort "type", and one of the sort "int -> type". The second version is a dependent type integer, indexed by a static expression.
    • This is a static expression of the "type" sort: int(3).
    • We can restrict the integer type to a certain domain using existential quantifier with guard.

ATS

typedef NatLt (n: int) =
  [i: int | nat_lt (i, n)] int(i)
    • Which defines a static function NatLt: int -> type. Given a static integer n, NatLt gives you a type that is a subset of integers from 0 to n - 1. The body of NatLt reads, "there exists i, which is of the int sort and that nat_lt(i, n) is true, yields a type int(i)."

We can then use the type NatLt(n) to write a function:

ATS

fun digit_of_num (i: NatLt(10)): char = ...

You can look at prelude/basics_sta.sats, which is part of ATS/Anariats, for the signatures of many built-in static operators. Notice that stadef allows many such operators to be overloaded. The fixity of many infix binary operators are defined in prelude/fixity.sats.

Atypical compound types

The first distinction is type with a fixed size, denoted simply as type, and type with an arbitrary but unknown size, denoted as t@ype. Values of type are like ML values, which has the same size as a machine pointer word. It can either be boxed or unboxed. Boxed values of type also resides on the managed (garbage collected) heap. Here are some examples of boxed values:

However, the ML types are too limited for system programming, which has to represent memory layout as a flat structure. This is where variables of t@ype would be useful. More often, memory layout of this form carries a view (linear resource) and a t@ype, and they are called viewt@ype altogether. Notice that t@ype is a subsort of viewt@ype, so anywhere viewt@ype is mentioned, we can give it just a t@ype. In fact, there is a static function sizeof(t) that has the same meaning in the C language, which has the static signature:

ATS

sta sizeof: viewt@ype -> int

And the sizeof static function can be used for any t@ype and viewt@ype.

The boxed tuple, record, and list have analogy in the flat types as well.

But constructing flat values is a bit different. A flat value can reside either on the heap (like a boxed value) or on stack (which ML doesn't use for allocating data structure). This is discussed in another article, Manipulating flat memory.

Flat memory may be uninitialized. The notation "t?" represents the type of an uninitialized memory that has the same size as t. For example, uninitialized int is written as "int?" (see margin note). However, given two types t1 and t2, even though we may know that sizeof(t1) == sizeof(t2), the compiler still treats t1? and t2? as two different types.Finally, we occasionally see annotated '+' and '-' at the end of type (as well as viewtype) and t@ype (as well as viewt@ype). The '+' means that its parameters are covariant; '-' means the parameters are contravariant. Without the annotation, parameters are nonvariant. This is useful for subsorting for types indexed by a static expression. The covariant use is more common. Contravariance is rarely used.

Funky functions

The anatomy of an ATS function type deserves careful examination. The simplest form, similar to ML, is simply:

Int type is flat

It may come as a surprise to the reader, but int is a t@ype in ATS. Consider the C type int and long on a 64-bit machine: int is 32-bits, long is 64-bits, and pointers are 64-bits. Int does not have the same size as pointers. In the same way, long is also a t@ype in ATS because on a 16-bit machine, int is 16-bits, long is 32-bits, and pointers are 16-bits.

ATS (.sats)

fun function_name (x1: t1, ..., xn: tn): t' =
  "c_name"

SML (signature)

val function_name: (t1 * ... * tn) -> t'

Notice we can tell ATS that function_name will be called c_name when translated to C, though this is optional. This can be used to either:

    1. Import an existing C function to ATS and decorate it with an ATS type, or
    2. Export an ATS-implemented function to C with that name.

This has several important consequences.

    • ATS function type can be mapped directly to C function type.
    • ATS functions are always uncurried just like C functions.
    • By default, functions cannot be used as closure, nor is partial application supported, unless specifically tagged as such (we will see tags below).

Adding guarded and assertion dependent types to the mix, we now have:

ATS (.sats)

fun function_name
    {forall quantifiers | guards} (x1: t1, ..., xn: tn):
    [exists quantifiers | asserts] t' =
  "c_name"

Where guards are static expressions of the bool sort written in terms of any forall quantifiers, and asserts are static expressions of the bool sort written in terms of any forall and exists quantifiers. Furthermore, t1 ... tn are static expressions of the type sort that can be written in terms of the for-all quantifiers, and t' is a static expression of the type sort that can be written in terms of any forall and exists quantifiers.

Now, we add proof terms to the mix too.

ATS (.sats)

fun function_name
    {forall quantifiers | guards} (pf1: prop1, ..., pfp: propp | x1: t1, ..., xn: tn):
    [exists quantifiers | asserts] (prop'1, ..., prop'q | t') =
  "c_name"

Notice that the uncurried arguments are now separated into two parts, proof variables and program variables. Proof variables will be erased once compilation finishes, leaving just the program variables in the resulting compiled program.

Finally, ATS uses decorated arrow type to keep track of side-effects as tags.

ATS (.sats)

fun function_name
    {forall quantifiers | guards} (pf1: prop1, ..., pfp: propp | x1: t1, ..., xn: tn)
  :<tag1, ..., tagk>
    [exists quantifiers | asserts] (prop'1, ..., prop'q | t') =
  "c_name"

If we were to write function type as a static expression (as opposed to function signature declaration like we've seen so far), the type expression would look like this:

ATS (static expression)

{forall...} (prop1, ..., propp | t1, ..., tn)
  -<tag1, ..., tagk> [exists...] (prop'1, ..., prop'q | t')

Tagged Arrow Type

The arrow type in ATS looks like "-<>", and the form -<tag1, ..., tagk> is an augmented arrow type with decoration. ATS currently recognizes the following tags for distinguishing various kind of functions:

    • prf: proof function (will be erased after type checking).
    • lin: a function that has to be called exactly once.
    • fun: an ordinary function (default kind).
    • cloptr: a closure that can be explicitly freed.
    • cloref: a closure that is garbage collected.

As well as the following tags to track side-effects:

    • exn: raises exceptions.
    • ntm: possible non-termination.
    • ref: sharing global memory reference.
    • 0: no effects (pure).
    • 1: all effects, including everything we don't know about yet.

In addition to the tags that distinguish function kinds, we can attach the 0 and 1 suffixes as a shorthand to indicate the presence or absence of effects. For example, cloref1 is a common tag for describing ML-style closure that is garbage collected and can have any side-effect (e.g. raises exceptions).

An example from stdio.h

What does a typical ATS function declaration look like? Here we show an example from the Standard C library that sports the following prototype.

char *fgets(char *s, int size, FILE *stream);

From the man page, "fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. A '\0' is stored after the last character in the buffer." And that "fgets() returns s on success, and NULL on error or when end of file occurs while no characters have been read."

We now look at the ATS declaration, heavily commented and colored (orange is comment, cyan is keyword, and blue is static expression).

(* fgets_v describes the result of fgets, which can indicate either a
 * success or failure.
 *)
dataview fgets_v (sz:int, addr, addr) =
  | (* If success, location l is now a string buffer of size sz storing
     * a string of length n.
     *)
    {n:nat | n < sz} {l:addr | l <> null}
      fgets_v_succ(sz, l, l) of strbuf(sz, n) @ l
  | (* If failure, location l is still just sz number of bytes. *)
    {l:addr}
      fgets_v_fail(sz, l, null) of bytes(sz) @ l
fun fgets
    (* count is the number of bytes we want to read, and it has to fit
     * within sz, the size of buffer.
     *)
    {count, sz:int | 0 < count; count <= sz}
    (* fm is the static sort for file mode. *)
    {m:fm}
    (* l is for dependent type indexing for address of buffer. *)
    {l:addr}
    ( (* Proof that file mode implies read access. *)
      pf_mod: file_mode_lte(m, r),
      (* Linear proof that sz number of bytes is at l. *)
      pf_buf: bytes(sz) @ l
    | (* Argument, pointer indexed by location l. *)
      s: ptr(l),
      (* Argument, read at most count bytes. *)
      size: int(count),
      (* Argument, call by reference to FILE structure opened with
       * mode m.
       *)
      stream: &FILE(m)
    )
  :<>
    (* l' is the index for the return pointer. *)
    [l':addr]
    ( (* pf_buf is consumed, and this determines what it turns into
       * according to l'.
       *)
      fgets_v(sz, l, l')
    | (* Return pointer indexed by l'. *)
      ptr(l')
    ) =
  "fgets"

This example is based on libc/SATS/stdio.sats, part of ATS/Anariats, with slight modification and heavily annotated with comments. It may not be immediately apparent to the first-time reader, but there are some subtle points:

    • We allow the actual buffer size (here denoted by the static variable sz) to be larger than what we tell fgets() to use (here denoted by the static variable count). This is the way fgets is formulated in libc/SATS/stdio.sats, but arguably it is not very common to tell fgets() not to use all of the buffer.
    • If fgets() returns non-null, it must return the same pointer as before, but that pointer now should be interpreted as a strbuf, a string buffer holding zero terminated string of some length. Otherwise, the same view, bytes(sz) @ l, is given back. The only way to tell which case is the result of fgets() is by checking the pointer against null using an if-expression. It's a type error if you forget to check. If error checking gets tedius, libc/SATS/stdio.sats also has a version that raises an exception instead, called fgets_exn().

For your reference, this is what the above code looks like without comments.

dataview fgets_v (sz:int, addr, addr) =
  | {n:nat | n < sz} {l:addr | l <> null}
      fgets_v_succ(sz, l, l) of strbuf(sz, n) @ l
  | {l:addr}
      fgets_v_fail(sz, l, null) of bytes(sz) @ l
fun fgets
    {n, sz:int | 0 < n; n <= sz} {m:fm} {l:addr} (
      pf_mod: file_mode_lte(m, r), pf_buf: bytes(sz) @ l
    | s: ptr(l), size: int(n), stream: &FILE(m))
  :<>
    [l':addr] (fgets_v(sz, l, l') | ptr(l')) =
  "fgets"

TODO

    • fn (non-recursive) vs fun (recursive) and termination metrics.