In this guide, we don't aim to teach you programming in ATS (Applied Type System), but we need to tell you enough so that a seasoned ML programmer can quickly adjust to the terminology used in ATS and start reading existing ATS code.
In ATS, it is useful to keep in mind that you will be programming in three worlds.
In ATS predecessor Dependent ML, the static world only sees a very limited presence of the dynamic world through dependent types. Each dynamic expression has a type that can be indexed with a static expression. Static constraints are collected from static expressions, and type checking involves solving these constraints to decide satisfiability.
In addition to dependent types, ATS also has proofs, which allow the static world to become more involved in dynamics by allowing programmers to write a dynamic expression and a proof term side-by-side to establish correspondence. Proofs can either be classical propositions, called props, or linear propositions (in the sense of linear logic—see margin note) called views.
To those familiar with Prolog, defining props is very much like defining a Prolog predicate. However, the type checker doesn't solve these predicates automatically, and you have to manually manipulate proofs like how you manipulate program data. Proof manipulation allows ATS to be used as a theorem prover.Finally, you can draw the dichotomy of dynamics and statics as follows, in the form of "some kind of dynamic terms are characterized by a particular static sort."
In ATS, files can have several suffixes.
One notable difference between .dats and .sats is the declaration of function and value signatures.
About Linear Logic
Linear logic distincts itself from classical logic in the sense that weakening and contraction structural rules are disallowed. Weakening says you cannot be given a fact and leave it unused, and contraction says you cannot designate a fact for multiple uses. Therefore, a fact must be used exactly once.
Views are linear in that sense, and view proofs are typically used to track resource ownership or to update states of a resource.
.dats
extern val c: int
extern fun foo (x:int): int
fun bar (x: int): int = ...
.sats
val c: int
fun foo (x:int): int
(* not possible *)
However, more typically, .sats would define the type of a function to be implemented in .dats. In such case, the implementation in .dats does not need type annotation.
.dats
implement baz (x) = ...
.sats
fun baz (x:int): int
It is important to understand that, unlike ML, ATS type checker only performs very limited type inference. You have to explicitly annotate enough types (mostly arguments to a function) so ATS can reconstruct typing derivation.
ATS is a bit like ML. You can define type and datatype.
ATS
typedef t = int
typedef pair (a:type, b:type) = '(a, b)
datatype option (a:type) =
| Some(a) of a
| None(a)
SML
type t = int
type ('a, 'b) pair = 'a * 'b
datatype 'a option =
Some of 'a
| None
From the SML code, there is a predefined type int, and here we define a new type name t aliased to int. We can define aliases to types parameterized by some type variables. We also define an option datatype that is parameterized by a type variable a. These identifiers are all static identifiers, in contrast to dynamic identifiers which are run-time variables.
In ATS, notice that type variables for datatype constructors need to be explicitly applied. For more information about datatype, see Datatypes in the ATS tutorial, but you may want to read the rest of this section first.
Type definition assigns a static identifier to a static expression with the sort "type", so typedef is a special case for stadef. The following two static declarations are almost the same.
ATS (1)
(* sta t:type *)
typedef t = int
typedef pair (a:type, b:type) =
'(a, b)
ATS (2)
(* sta t:_ *)
stadef t = int
stadef pair (a:type, b:type) =
'(a, b)
The only difference is that typedef ensures that t will end up having the "type" sort, and stadef doesn't particularly care.
A type is one of the many sorts that a static expression or static identifier can be. Other sorts are:
You can define a new sort using datasort. The first few lines of sllist.dats demonstrate how to define a sort that resembles a list. You can also alias sorts using sortdef, which also lets you piggy-back constraints to a sort called subset sort. See prelude/sortdef.sats for examples. A constraint is simply a static expression of a boolean sort.
For each of the type, view, prop, it is possible to alias, declare abstract quantities, and form algebraic constructors.
Type, view, prop, and viewtype are special cases of statics, so whenever you see typedef, viewdef, propdef and viewtypedef, you can just use stadef. Similarly, for abstract declarations, you can use sta for abstype, absview, absprop, and absviewtype. However, datasort, datatype, dataview, dataprop and dataviewtype are not interchangeable. They define algebraic constructors for a new "thing" (e.g. datatype gives you constructors for a new type, dataview gives you constructors for a new view, and so on).
View, prop, and viewtype will not be discussed now. Here we focus on types and simple static expressions.Static expression is a first-order language that is sort checked. Here are some examples.
ATS
stadef nat_lt (i: int, limit: int) =
0 <= i && i < limit
The static function nat_lt can be used whenever a bool sort is required. For example, if we want to constraint the range of a run-time integer argument:
ATS
typedef NatLt (n: int) =
[i: int | nat_lt (i, n)] int(i)
We can then use the type NatLt(n) to write a function:
ATS
fun digit_of_num (i: NatLt(10)): char = ...
You can look at prelude/basics_sta.sats, which is part of ATS/Anariats, for the signatures of many built-in static operators. Notice that stadef allows many such operators to be overloaded. The fixity of many infix binary operators are defined in prelude/fixity.sats.
The first distinction is type with a fixed size, denoted simply as type, and type with an arbitrary but unknown size, denoted as t@ype. Values of type are like ML values, which has the same size as a machine pointer word. It can either be boxed or unboxed. Boxed values of type also resides on the managed (garbage collected) heap. Here are some examples of boxed values:
However, the ML types are too limited for system programming, which has to represent memory layout as a flat structure. This is where variables of t@ype would be useful. More often, memory layout of this form carries a view (linear resource) and a t@ype, and they are called viewt@ype altogether. Notice that t@ype is a subsort of viewt@ype, so anywhere viewt@ype is mentioned, we can give it just a t@ype. In fact, there is a static function sizeof(t) that has the same meaning in the C language, which has the static signature:
ATS
sta sizeof: viewt@ype -> int
And the sizeof static function can be used for any t@ype and viewt@ype.
The boxed tuple, record, and list have analogy in the flat types as well.
But constructing flat values is a bit different. A flat value can reside either on the heap (like a boxed value) or on stack (which ML doesn't use for allocating data structure). This is discussed in another article, Manipulating flat memory.
Flat memory may be uninitialized. The notation "t?" represents the type of an uninitialized memory that has the same size as t. For example, uninitialized int is written as "int?" (see margin note). However, given two types t1 and t2, even though we may know that sizeof(t1) == sizeof(t2), the compiler still treats t1? and t2? as two different types.Finally, we occasionally see annotated '+' and '-' at the end of type (as well as viewtype) and t@ype (as well as viewt@ype). The '+' means that its parameters are covariant; '-' means the parameters are contravariant. Without the annotation, parameters are nonvariant. This is useful for subsorting for types indexed by a static expression. The covariant use is more common. Contravariance is rarely used.
The anatomy of an ATS function type deserves careful examination. The simplest form, similar to ML, is simply:
Int type is flat
It may come as a surprise to the reader, but int is a t@ype in ATS. Consider the C type int and long on a 64-bit machine: int is 32-bits, long is 64-bits, and pointers are 64-bits. Int does not have the same size as pointers. In the same way, long is also a t@ype in ATS because on a 16-bit machine, int is 16-bits, long is 32-bits, and pointers are 16-bits.
ATS (.sats)
fun function_name (x1: t1, ..., xn: tn): t' =
"c_name"
SML (signature)
val function_name: (t1 * ... * tn) -> t'
Notice we can tell ATS that function_name will be called c_name when translated to C, though this is optional. This can be used to either:
This has several important consequences.
Adding guarded and assertion dependent types to the mix, we now have:
ATS (.sats)
fun function_name
{forall quantifiers | guards} (x1: t1, ..., xn: tn):
[exists quantifiers | asserts] t' =
"c_name"
Where guards are static expressions of the bool sort written in terms of any forall quantifiers, and asserts are static expressions of the bool sort written in terms of any forall and exists quantifiers. Furthermore, t1 ... tn are static expressions of the type sort that can be written in terms of the for-all quantifiers, and t' is a static expression of the type sort that can be written in terms of any forall and exists quantifiers.
Now, we add proof terms to the mix too.
ATS (.sats)
fun function_name
{forall quantifiers | guards} (pf1: prop1, ..., pfp: propp | x1: t1, ..., xn: tn):
[exists quantifiers | asserts] (prop'1, ..., prop'q | t') =
"c_name"
Notice that the uncurried arguments are now separated into two parts, proof variables and program variables. Proof variables will be erased once compilation finishes, leaving just the program variables in the resulting compiled program.
Finally, ATS uses decorated arrow type to keep track of side-effects as tags.
ATS (.sats)
fun function_name
{forall quantifiers | guards} (pf1: prop1, ..., pfp: propp | x1: t1, ..., xn: tn)
:<tag1, ..., tagk>
[exists quantifiers | asserts] (prop'1, ..., prop'q | t') =
"c_name"
If we were to write function type as a static expression (as opposed to function signature declaration like we've seen so far), the type expression would look like this:
ATS (static expression)
{forall...} (prop1, ..., propp | t1, ..., tn)
-<tag1, ..., tagk> [exists...] (prop'1, ..., prop'q | t')
The arrow type in ATS looks like "-<>", and the form -<tag1, ..., tagk> is an augmented arrow type with decoration. ATS currently recognizes the following tags for distinguishing various kind of functions:
As well as the following tags to track side-effects:
In addition to the tags that distinguish function kinds, we can attach the 0 and 1 suffixes as a shorthand to indicate the presence or absence of effects. For example, cloref1 is a common tag for describing ML-style closure that is garbage collected and can have any side-effect (e.g. raises exceptions).
What does a typical ATS function declaration look like? Here we show an example from the Standard C library that sports the following prototype.
char *fgets(char *s, int size, FILE *stream);
From the man page, "fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. A '\0' is stored after the last character in the buffer." And that "fgets() returns s on success, and NULL on error or when end of file occurs while no characters have been read."
We now look at the ATS declaration, heavily commented and colored (orange is comment, cyan is keyword, and blue is static expression).
(* fgets_v describes the result of fgets, which can indicate either a
* success or failure.
*)
dataview fgets_v (sz:int, addr, addr) =
| (* If success, location l is now a string buffer of size sz storing
* a string of length n.
*)
{n:nat | n < sz} {l:addr | l <> null}
fgets_v_succ(sz, l, l) of strbuf(sz, n) @ l
| (* If failure, location l is still just sz number of bytes. *)
{l:addr}
fgets_v_fail(sz, l, null) of bytes(sz) @ l
fun fgets
(* count is the number of bytes we want to read, and it has to fit
* within sz, the size of buffer.
*)
{count, sz:int | 0 < count; count <= sz}
(* fm is the static sort for file mode. *)
{m:fm}
(* l is for dependent type indexing for address of buffer. *)
{l:addr}
( (* Proof that file mode implies read access. *)
pf_mod: file_mode_lte(m, r),
(* Linear proof that sz number of bytes is at l. *)
pf_buf: bytes(sz) @ l
| (* Argument, pointer indexed by location l. *)
s: ptr(l),
(* Argument, read at most count bytes. *)
size: int(count),
(* Argument, call by reference to FILE structure opened with
* mode m.
*)
stream: &FILE(m)
)
:<>
(* l' is the index for the return pointer. *)
[l':addr]
( (* pf_buf is consumed, and this determines what it turns into
* according to l'.
*)
fgets_v(sz, l, l')
| (* Return pointer indexed by l'. *)
ptr(l')
) =
"fgets"
This example is based on libc/SATS/stdio.sats, part of ATS/Anariats, with slight modification and heavily annotated with comments. It may not be immediately apparent to the first-time reader, but there are some subtle points:
For your reference, this is what the above code looks like without comments.
dataview fgets_v (sz:int, addr, addr) =
| {n:nat | n < sz} {l:addr | l <> null}
fgets_v_succ(sz, l, l) of strbuf(sz, n) @ l
| {l:addr}
fgets_v_fail(sz, l, null) of bytes(sz) @ l
fun fgets
{n, sz:int | 0 < n; n <= sz} {m:fm} {l:addr} (
pf_mod: file_mode_lte(m, r), pf_buf: bytes(sz) @ l
| s: ptr(l), size: int(n), stream: &FILE(m))
:<>
[l':addr] (fgets_v(sz, l, l') | ptr(l')) =
"fgets"
TODO