Fantasy World OCaml

Written in 2013

Real World OCaml (RWO) will finally explain how OCaml works in the real world. This document, which has no relation to RWO, explains how OCaml works in my fantasies. I will try to highlight only how fantasy differs from reality.

Please note that none of the following describes the real OCaml as of version 4.00.1.

Language

Lexical structure

  1. All source files are utf-8 encoded.

  2. The character literal forms \x and \d specify Unicode code-points.

  3. The following tokens are not keywords: asr, done, lor, land, lxor, lsl, lsr, mod, or.

  4. The token :: is an ordinary operator (also see “builtins” below).
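
For contrast, here is a minimal sketch of how real OCaml treats the tokens from item 3. Because they are keywords, a binding such as let mod = 3 is a syntax error, but each one can be used as an ordinary function when wrapped in parentheses. The names remainder, powers_of_two and low_bits are made up for illustration.

```ocaml
(* Real OCaml: mod, lsl, lor, ... are keywords, yet they behave like
   ordinary two-argument functions once parenthesized. *)
let remainder = ( mod ) 7 3

(* Infix use works as usual. *)
let powers_of_two = List.map (fun n -> 1 lsl n) [0; 1; 2; 3]

(* Passing a keyword operator to a higher-order function. *)
let low_bits = List.fold_left ( lor ) 0 [1; 2; 4]
```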

Module language

  1. New infix operators are allowed to be declared in modules and module types. The Pervasives module includes infix declarations for the above tokens that are no longer keywords. Only the fixity declarations exposed in the signature are exported to other modules. All such infix operators have the same canonical precedence between ; and let ... in.

  2. Modules and module types are unified into mixins, like in MixML. This allows, among other things, module definitions to be distributed across multiple files, and for multiple modules to share common sub-modules. The versatility of the ML module system is drastically enhanced.

  3. A hierarchical module namespace is maintained in compiled code, but no compilation unit is allowed to specify an absolute module path. Instead, libraries are allowed to be grafted into any subtree of the module path of client code. This allows client code to use a module path “claimed” by another package by simply moving the other package to a non-conflicting path. The system is nevertheless clever enough to maintain only a single copy of any piece of code.

  4. Stateless modules—i.e., modules that do not have any initialization code—are allowed to be pruned if they are not used (dead module elimination). These modules also have zero initialization overhead. A warning is added to complain about a module having initialization code.
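
As a point of comparison for item 1, the sketch below shows what real OCaml offers today: symbolic infix operators can be defined and exported through a signature, but there are no fixity declarations, and precedence follows from the operator's leading character. The module Sat, its limit of 100 and the operator names are hypothetical.

```ocaml
(* A hypothetical module exporting two saturating operators. *)
module Sat : sig
  val ( +! ) : int -> int -> int
  val ( *! ) : int -> int -> int
end = struct
  let limit = 100
  let clamp n = if n > limit then limit else n
  let ( +! ) a b = clamp (a + b)
  let ( *! ) a b = clamp (a * b)
end

(* The operator starting with a star binds tighter than the one starting
   with a plus, exactly like the built-in arithmetic operators. *)
let implicit_grouping = Sat.(2 +! 3 *! 4)
let explicit_grouping = Sat.((2 +! 3) *! 4)
```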

Language builtins

  1. Strings are utf-8 encoded and the character type has the same size as int and stores Unicode code-points.

  2. New primitive types byte and bytestring fill the roles char and string play in the real OCaml.

  3. Values of type string and bytestring are not mutable.

  4. The forms [] and (::)(_, _) are not hard coded into the grammar.
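
To make the gap behind items 1 to 3 concrete, here is a small sketch in real OCaml. It assumes a release later than 4.00.1 that ships the Bytes and Uchar modules. Strings are still sequences of bytes, so a single accented letter written in utf-8 has length 2, while mutation is confined to Bytes. All value names are illustrative.

```ocaml
(* The utf-8 encoding of U+00E9, spelled out byte by byte. *)
let e_acute = "\xc3\xa9"

(* String.length counts bytes, not code points. *)
let byte_count = String.length e_acute

(* A code point is a separate Uchar.t value, not a char. *)
let code_point = Uchar.to_int (Uchar.of_int 0xE9)

(* Mutable byte sequences live in Bytes. *)
let buf = Bytes.of_string "abc"
let () = Bytes.set buf 0 'X'
let updated = Bytes.to_string buf
```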

Types and type definitions

  1. Recursion in type abbreviations is simply syntactically impossible. The declaration type t = t is not interpreted as a recursive type abbreviation (which then triggers a compile error) but as an abbreviation of t as an existing t. Such definitions are often needed in functor arguments that require a structure with a type t. Recursion is still implicit in variant (polymorphic or otherwise) definitions.

  2. All non-nullary value constructors (but not polymorphic variants) implicitly define a functional form. To illustrate, the definition type 'a lst = Nil | Cons of 'a * 'a lst also implicitly defines a function _Cons : 'a -> 'a lst -> 'a lst.

  3. There is absolutely no difference between type t = T of int * bool and type t = T of (int * bool). Parentheses in type definitions serve only a disambiguating role.

  4. The following is not silently accepted: let f : 'a -> 'a = fun x -> x + 1. In other words, this definition is treated exactly like let f : 'a. 'a -> 'a = fun x -> x + 1, which is rejected because the body forces 'a to be int.


  5. Views are supported.
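
The following sketch shows the real-world behaviour that items 1 to 4 react against. It assumes a release recent enough to support the nonrec keyword. All type and value names are invented for the example.

```ocaml
(* Item 1: aliasing an outer type of the same name needs nonrec;
   a plain "type t = t" is read as a cyclic abbreviation. *)
type t = int

module Int_key = struct
  type nonrec t = t
  let compare (a : t) (b : t) = compare a b
end

module Int_set = Set.Make (Int_key)

(* Item 2: constructors are not functions, so a wrapper is written by hand. *)
type 'a lst = Nil | Cons of 'a * 'a lst

let cons x xs = Cons (x, xs)
let from_list l = List.fold_right cons l Nil

(* Item 3: the parentheses matter. A takes two arguments, B takes one tuple. *)
type two_fields = A of int * bool
type one_tuple = B of (int * bool)

let pair = (1, true)
let b = B pair
(* "A pair" is rejected, so the tuple has to be taken apart first. *)
let a = let (n, flag) = pair in A (n, flag)

(* Item 4: accepted today; 'a is quietly unified with int. *)
let f : 'a -> 'a = fun x -> x + 1
```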

Expressions and value definitions

  1. The following is no longer silently accepted: fun x x -> x. Likewise for let f x x = x. In either case, the compiler complains about a repeated argument variable x, like it has always done for fun (x, x) -> x.

  2. The form <expr> match? <pattern> is an expression of type bool such that e match? p returns true if and only if (match e with p -> true | _ -> false) returns true.

  3. The form <pattern> as <pattern> generalizes <pattern> as <identifier>. The two arguments to as must bind disjoint sets of variables, and the union of the bindings from both patterns is added to the scope of the pattern.

  4. The form let! f arg1 ... argn = ... is interpreted as a definition of f with n arguments that is forcibly inlined at compile time. Note that such definitions cannot be recursive, and that these functions revert to ordinary functions when partially applied, in particular when used as arguments to other functions. A corresponding val! declaration is added to the module type. Compiled modules store a reusable Lambda representation of such inlinable definitions.

  5. The do ... done form is removed, and for and while just use begin ... end (which is mandatory). The do keyword is reserved for future use in computational expressions, while done is released.

  6. Function arguments are evaluated from left to right.
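
For comparison, a short sketch of what real OCaml does with items 1, 2 and 6. The repeated argument is accepted, match? has to be spelled out as a full match, and a fixed left-to-right evaluation order has to be forced with let bindings because the language leaves the order of argument evaluation unspecified. The helper names are hypothetical.

```ocaml
(* Item 1: accepted today; the second x shadows the first. *)
let second x x = x

(* Item 2: the long-hand equivalent of "l match? _ :: _". *)
let is_cons l = match l with _ :: _ -> true | _ -> false

(* Item 6: sequencing with let guarantees the order of the side effects. *)
let trace : string list ref = ref []

let note label v =
  trace := label :: !trace;
  v

let sum =
  let a = note "left" 1 in
  let b = note "right" 2 in
  a + b

(* Labels in the order in which they were recorded. *)
let order = List.rev !trace
```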

Pervasives

  1. The list type is defined as: type 'a list = Nil | Cons of 'a * 'a list. The legacy forms [] and :: in patterns are retained as synonyms for Nil and Cons for backwards compatibility. The infix operator (::) : 'a -> 'a list -> 'a list replaces the hard coded forms <expr> :: <expr> and (::)(<expr>, <expr>) in expressions.
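
A rough approximation of this design is already possible in real OCaml, with the caveat that :: itself cannot simply be rebound as an ordinary operator. The sketch below uses a made-up type my_list and a made-up operator @: in its place. Operators that start with @ are right-associative, which gives the usual list-building shape.

```ocaml
(* A user-defined list type with ordinary constructors. *)
type 'a my_list = Nil | Cons of 'a * 'a my_list

(* A stand-in for the fantasy (::) operator. *)
let ( @: ) x xs = Cons (x, xs)

let rec length = function
  | Nil -> 0
  | Cons (_, tl) -> 1 + length tl

(* Groups to the right, like the built-in list syntax. *)
let numbers = 1 @: 2 @: 3 @: Nil
```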

Build Tools and Runtime

  1. There is a canonical, well-maintained, and documented build tool for OCaml: ocamlbuild. All other build systems, including OCamlMakefile and omake, are considered obsolete.

  2. Camlp4 uses only the grammar of OCaml. There is no “revised grammar”, which has been ruthlessly expunged from history. There is also a comprehensive test-suite for OCaml parsing that both the compiler and Camlp4 are required to pass – gone are the days of Camlp4 and OCaml having different opinions on such things as source positions, syntax errors, etc.

  3. The build system does not assume the existence of bash and a standard Unix tool-chain, allowing it to be built on MinGW without the need for Cygwin. Cross-compiling a MinGW binary from Linux is also a standard (but optional) part of the OCaml distribution.

  4. The ocamlyacc tool is deprecated in favour of Menhir.

  5. The runtime encapsulates all its global state in a “runtime context” object. Multiple context objects may exist simultaneously in the same process. The at_exit function adds exit functions to be called when the runtime is destroyed, not when the process exits.

  6. There is an LLVM backend.

  7. The standard library and runtime are re-licensed under an MIT or 2-clause BSD style license. The legally questionable LGPL 2.1 + linking exception license is deprecated. The compiler is re-licensed under the GPL, with the old QPL deprecated. The deprecated licenses apply only to legacy versions.

Libraries

  1. LablTk, Graphics, Str and Num are dropped from the standard distribution. They survive as community-maintained libraries.

  2. The runtime is disentangled from the Bigarray and Unix libraries, so that the entire standard library may be replaced. Community-maintained standard library replacements can jack directly into the OCaml runtime if needed, instead of operating as a shell on top of the standard library.

Endnote

This document expresses the view of its author, Kaustuv Chaudhuri, alone. It should not be misinterpreted as the views of anyone else, particularly not of the OCaml developers.
