OCaml: Bespoke Derivers and Mutually Recursive Types
Written: 2025-01-03; Updated: 2025-01-04
The setting
PPX-based derivers are a standard meta-programming tool in OCaml. One example of
a useful deriver is ppx_yojson_conv
, which can be used to
convert to and from JSON (using the Yojson library). A simple example:
A common pattern is to wrap the recursion with an intermediate mutually recursive type that is used to store extra information at each node. For instance, we may want to store the height of the trees in the above example.
The problem
Sometimes we may want to use a wrapper that is not under our control, such as
one that comes from an external library. We can simulate this situation by
assuming a wrapper type, 'a wrapped
, without any existing Yojson derivers.
Here is what we may start with:
This will fail to compile because ppx_yojson_conv
does not know how to convert
to and from values of type tree_ wrapped
. Fortunately, ppx_yojson_conv
allows us to manually supply conversion functions for any type constructor.
For instance, we could write the following conversion functions for the
wrapped
type constructor before the definition of the tree
and tree_
types:
Because ppx_yojson_conv
works using the names of the type constructors, it
will happily derive the converters for tree
and tree_
using the converters
we wrote for wrapped
. If we can live with the uniform wrapped_of_yojson
function, in particular, which fills in a dummy value of -1
for meta
, there
is no remaining problem.
However, there may be instances where such a uniform treatment is not suitable.
Suppose we want to interpret the meta
field as a height. That is, we would
like for tree
s to have the following invariant:
The
meta
field of atree
is always 1 + the maximum of themeta
fields of its immediate subtrees.
We could simply re-traverse the tree
produced by tree_of_yojson
and fix the
heights to satisfy the invariant, but that comes at a price. Computationally, we
would end up deep-copying the entire tree
, wasting both space and time. We
would also need to write a new definition of tree_of_yojson
that shadows the
derived function, to ensure that clients of this code always get tree
values
that satisfy the invariant. If the original version of tree_of_yojson
gets
exposed by accident, it will lead to difficult to diagnose bugs. Finally, modern
IDE tools such as Merlin or ocaml-lsp can get confused about the provenance of
the tree_of_yojson
function. The jump to definition feature, for example,
would take the user to the function that fixes the heights, not the original
(derived) function that actually converts JSON to trees.
Is there a better way? Instead of fixing a badly built tree
, could we not just
selectively override the tree_of_yojson
function? After all, we don’t really
care how other instances of 'a wrapped
are JSON-ified!
But, this doesn’t seem possible (as stated) in OCaml. The tree
and tree_
types are mutually recursive, so transformations on them must be written as
mutually recursive functions at the same time. In fact, since these two types
are defined at the same time, it is not possible to use a PPX deriver on one but
not the other, since the [@@derived]
annotation applies to the entire block of
mutually recursive types.
Folklore: Recursive modules are enough
There is an old idea in ML module folklore: if you have module-level recursion,
then you can implement all the other kinds of recursion in the language with it,
including recursive types and functions. In other words, the only occurrence of
rec
will be with module rec
. Here is how one might write the above recursive
type of trees and a recursive height function on trees using recursive modules.
Observe that the body of the module Tree
uses syntactically non-recursive
definitions, enforced using nonrec
on the type tree
and a missing rec
on
the function height
. Nevertheless these are valid recursive definitions as
seen below:
Solution to the problem: Use recursive modules to override derivers
This solves our conundrum. Since the type
definitions are non-recursive, they
can be done in any order and separated by arbitrary code. Indeed, we can expose
a mutually recursive tree
and tree_
type, but write their converters one at
a time, independently, deriving some using ppx_yojson_conv
and manually
writing others. Here is an example where we use two convenience functions,
leaf
and node
, to build tree
s.
We then have:
Indeed, the yojson_of_tree
function gets rid of the heights and they are
recomputed by the tree_of_yojson
function.
Application: Hashconsed and Yojsonable data
The Hashcons module by J. C. Filliatre et al can be used to de-duplicate data structures in memory. It is a prime example of a wrapper type that we don’t control, consisting essentially of the following private type:
Because this type is private, it is not possible for a deriver from JSON to
build values of this wrapped type directly. Instead, we have to use
the Hashcons.Make()
functor to instantiate a factory for _ hash_consed
objects by supplying a coherent pair of functions called equal
and
hash
that uses the .hkey
fields and pointer equality.
Here is what it would look like for trees without worrying about the PPX derivers temporarily:
Now imagine we want to add [@@deriving yojson]
to these trees. Without
the module rec
trick, it would be utterly impossible, since the only
way to build values of type tree_ hash_consed
is via the Mk
module.
With the module rec
trick, it is a breeze:
- Nothing changes for signatures.
- The deriver works as usual for datatypes in our control.
- For types out of our control, we manually supply the conversions.