OCaml: Bespoke Derivers and Mutually Recursive Types

Written: 2025-01-03; Updated: 2025-01-04

The setting

PPX-based derivers are a standard meta-programming tool in OCaml. One example of a useful deriver is ppx_yojson_conv, which can be used to convert to and from JSON (using the Yojson library). A simple example:

type tree =
| Leaf of int
| Node of tree * tree
[@@deriving yojson]
(* automatically produces:
val yojson_of_tree : tree -> Yojson.Safe.t
val tree_of_yojson : Yojson.Safe.t -> tree *)

A common pattern is to wrap the recursion with an intermediate mutually recursive type that is used to store extra information at each node. For instance, we may want to store the height of the trees in the above example.

type tree = { tree: tree_; height: int }
and tree_ =
| Leaf of int
| Node of tree * tree
[@@deriving yojson]
(* produces:
val yojson_of_tree : tree -> Yojson.Safe.t
val yojson_of_tree_ : tree_ -> Yojson.Safe.t
val tree_of_yojson : Yojson.Safe.t -> tree
val tree__of_yojson : Yojson.Safe.t -> tree_ *)

The problem

Sometimes we may want to use a wrapper that is not under our control, such as one that comes from an external library. We can simulate this situation by assuming a wrapper type, 'a wrapped, without any existing Yojson derivers. Here is what we may start with:

(* external import:
type 'a wrapped = { data: 'a; meta: int } *)
type tree = tree_ wrapped
and tree_ =
| Leaf of int
| Node of tree * tree
[@@deriving yojson]

This will fail to compile because ppx_yojson_conv does not know how to convert to and from values of type tree_ wrapped. Fortunately, ppx_yojson_conv allows us to manually supply conversion functions for any type constructor. For instance, we could write the following conversion functions for the wrapped type constructor before the definition of the tree and tree_ types:

(* yojson_of_wrapped : ('a -> Yojson.Safe.t) -> 'a wrapped -> Yojson.Safe.t *)
let yojson_of_wrapped yofd wr = yofd wr.data
(* wrapped_of_yojson : (Yojson.Safe.t -> 'a) -> Yojson.Safe.t -> 'a wrapped *)
let wrapped_of_yojson dofy js = { data = dofy js; meta = -1 }

Because ppx_yojson_conv works using the names of the type constructors, it will happily derive the converters for tree and tree_ using the converters we wrote for wrapped. If we can live with the uniform wrapped_of_yojson function, in particular, which fills in a dummy value of -1 for meta, there is no remaining problem.

However, there may be instances where such a uniform treatment is not suitable. Suppose we want to interpret the meta field as a height. That is, we would like for trees to have the following invariant:

The meta field of a tree is always 1 + the maximum of the meta fields of its immediate subtrees.

We could simply re-traverse the tree produced by tree_of_yojson and fix the heights to satisfy the invariant, but that comes at a price. Computationally, we would end up deep-copying the entire tree, wasting both space and time. We would also need to write a new definition of tree_of_yojson that shadows the derived function, to ensure that clients of this code always get tree values that satisfy the invariant. If the original version of tree_of_yojson gets exposed by accident, it will lead to difficult to diagnose bugs. Finally, modern IDE tools such as Merlin or ocaml-lsp can get confused about the provenance of the tree_of_yojson function. The jump to definition feature, for example, would take the user to the function that fixes the heights, not the original (derived) function that actually converts JSON to trees.

Is there a better way? Instead of fixing a badly built tree, could we not just selectively override the tree_of_yojson function? After all, we don’t really care how other instances of 'a wrapped are JSON-ified!

But, this doesn’t seem possible (as stated) in OCaml. The tree and tree_ types are mutually recursive, so transformations on them must be written as mutually recursive functions at the same time. In fact, since these two types are defined at the same time, it is not possible to use a PPX deriver on one but not the other, since the [@@derived] annotation applies to the entire block of mutually recursive types.

Folklore: Recursive modules are enough

There is an old idea in ML module folklore: if you have module-level recursion, then you can implement all the other kinds of recursion in the language with it, including recursive types and functions. In other words, the only occurrence of rec will be with module rec. Here is how one might write the above recursive type of trees and a recursive height function on trees using recursive modules.

module rec Tree : sig
type tree =
| Leaf of int
| Node of tree * tree
val height : tree -> int
end = struct
type nonrec tree =
| Leaf of int
| Node of Tree.tree * Tree.tree
let height tr =
match tr with
| Leaf _ -> 1
| Node (t1, t2) ->
1 + max (Tree.height t1) (Tree.height t2)
end

Observe that the body of the module Tree uses syntactically non-recursive definitions, enforced using nonrec on the type tree and a missing rec on the function height. Nevertheless these are valid recursive definitions as seen below:

Terminal window
# Tree.(height (Node (Node (Leaf 1, Leaf 2), Leaf 3)));;
- : int = 3

Solution to the problem: Use recursive modules to override derivers

This solves our conundrum. Since the type definitions are non-recursive, they can be done in any order and separated by arbitrary code. Indeed, we can expose a mutually recursive tree and tree_ type, but write their converters one at a time, independently, deriving some using ppx_yojson_conv and manually writing others. Here is an example where we use two convenience functions, leaf and node, to build trees.

module rec Tree : sig
type tree = { tree: tree_; height: int }
and tree_ =
| Leaf of int
| Node of tree * tree
[@@deriving yojson]
val leaf : int -> tree
val node : tree -> tree -> tree
end = struct
type nonrec tree_ =
| Leaf of int
| Node of Tree.tree * Tree.tree
[@@deriving yojson]
type nonrec tree = { tree: Tree.tree_; height: int }
let leaf x = { tree = Leaf x; height = 1 }
let node tr1 tr2 = { tree = Node (tr1, tr2);
height = 1 + max tr1.height tr2.height }
let yojson_of_tree tr = yojson_of_tree_ tr.tree
let tree_of_yojson js =
match tree__of_yojson js with
| Leaf k -> leaf k
| Node (tr1, tr2) -> node tr1 tr2
end

We then have:

Terminal window
# let js = Tree.(yojson_of_tree (node (node (leaf 1) (leaf 2)) (leaf 3)));;
val js : Yojson.Safe.t =
`List
[`String "Node";
`List
[`String "Node"; `List [`String "Leaf"; `Int 1];
`List [`String "Leaf"; `Int 2]];
`List [`String "Leaf"; `Int 3]]
# Tree.tree_of_yojson js;;
- : Tree.tree =
{Tree.tree =
Tree.Node
({Tree.tree =
Tree.Node
({Tree.tree = Tree.Leaf 1; height = 1},
{Tree.tree = Tree.Leaf 2; height = 1});
height = 2},
{Tree.tree = Tree.Leaf 3; height = 1});
height = 3}

Indeed, the yojson_of_tree function gets rid of the heights and they are recomputed by the tree_of_yojson function.

Application: Hashconsed and Yojsonable data

The Hashcons module by J. C. Filliatre et al can be used to de-duplicate data structures in memory. It is a prime example of a wrapper type that we don’t control, consisting essentially of the following private type:

type 'a hash_consed = private {
hkey: int; (* a hash code *)
tag: int; (* a globally unique tag *)
node: 'a;
}

Because this type is private, it is not possible for a deriver from JSON to build values of this wrapped type directly. Instead, we have to use the Hashcons.Make() functor to instantiate a factory for _ hash_consed objects by supplying a coherent pair of functions called equal and hash that uses the .hkey fields and pointer equality.

Here is what it would look like for trees without worrying about the PPX derivers temporarily:

Now imagine we want to add [@@deriving yojson] to these trees. Without the module rec trick, it would be utterly impossible, since the only way to build values of type tree_ hash_consed is via the Mk module. With the module rec trick, it is a breeze:

module rec Tree : sig
type tree = tree_ Hashcons.hash_consed
and tree_ =
| Leaf of int
| Node of tree * tree
[@@deriving yojson]
val leaf : int -> tree
val node : tree -> tree -> tree
end = struct
type tree_ =
| Leaf of int
| Node of Tree.tree * Tree.tree
[@@deriving yojson]
module Mk = Hashcons.Make(struct
type nonrec t = tree_
let equal t1 t2 =
match t1, t2 with
| Leaf j, Leaf k -> j = k
| Node (t1l, t1r), Node (t2l, t2r) ->
t1l == t2l && t1r == t2r
| _ -> false
let hash t =
match t with
| Leaf k -> k
| Node (t1, t2) -> abs (((t1.hkey * 19) + t2.hkey) * 19 + 2)
end)
let table = Mk.create 19
let leaf n = Mk.hashcons table (Leaf n)
let node t1 t2 = Mk.hashcons table (Node (t1, t2))
type tree = tree_ Hashcons.hash_consed
let yojson_of_tree (tr : tree) = yojson_of_tree_ tr.node
let tree_of_yojson js = Mk.hashcons table (tree__of_yojson js)
end
  1. Nothing changes for signatures.
  2. The deriver works as usual for datatypes in our control.
  3. For types out of our control, we manually supply the conversions.