Preprocessor extensions for code generation
"A Guide to Extension Points in OCaml"[1] provides a great "quick-start" on using the OCaml extension points API to implement preprocessor extensions for abstract syntax tree rewrites. This post picks up where that tutorial leaves off by showing how to write a ppx that does code generation.
The problem treated here is one posed in Whitequark's blog :
"Implement a syntax extension that would accept type declarations of
the form
type t = A [@id 1] | B of int [@id 4] [@@id_of]
to generate a function mapping a value of type t
to its
integer representation."
Implementing the "id_of
" ppx
The basic strategy
In the OCaml parse tree, structures are lists of structure items. Type declarations are structure items as are let-bindings to functions.
In this program, analysis of an inductive type declaration t
may result in the production of a new structure item, the AST of an of_id
function
to be appended to the structure containing t
.
Now the general strategy in writing a ppx is to provide a record
of type Ast_mapper.mapper
. That record is based on
the Ast_mapper.default_mapper
record but selectively
overriding those fields for those sytactic categories that the ppx
is intending to transform.
Now, as we determined above, the effect of the ppx is to provide
a function from a structure to a new structure. Accordingly, at a
minimum then we'll want to override the structure
field
of the default mapper. Schematically then our ppx code will take on
the following shape.
open Ast_mapper
open Ast_helper
open Asttypes
open Parsetree
open Longident
let structure_mapper mapper structure =
...
let id_of_mapper = {
default_mapper with structure = structure_mapper
}
let () = register "id_of" id_of_mapper
This program goes just a little bit further
though. Any @id
or @@id_of
attributes that
get as far as the OCaml compiler would be ignored. So, it's not
neccessary that they be removed by our ppx once they've been acted
upon but it seems tidy to do so. Accordingly, there are two more
syntactic constructs that this ppx operates on.
open Ast_mapper
open Ast_helper
open Asttypes
open Parsetree
open Longident
let structure_mapper mapper structure =
...
let type_declaration_mapper mapper decl =
...
let constructor_declaration_mapper mapper decl =
...
let id_of_mapper argv = {
default_mapper with
structure = structure_mapper;
type_declaration = type_declaration_mapper;
constructor_declaration = constructor_declaration_mapper
}
Implementing the mappings
To warm up, lets start with the easy mappers.
The role of type_declaration_mapper
is a function
from a type_declaration
argument to
a type_declaration
result that is the argument in all
but that any @@id_of
attribute has been removed.
let type_declaration_mapper
(mapper : mapper)
(decl : type_declaration) : type_declaration =
match decl with
(*Case of an inductive type "t"*)
| {ptype_name = {txt = "t"; _};
ptype_kind = Ptype_variant constructor_declarations;
ptype_attributes;_} ->
let (_, attrs) =
List.partition (fun ({txt;_},_) ->txt="id_of") ptype_attributes in
{(default_mapper.type_declaration mapper decl)
with ptype_attributes=attrs}
(*Not an inductive type named "t"*)
| _ -> default_mapper.type_declaration mapper decl
constructor_declaration_mapper
is analogous
to type_declaration_mapper
above but this time
its @id
attributes that are removed.
let constructor_declaration_mapper
(mapper : mapper)
(decl : constructor_declaration) : constructor_declaration =
match decl with
| {pcd_name={loc; _}; pcd_attributes; _} ->
let (_, attrs) =
List.partition (fun ({txt;_}, _) -> txt="id") pcd_attributes in
{(default_mapper.constructor_declaration mapper decl)
with pcd_attributes=attrs}
Now to the raison d'etre of the
ppx, structure_mapper
.
First, a utility function that computes from
a constructor_declaration
with an @id
attribute, a (function) case
for it. For example,
suppose "Bar of int [@id 4]
" is the constructor
declaration, then the case
to be computed is the AST
corresponding to the code "| Bar _ -> 4
".
let case_of_constructor_declaration :
constructor_declaration -> case = function
| {pcd_name={txt;loc};pcd_args;pcd_attributes; _} ->
match List.filter (fun ({txt;_}, _) -> txt="id") pcd_attributes with
(*No "@id"*)
| [] ->
raise (Location.Error (Location.error ~loc "[@id] : Missing"))
(*Single "@id"*)
| [(_, payload)] ->
begin match payload with
| PStr [{pstr_desc=Pstr_eval ({pexp_desc=
Pexp_constant (Pconst_integer (id, None)); _}, _)
}] ->
Exp.case
(Pat.construct
{txt=Lident txt; loc=(!default_loc)}
(match pcd_args with
| Pcstr_tuple [] -> None | _ -> Some (Pat.any ())))
(Exp.constant (Pconst_integer (id, None)))
| _ ->
raise (Location.Error (Location.error ~loc
"[@id] : Bad (or missing) argument (should be int e.g. [@id 4])"))
end
(*Many "@id"s*)
| (_ :: _) ->
raise (Location.Error (Location.error ~loc
"[@id] : Multiple occurences"))
One more utility function is required.
eval_structure_item item acc
computes structure
items to push on the front of acc
. If item
is a single declaration of an inductive type t
attributed with @@id_of
, then two structure items will
be produced : one for t
and one synthesized
for t
's of_id
function. In all other
cases, just one structure item will be pushed onto acc
.
let eval_structure_item
(mapper : mapper)
(item : structure_item)
(acc : structure) : structure =
match item with
(*Case of a single inductive type declaration*)
| { pstr_desc = Pstr_type (_, [type_decl]); pstr_loc} ->
begin
match type_decl with
(*Case where the type identifer is [t]*)
| {ptype_name = {txt = "t"; _};
ptype_kind = Ptype_variant constructor_declarations;
ptype_attributes;
_} ->
begin
match List.filter (fun ({txt;_},_) ->txt="id_of")
ptype_attributes
with
(*No [@@id_of]*)
| [] -> default_mapper.structure_item mapper item :: acc
(*At least one [@@id_of] (treat multiple occurences as if
one)*)
| _ ->
(*Cases of an [id_of] function for [t], one for each
of its constructors*)
let cases=
List.fold_right
(fun x acc ->
case_of_constructor_declaration x :: acc)
constructor_declarations [] in
(*The [id_of] function itself*)
let id_of : structure_item =
Str.value Nonrecursive [
Vb.mk
(Pat.var {txt="id_of"; loc=(!default_loc)})
(Exp.function_ cases)] in
default_mapper.structure_item mapper item :: id_of :: acc
end
(*Case the type identifier is something other than [t]*)
| _ -> default_mapper.structure_item mapper item :: acc
end
(*Case this structure item is something other than a single type
declaration*)
| _ -> default_mapper.structure_item mapper item :: acc
Finally we can write structure_mapper
itself as a
simple fold over a structure.
let structure_mapper
(mapper : mapper)
(structure : structure) : structure =
List.fold_right (eval_structure_item mapper)structure []
Building and testing
So that's it, this preprocessor extension is complete. Assuming
the code is contained in a file called ppx_id_of.ml
it
can be compiled with a command along the lines of the following.
ocamlc -o ppx_id_of.exe -I +compiler-libs ocamlcommon.cma ppx_id_of.mlWhen built, it can be tested with a command like
ocamlc -dsource -ppx ppx_id_of.exe test.ml
.
For example, when invoked on the following program,
type t =
| A [@id 2]
| B of int [@id 4] [@@id_of]
module M = struct
type t =
| Foo of int [@id 42]
| Bar [@id 43] [@@id_of]
module N = struct
type t =
| Baz [@id 8]
| Quux of string * int [@id 7] [@@id_of]
module Q = struct
type t =
| U [@id 0] [@@id_of]
end
end
end
the resulting output is,
type t =
| A
| B of int
let id_of = function | A -> 2 | B _ -> 4
module M =
struct
type t =
| Foo of int
| Bar
let id_of = function | Foo _ -> 42 | Bar -> 43
module N =
struct
type t =
| Baz
| Quux of string * int
let id_of = function | Baz -> 8 | Quux _ -> 7
module Q = struct type t =
| U
let id_of = function | U -> 0 end
end
end
References:
[1] "A
Guide to Extension Points in OCaml" -- Whitequark (blog post
2014)