Thursday, May 4, 2017

Preprocessor extensions for code generation

PPX

Preprocessor extensions for code generation

"A Guide to Extension Points in OCaml"[1] provides a great "quick-start" on using the OCaml extension points API to implement preprocessor extensions for abstract syntax tree rewrites. This post picks up where that tutorial leaves off by showing how to write a ppx that does code generation.

The problem treated here is one posed in Whitequark's blog : "Implement a syntax extension that would accept type declarations of the form type t = A [@id 1] | B of int [@id 4] [@@id_of] to generate a function mapping a value of type t to its integer representation."

Implementing the "id_of" ppx

The basic strategy

In the OCaml parse tree, structures are lists of structure items. Type declarations are structure items as are let-bindings to functions.

In this program, analysis of an inductive type declaration t may result in the production of a new structure item, the AST of an of_id function to be appended to the structure containing t.

Now the general strategy in writing a ppx is to provide a record of type Ast_mapper.mapper. That record is based on the Ast_mapper.default_mapper record but selectively overriding those fields for those sytactic categories that the ppx is intending to transform.

Now, as we determined above, the effect of the ppx is to provide a function from a structure to a new structure. Accordingly, at a minimum then we'll want to override the structure field of the default mapper. Schematically then our ppx code will take on the following shape.

open Ast_mapper
open Ast_helper
open Asttypes
open Parsetree
open Longident

let structure_mapper mapper structure =
  ...

 let id_of_mapper =  {
   default_mapper with structure = structure_mapper
}

let () = register "id_of" id_of_mapper

This program goes just a little bit further though. Any @id or @@id_of attributes that get as far as the OCaml compiler would be ignored. So, it's not neccessary that they be removed by our ppx once they've been acted upon but it seems tidy to do so. Accordingly, there are two more syntactic constructs that this ppx operates on.

open Ast_mapper
open Ast_helper
open Asttypes
open Parsetree
open Longident

let structure_mapper mapper structure =
  ...

let type_declaration_mapper mapper decl =
  ...

let constructor_declaration_mapper mapper decl =
  ...

let id_of_mapper argv = {
  default_mapper with
    structure = structure_mapper;
    type_declaration = type_declaration_mapper;
    constructor_declaration = constructor_declaration_mapper
}

Implementing the mappings

To warm up, lets start with the easy mappers.

The role of type_declaration_mapper is a function from a type_declaration argument to a type_declaration result that is the argument in all but that any @@id_of attribute has been removed.

let type_declaration_mapper
    (mapper : mapper)
    (decl : type_declaration) : type_declaration  =
  match decl with
    (*Case of an inductive type "t"*)
  | {ptype_name = {txt = "t"; _};
     ptype_kind = Ptype_variant constructor_declarations;
     ptype_attributes;_} ->
    let (_, attrs) =
      List.partition (fun ({txt;_},_) ->txt="id_of") ptype_attributes in
    {(default_mapper.type_declaration mapper decl)
    with ptype_attributes=attrs}
  (*Not an inductive type named "t"*)
  | _ -> default_mapper.type_declaration mapper decl

constructor_declaration_mapper is analogous to type_declaration_mapper above but this time its @id attributes that are removed.

let constructor_declaration_mapper
    (mapper : mapper)
    (decl : constructor_declaration) : constructor_declaration =
  match decl with
  | {pcd_name={loc; _}; pcd_attributes; _} ->
    let (_, attrs) =
      List.partition (fun ({txt;_}, _) -> txt="id") pcd_attributes  in
    {(default_mapper.constructor_declaration mapper decl)
    with pcd_attributes=attrs}

Now to the raison d'etre of the ppx, structure_mapper.

First, a utility function that computes from a constructor_declaration with an @id attribute, a (function) case for it. For example, suppose "Bar of int [@id 4]" is the constructor declaration, then the case to be computed is the AST corresponding to the code "| Bar _ -> 4".

  let case_of_constructor_declaration :
      constructor_declaration -> case =  function
    | {pcd_name={txt;loc};pcd_args;pcd_attributes; _} ->
      match List.filter (fun ({txt;_}, _) -> txt="id") pcd_attributes with
      (*No "@id"*)
      | [] ->
        raise (Location.Error (Location.error ~loc "[@id] : Missing"))
      (*Single "@id"*)
      | [(_, payload)] ->
        begin match payload with
          | PStr [{pstr_desc=Pstr_eval ({pexp_desc=
              Pexp_constant (Pconst_integer (id, None)); _}, _)
            }] ->
            Exp.case
              (Pat.construct
                 {txt=Lident txt; loc=(!default_loc)}
                 (match pcd_args with
                 | Pcstr_tuple [] -> None | _ -> Some (Pat.any ())))
              (Exp.constant (Pconst_integer (id, None)))
          | _ ->
            raise (Location.Error (Location.error ~loc
            "[@id] : Bad (or missing) argument (should be int e.g. [@id 4])"))
        end
      (*Many "@id"s*)
      | (_ :: _) ->
        raise (Location.Error (Location.error ~loc
        "[@id] : Multiple occurences"))

One more utility function is required.

eval_structure_item item acc computes structure items to push on the front of acc. If item is a single declaration of an inductive type t attributed with @@id_of, then two structure items will be produced : one for t and one synthesized for t's of_id function. In all other cases, just one structure item will be pushed onto acc.

let eval_structure_item
    (mapper : mapper)
    (item : structure_item)
    (acc : structure) : structure =
  match item with
  (*Case of a single inductive type declaration*)
  | { pstr_desc = Pstr_type (_, [type_decl]); pstr_loc} ->
    begin
      match type_decl with
      (*Case where the type identifer is [t]*)
      | {ptype_name = {txt = "t"; _};
         ptype_kind = Ptype_variant constructor_declarations;
         ptype_attributes;
         _} ->
        begin
          match List.filter (fun ({txt;_},_) ->txt="id_of")
            ptype_attributes
          with
          (*No [@@id_of]*)
          | [] -> default_mapper.structure_item mapper item :: acc

          (*At least one [@@id_of] (treat multiple occurences as if
            one)*)
          | _ ->
            (*Cases of an [id_of] function for [t], one for each
              of its constructors*)
            let cases=
              List.fold_right
                (fun x acc ->
                  case_of_constructor_declaration x :: acc)
                constructor_declarations [] in
            (*The [id_of] function itself*)
            let id_of : structure_item =
              Str.value Nonrecursive [
                Vb.mk
                  (Pat.var {txt="id_of"; loc=(!default_loc)})
                  (Exp.function_ cases)] in

            default_mapper.structure_item mapper item :: id_of :: acc
        end
      (*Case the type identifier is something other than [t]*)
      | _ -> default_mapper.structure_item mapper item :: acc
    end
  (*Case this structure item is something other than a single type
    declaration*)
  | _ -> default_mapper.structure_item mapper item :: acc

Finally we can write structure_mapper itself as a simple fold over a structure.

let structure_mapper
    (mapper : mapper)
    (structure : structure) : structure =
  List.fold_right (eval_structure_item mapper)structure []

Building and testing

So that's it, this preprocessor extension is complete. Assuming the code is contained in a file called ppx_id_of.ml it can be compiled with a command along the lines of the following.

ocamlc -o ppx_id_of.exe  -I +compiler-libs ocamlcommon.cma ppx_id_of.ml
When built, it can be tested with a command like ocamlc -dsource -ppx ppx_id_of.exe test.ml.

For example, when invoked on the following program,

type t =
  | A [@id 2]
  | B of int [@id 4] [@@id_of]

module M = struct
  type t =
  | Foo of int [@id 42]
  | Bar [@id 43] [@@id_of]

  module N = struct
     type t =
     | Baz [@id 8]
     | Quux of string * int [@id 7] [@@id_of]

    module Q = struct
      type t =
        | U [@id 0] [@@id_of]
    end
  end
end
the resulting output is,
type t =
  | A
  | B of int
let id_of = function | A  -> 2 | B _ -> 4
module M =
  struct
    type t =
      | Foo of int
      | Bar
    let id_of = function | Foo _ -> 42 | Bar  -> 43
    module N =
      struct
        type t =
          | Baz
          | Quux of string * int
        let id_of = function | Baz  -> 8 | Quux _ -> 7
        module Q = struct type t =
                            | U
                          let id_of = function | U  -> 0  end
      end
  end


References:
[1] "A Guide to Extension Points in OCaml" -- Whitequark (blog post 2014)