Thursday, May 4, 2017

Preprocessor extensions for code generation


Preprocessor extensions for code generation

"A Guide to Extension Points in OCaml"[1] provides a great "quick-start" on using the OCaml extension points API to implement preprocessor extensions for abstract syntax tree rewrites. This post picks up where that tutorial leaves off by showing how to write a ppx that does code generation.

The problem treated here is one posed in Whitequark's blog : "Implement a syntax extension that would accept type declarations of the form type t = A [@id 1] | B of int [@id 4] [@@id_of] to generate a function mapping a value of type t to its integer representation."

Implementing the "id_of" ppx

The basic strategy

In the OCaml parse tree, structures are lists of structure items. Type declarations are structure items as are let-bindings to functions.

In this program, analysis of an inductive type declaration t may result in the production of a new structure item, the AST of an of_id function to be appended to the structure containing t.

Now the general strategy in writing a ppx is to provide a record of type Ast_mapper.mapper. That record is based on the Ast_mapper.default_mapper record but selectively overriding those fields for those sytactic categories that the ppx is intending to transform.

Now, as we determined above, the effect of the ppx is to provide a function from a structure to a new structure. Accordingly, at a minimum then we'll want to override the structure field of the default mapper. Schematically then our ppx code will take on the following shape.

  open Ast_mapper
  open Ast_helper
  open Asttypes
  open Parsetree
  open Longident

  let structure_mapper mapper structure =

   let id_of_mapper =  {
     default_mapper with structure = structure_mapper

   let () = register "id_of" id_of_mapper

This program goes just a little bit further though. Any @id or @@id_of attributes that get as far as the OCaml compiler would be ignored. So, it's not neccessary that they be removed by our ppx once they've been acted upon but it seems tidy to do so. Accordingly, there are two more syntactic constructs that this ppx operates on.

  open Ast_mapper
  open Ast_helper
  open Asttypes
  open Parsetree
  open Longident

  let structure_mapper mapper structure =

  let type_declaration_mapper mapper decl =

  let constructor_declaration_mapper mapper decl =

  let id_of_mapper argv = {
    default_mapper with
      structure = structure_mapper;
      type_declaration = type_declaration_mapper;
      constructor_declaration = constructor_declaration_mapper

Implementing the mappings

To warm up, lets start with the easy mappers.

The role of type_declaration_mapper is a function from a type_declaration argument to a type_declaration result that is the argument in all but that any @@id_of attribute has been removed.

  let type_declaration_mapper 
      (mapper : mapper) 
      (decl : type_declaration) : type_declaration  =
    match decl with
      (*Case of an inductive type "t"*)
    | {ptype_name = {txt = "t"; _};
       ptype_kind = Ptype_variant constructor_declarations;
       ptype_attributes;_} ->
      let (_, attrs) = 
        List.partition (fun ({txt;_},_) ->txt="id_of") ptype_attributes in
      {(default_mapper.type_declaration mapper decl) 
      with ptype_attributes=attrs}
    (*Not an inductive type named "t"*)
    | _ -> default_mapper.type_declaration mapper decl

constructor_declaration_mapper is analogous to type_declaration_mapper above but this time its @id attributes that are removed.

  let constructor_declaration_mapper 
      (mapper : mapper) 
      (decl : constructor_declaration) : constructor_declaration =
    match decl with
    | {pcd_attributes; _} ->
      let (_, attrs) = 
        List.partition (fun ({txt;_}, _) -> txt="id") pcd_attributes  in
      {(default_mapper.constructor_declaration mapper decl) 
      with pcd_attributes=attrs}

Now to the raison d'etre of the ppx, structure_mapper.

First, a utility function that computes from a constructor_declaration with an @id attribute, a (function) case for it. For example, suppose "Bar of int [@id 4]" is the constructor declaration, then the case to be computed is the AST corresponding to the code "| Bar _ -> 4".

  let case_of_constructor_declaration : 
      constructor_declaration -> case =  function
    | {pcd_name={txt;loc};pcd_args;pcd_attributes; _} ->
      match List.filter (fun ({txt;_}, _) -> txt="id") pcd_attributes with
      (*No "@id"*)
      | [] -> 
        raise (Location.Error (Location.error ~loc "[@id] : Missing"))
      (*Single "@id"*)
      | [(_, payload)] -> 
        begin match payload with 
          | PStr [{pstr_desc=Pstr_eval ({pexp_desc=
              Pexp_constant (Pconst_integer (id, None)); _}, _)
            }] ->
                 {txt=Lident txt; loc=(!default_loc)}  
                 (match pcd_args with 
                 | Pcstr_tuple [] -> None | _ -> Some (Pat.any ())))
              (Exp.constant (Pconst_integer (id, None)))
          | _ -> 
            raise (Location.Error (Location.error ~loc 
            "[@id] : Bad (or missing) argument (should be int e.g. [@id 4])"))
      (*Many "@id"s*)
      | (_ :: _) -> 
        raise (Location.Error (Location.error ~loc 
        "[@id] : Multiple occurences"))

One more utility function is required.

eval_structure_item item acc computes structure items to push on the front of acc. If item is a single declaration of an inductive type t attributed with @@id_of, then two structure items will be produced : one for t and one synthesized for t's of_id function. In all other cases, just one structure item will be pushed onto acc.

  let eval_structure_item
      (mapper : mapper) 
      (item : structure_item) 
      (acc : structure) : structure =
    match item with
    (*Case of a single inductive type declaration*)
    | { pstr_desc = Pstr_type (_, [type_decl]); pstr_loc} ->
        match type_decl with
        (*Case where the type identifer is [t]*)
        | {ptype_name = {txt = "t"; _};
           ptype_kind = Ptype_variant constructor_declarations;
           _} ->
            match List.filter (fun ({txt;_},_) ->txt="id_of") 
            (*No [@@id_of]*)
            | [] -> default_mapper.structure_item mapper item :: acc
            (*At least one [@@id_of] (treat multiple occurences as if
            | _ ->
              (*Cases of an [id_of] function for [t], one for each
                of its constructors*)
              let cases= 
                  (fun x acc -> 
                    case_of_constructor_declaration x :: acc) 
                  constructor_declarations [] in
              (*The [id_of] function itself*)
              let id_of : structure_item = 
                Str.value Nonrecursive [
                    (Pat.var {txt="id_of"; loc=(!default_loc)}) 
                    (Exp.function_ cases)] in
              default_mapper.structure_item mapper item :: id_of :: acc
        (*Case the type identifier is something other than [t]*)
        | _ -> default_mapper.structure_item mapper item :: acc
    (*Case this structure item is something other than a single type
    | _ -> default_mapper.structure_item mapper item :: acc

Finally we can write structure_mapper itself as a simple fold over a structure.

  let structure_mapper 
      (mapper : mapper) 
      (structure : structure) : structure =
    List.fold_right (eval_structure_item mapper) structure []

Building and testing

So that's it, this preprocessor extension is complete. Assuming the code is contained in a file called it can be compiled with a command along the lines of the following.

  ocamlc -o ppx_id_of.exe  -I +compiler-libs ocamlcommon.cma
When built, it can be tested with a command like ocamlc -dsource -ppx ppx_id_of.exe

For example, when invoked on the following program,

  type t = 
    | A [@id 2] 
    | B of int [@id 4] [@@id_of]

  module M = struct
    type t =
    | Foo of int [@id 42]
    | Bar [@id 43] [@@id_of]

    module N = struct
       type t = 
       | Baz [@id 8] 
       | Quux of string * int [@id 7] [@@id_of]

      module Q = struct
        type t = 
          | U [@id 0] [@@id_of]
the resulting output is,
  type t =
    | A
    | B of int
  let id_of = function | A  -> 2 | B _ -> 4
  module M =
      type t =
        | Foo of int
        | Bar
      let id_of = function | Foo _ -> 42 | Bar  -> 43
      module N =
          type t =
            | Baz
            | Quux of string * int
          let id_of = function | Baz  -> 8 | Quux _ -> 7
          module Q = struct type t =
                              | U
                            let id_of = function | U  -> 0  end

[1] "A Guide to Extension Points in OCaml" -- Whitequark (blog post 2014)