Cosineau and Mauny present a type suitable for modeling terms with variables convenient for applications involving variables and substitutions. It is defined like this:
(*term_types.ml*)
type (α, β) term =
| Term of α * (α, β) term list
| Var of β
Analyzing syntax
To proceed, let us restrict our attention to the type
(string, string) term
. The concrete syntax we use for such expressions is parenthesized.
- a (t1, ..., tn) is the syntax of
Term (a, [t1; ...; tn])
We disambiguate variables from constants by representing constants as functions applied to an empty list of arguments
- a (b (), c) is the syntax of
Term ("a", [Term ("b", []); Var "c"])
We start towards a function to analyze syntax. First the lexer.
(*lexer.mll*)
{
open Parser (*The type token is defined in parser.mli.*)
}
let alpha=['a'-'z' 'A'-'Z']
rule token = parse
[' ' '\t' '\n'] (* skip blanks *) { token lexbuf }
| '(' { LPAREN }
| ')' { RPAREN }
| ',' { COMMA }
| ((alpha)(alpha)*) as s { STRING s}
| eof { EOI }
Now the parser.
/*parser.mly*/
%token STRING
%token LPAREN RPAREN COMMA EOI
%start main
%type <(string, string) Term_types.term> main
%%
main:
| exp EOI { $1 }
;
exp:
| tag exp_more
{
match $2 with
| None -> Term_types.Var $1
| Some l -> Term_types.Term ($1, l)
}
;
exp_more:
| /*nothing*/ { None }
| LPAREN exp_list RPAREN { Some $2 }
;
exp_list:
| /*nothing*/ { [] }
| exp exp_list_more { $1::$2 }
;
exp_list_more:
| /*nothing*/ {[]}
| COMMA exp exp_list_more { $2::$3 }
;
tag:
| STRING { $1 }
;
With these pieces in place we can write the analyzer
term_of_string
like so
(*term.ml*)
let term_of_string s =
let parse_buf lexbuf =
try Parser.main Lexer.token lexbuf
with
| Parsing.Parse_error ->
begin
let curr = lexbuf.Lexing.lex_curr_p in
let line = curr.Lexing.pos_lnum in
let cnum = curr.Lexing.pos_cnum - curr.Lexing.pos_bol in
let tok = Lexing.lexeme lexbuf in
raise
(Failure
(Printf.sprintf
"file \"\", line %d, character %d\n\Error : Syntax error \"%s\"" line cnum tok))
end
in parse_buf (Lexing.from_string s)
Reverse parsing is done with
string_of_term
:
let rec string_of_term (t:(string, string)term) : string =
let string_of_list (f:α -> string) (l:α list) : string =
"(" ^ String.concat "," (List.map f l) ^ ")" in
match t with
| Term (s, tl) -> Printf.sprintf "%s" (s^(string_of_list string_of_term tl))
| Var s -> s
A simple function for printing a term then goes like this:
let print_term (t:(string, string) term) : unit = Printf.printf "%s" (string_of_term t)
For example, this program
let tt = term_of_string "a(b(), c)" in print_term tt
prints "a(b(), c)" recovering the input as we'd hope.
Traversal
The function
term_trav
is a workhorse for doing computations over terms. It generalizes all functions defined by recursion on the structure of a term.
(* [term_trav] is a generalization of all functions defined by
recusion on the structure of an [(α, β) term] - a homomorphism.
The definition of [term_trav] below gives the following signature.
{[
val term_trav : (α * β -> γ) ->
(γ -> β -> β) -> β -> (δ -> γ) -> (α, δ) term -> γ
]}
Proof:
Assume [term_trav] to operate on arguments of type [(α, δ)
term]. It "returns" either via [f] or [v], and so [f], [v] and
[term_trav] must share the same return type, [γ] say.
Since the intermediate list results from recursive calls to
[term_trav] then it must have type [γ list]. This means [g] must
be of type [γ -> β -> β] for some type [β] which fixes [x] to
[β] whilst completing [f] as [α * β -> γ].
Lastly [v] takes arguments of type [δ] meaning it types to [δ ->
γ].
*)
let rec term_trav f g x v = function
| Term (a, tl) ->
let l = List.map (term_trav f g x v) tl in
let res = List.fold_right g l x in
f (a, res)
| Var b -> v b
For a trivial example, we can express a function to compute the "size" of a term this way:
let term_size (t:(α, β) term) : int =
(term_trav (fun (_, s) -> s + 1) (fun x y -> x + y) 0 (fun _ -> 1))
t
I appreciate there's an efficiency gain to be had by eliminating the intermediate list in
term_trav
but the resulting function definition is more challenging to reason about (well, at least for me).