Wednesday, October 12, 2016

Monty Hall

Suppose you're on a game show, and you're given the choice of three doors : Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No.3 which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

What do you think?

This problem is known as the "Monty Hall" problem. It's named after the host of the American television game show "Let's make a deal".

Paul Erdős, one of the most prolific mathematicians in history remained unconvinced (of the correct answer to the above problem) until he was shown a computer simulation confirming the predicted result.

Here's a simulation in OCaml one hopes, may have convinced Paul!

module Monty = struct

  (*[dtr w p n] where [n] is the number of doors, selects which door
    to remain (closed) given the winning door [w] and the player
    chosen door [p]*)
  let rec dtr w p n =
    if p <> w then  w 
    else if p = 0 then n - 1 else 0

  (*[gen_game d] generates a game with [d] doors and returns a game
    with a winning door, a player selected door and a door to keep
    closed before if the player wants to switch*)
  let gen_game (d : int) : (int * int * int) =
    let w = d and p = d in 
    (w, p, dtr w p d)

  let num_wins = ref 0 (*To keep track of scores*)
  type strategy = Hold | Switch (*The type of strategies*)

  (*Play a single game*)
  let play_game (d : int) (s : strategy) : unit =
    let w, p, r = gen_game d in
    match s with
    | Hold → num_wins := !num_wins + if p = w then 1 else 0
    | Switch → num_wins := !num_wins + if r = w then 1 else 0

  (*Play a set of [n] games*)
  let play_games (d : int) (n : int) (s : strategy ) : unit = 
    let rec loop (i : int) : unit = 
      if i = n then ()
      else  begin
        play_game d s;
        loop (i + 1)
    in loop 0


open Monty

(*Initialized from the command line*)
let version       = ref false
let num_doors     = ref 0
let num_sims      = ref 0
let read_args () =
  let specification =
    [("-v", Arg.Set version, "Print the version number");
     ("-d", Arg.Set_int num_doors, "Number of doors (>= 3)" );
     ("-n", Arg.Set_int num_sims, "Number of simulations (>= 1)");
  in Arg.parse specification
  (fun s →
    Printf.printf "Warning : Ignoring unrecognized argument \"%s\"\n" s)
  "Usage : monty -d <number of doors> -n <number of simulations>"

(*[fabs e] computes the absolute value of [e]*)
let fabs (e : float) : float = if e < 0. then ~-.e else e

let () = 
    read_args ();
    if !version then Printf.printf "1.0.0\n"
      let n = !num_sims and d = !num_doors in
      if d < 3 then
        raise (Invalid_argument "Number of doors must be >= than 3");
      if n < 1 then
        raise (Invalid_argument "Number of simulations must be >= 1");
        num_wins := 0;
        play_games d n Hold;
        Printf.printf "Num wins (hold): %d\n" !num_wins;
        let err=fabs (float_of_int (!num_wins) /. 
                    (float_of_int n) -. 1.0 /. (float_of_int d)) in
        Printf.printf "Error %f\n" err;
        num_wins := 0;
        play_games d n Switch;
        Printf.printf "Num wins (switch): %d\n" !num_wins;
        let err=fabs (float_of_int (!num_wins) /. 
                   (float_of_int n) -. (float_of_int (d - 1) /. 
                                                (float_of_int d))) in
        Printf.printf "Error %f\n" err ;

  | Invalid_argument s → Printf.printf "%s\n" s

Wednesday, October 5, 2016

Conversion operations of the lambda-calculus



This note provides a super lightweight explanation of the three conversion operations of the $\lambda$-calculus known (cryptically) as $\alpha$, $\beta$ and $\eta$ conversions respectively (borrowed fairly freely from the delightful reference given at the bottom.)

Syntax vs. semantics

The $\textbf{syntax}$ of the language of $\textit{$\lambda$- expressions}$ is \[ \begin{eqnarray} <exp> & ::= & <constant>\;\; & \text{Constants} \nonumber \\ & \mid & <variable>\;\; & \text{Variables} \nonumber \\ & \mid & <exp> <exp>\;\; & \text{Applications} \nonumber \\ & \mid & \lambda<variable>.<exp>\;\; & \text{Abstractions} \nonumber \end{eqnarray} \] The $\textbf{semantics}$ of the the $\lambda$-calculus is defined by three $\textit{conversion rules}$. To understand them requires the terminology of $\textit{free}$ and $\textit{bound}$ variables. An occurence of a variable in a $\lambda$-expression is bound if there is an enclosing abstraction that binds it, and is free otherwise. For example, in $\lambda x.+\; ((\lambda y. +\;y\; z)\;7)\;x$, $x$ and $y$ appear bound whereas $z$ appears free.


$\beta$-reduction describes how to apply a function to an argument. For example, $\left(\lambda x.+\;x\;1\right)\; 4$ denotes the application of a particular $\lambda$-abstraction to the argument $4$. The result of applying a $\lambda$-abstraction to an argument is an instance of the body of the $\lambda$-abstraction in which (free) occurences of the formal parameter in the body are replaced with (copies of) the argument. Thus, $\left(\lambda x.+\;x\;1\right)\; 4 \rightarrow +\;4\;1 \rightarrow 5$. In the event there are no occurences of the formal parameter in the abstraction body, the argument is discarded unused so, $(\lambda x.\;3)\;4 \rightarrow 3$. Care is needed when formal parameter names are not unique. For example, \[ \begin{eqnarray} & & \left(\lambda x.\;\left(\lambda x.+ \left(-\;x\;1\right)\right)\;x\;3\right)\; 9 \nonumber \\ & \rightarrow & \left(\lambda x.+ \left(-\;x\;1\right)\right)\;9\;3 \nonumber \\ & \rightarrow & +\;\left(-\;9\;1\right)\;3 \nonumber \\ & \rightarrow & +\;8\;3 \nonumber \\ & \rightarrow & 11 \nonumber \end{eqnarray} \] The key point of that example is that we did not substitue for the inner $x$ in the first reduction because it was not free in the body of the outer $\lambda x$ abstraction. Indeed, in the OCaml top-level we observe

       # (fun x -> (fun x -> ( + ) (( - ) x 1)) x 3) 9 ;;
       - : int = 11
or equivalently, in C++,
      auto add = [](int x) { return [=](int y) { return x + y; }; };
      auto sub = [](int x) { return [=](int y) { return x - y; }; };
      [=](int x) { 
        return [=](int x) {
          return add (sub (x) (1)); 
          } (x) (3); 
      } (9) ; //is the value '11'
The $\beta$-rule applied backwards is called $\beta$-abstraction and written with a backwards reduction arrow '$\leftarrow$'. Thus, $+\;4\;1 \leftarrow (\lambda x.\;+\;1\;x)\;4$. $\beta$-conversion means reduction or abstraction and is written with a double-ended arrow augmented with a $\beta$ (in order to distinguish it from other forms of conversion). So, $+\;4\;1 \underset{\beta}{\leftrightarrow} (\lambda x.\;+\;1\;x)\;4$. One way to look at $\beta$ conversion is that it is saying something about $\lambda$-expressions that look different but mean the same thing.


It seems obvious that the two abstractions $\lambda x.+\;x\;1$ and $\lambda y.+\;y\;1$ "ought" to be equivalent. $\alpha$-conversion allows us to change the name of a formal parameter as long as it is done consistently. So we write $\lambda x.+\;x\;1 \underset{\alpha}{\leftrightarrow} \lambda y.+\;y\;1$. Of course, the newly introduced name must not occur free in the body of the original $\lambda$-abstraction.


This last conversion rule exists to to complete our intuition about what $\lambda$-abstractions "ought" to be equivalent. The rule is this : If $f$ denotes a function, $x$ a variable that does not occur free in $f$, then $\lambda x.f\;x \underset{\eta}{\leftrightarrow} f$. For example, in OCaml if we define f by let f x = x + 1 then clearly fun x -> f x produces the same results for all values x in the domain of f.


The first section provides a set of formal rules for constructing expressions (the BNF grammar). Using the notation $E\;\left[M/x\right]$ to mean the expression $E$ with $M$ substituted for free occurrences of $x$ we can succintly state the the rules for converting one expression into an equivalent one as \[ \begin{eqnarray} x\;\left[M/x\right] & = & M \nonumber \\ c\;\left[M/x\right] & = & c\;\;\text{where $c$ is any variable or constant other than $x$} \nonumber \\ \left(E\;F\right)\;\left[M/x\right] & = & E\left[M/x\right]\; F\left[M/x\right]\; \nonumber \\ \left(\lambda x.E\right)\;\left[M/x\right] & = & \lambda x.E \nonumber \\ \left(\lambda y.E\right)\;\left[M/x\right] & & \text{where $y$ is any variable other than $x$} \nonumber \\ & = & \lambda y.E\left[M/x\right]\;\text{if $x$ does not occur free in E or $y$ does not occur free in $M$} \nonumber \\ & = & \lambda z.\left(E\left[z/y\right]\right)\left[M/x\right]\;\text{otherwise}\nonumber \\ \end{eqnarray} \]

[1] The Implementation of Functional Programming Languages by Simon L. Peyton Jones. 1987.

Tuesday, October 4, 2016

Eliminating left-recursion

Eliminating left recursion

Consider a non-terminal $A$ with two productions $A \rightarrow A \alpha \mid \beta$ where $\alpha$, $\beta$ are sequences of terminals and non-terminals that do not start with $A$. It produces strings of the form $\beta\alpha^{*}$. This rule for $A$ exhibits direct left recursion. The left recursion can be turned into right recursion by rewriting in terms of a new non-terminal $R$ as $A \rightarrow \beta R \nonumber$ and $R \rightarrow \alpha R \mid \epsilon$.

Generally, immediate left recursion can be eliminated by the following technique which works for any number of $A$ productions. First group as \[ \begin{equation} A \rightarrow A\alpha_{1} \mid A\alpha_{2} \mid \cdots \mid A\alpha_{m} \mid \beta_{1} \mid \cdots \mid \beta_{n} \end{equation} \] where no $\beta_{i}$ begins with an $A$. Then replace the $A$ productions by \[ \begin{eqnarray} A &\rightarrow& \beta_{1}A^{\prime} \mid \beta_{2}A^{\prime} \mid \cdots \mid \beta_{n}A^{\prime} \nonumber \\ A^{\prime} &\rightarrow& \alpha_{1}A^{\prime} \mid \alpha_{2}A^{\prime} \mid \cdots \mid \alpha_{m}A^{\prime} \mid \epsilon \nonumber \end{eqnarray} \] This procedure eliminates all direct left recursion from the $A$ and $A^{\prime}$ rules (provided no $\alpha_{i}$ is $\epsilon$). For example, the language of arithmetic expressions might be written \[ \begin{eqnarray} & E &\rightarrow E + T \mid E - T \mid T \nonumber \\ & T &\rightarrow T * F \mid T \mathbin{/} F \mid F \nonumber \\ & F &\rightarrow \left( E \right) \mid \mathbf{id} \nonumber \end{eqnarray} \] which, on applying the above procedure yields \[ \begin{eqnarray} & E &\rightarrow T\;E^{\prime} \nonumber \\ & E^{\prime} &\rightarrow +\;T\;E^{\prime} \mid -\;T\;E^{\prime}\nonumber \mid \epsilon \nonumber \\ & T &\rightarrow F\;T^{\prime} \nonumber \\ & T^{\prime} &\rightarrow *\;F\;T^{\prime} \mid \mathbin{/}\;F\;T^{\prime} \mid \epsilon \nonumber \\ & F &\rightarrow \left( E \right) \mid \mathbf{id} \nonumber \end{eqnarray} \] Consider the grammar \[ \begin{eqnarray} S &\rightarrow& A\;a \mid b \nonumber \\ A &\rightarrow& A\;c \mid S\;d \mid \epsilon \nonumber \end{eqnarray} \] The non-terminal $S$ is left recursive because $S \Rightarrow A\;a \Rightarrow S\;d\;a$ but it is not immediately left recursive. The procedure given above does not eliminate left recursion of this kind. It is however amenable to the following approach. First order the non-terminals $S$ then $A$. We'd start by eliminating direct left recursion from the $S$-productions but there is none among the $S$-productions so we move onto the $A$-productions. First we eliminate $S$ from the $A$-productions by substitution to obtain the following $A$-productions \[ \begin{equation} A \rightarrow A\;c \mid A\;a\;d \mid b\;d \mid \epsilon \nonumber \end{equation} \] and now eliminate the direct left recursion in the $A$ to get \[ \begin{eqnarray} & S & \rightarrow A\;a \mid b \nonumber \\ & A & \rightarrow b\;d\;A^{\prime} \mid A^{\prime} \nonumber \\ & A^{\prime} & \rightarrow c\;A^{\prime} \mid a\;d\;A^{\prime} \mid \epsilon \nonumber \end{eqnarray} \] Technically, the above approach is only guaranteed to work when the grammar to which it is applied has no cycles or $\epsilon$-productions. The above example violates this in that the rule for $A$ contained an $\epsilon$-production but it turns out in this case to be harmless. Generalizing, assuming an input grammar $G$ with no cycles or $\epsilon$-productions, an equivalent grammar with no left recursion can be found by, arranging the nonterminals of $G$, $A_{1}, A_{2}, \dots, A_{n}$ say, then visiting each in order, for each $A_{i}$, replace each production of the form $A_{i} \rightarrow A_{j}\gamma$ by the productions $A_{i} \rightarrow \delta_{1}\gamma \mid \delta_{2}\gamma \mid \cdots \mid \delta_{k}\gamma$ where $A_{j} \rightarrow \delta_{1} \mid \delta_{2} \mid \cdots \mid \delta_{k}$ are all the current $A_{j}$ productions, $j < i$ and then elminiate the immediate left recursion among the $A_{i}$ productions.

One of the pre-conditions of the algorithm of the previous section is that the input grammar $G$ contain no $\epsilon$-productions. So, we seek a method for eliminating $\epsilon$-productions where we can. To begin, we define a non-terminal $A$ of a grammar $G$ $\textit{nullable}$ if, $A \overset{*}{\Rightarrow} \epsilon$. A non-terminal is nullable if, in $G$, $A \rightarrow \epsilon$ or if $A \rightarrow A_{1}A_{2} \cdots A_{k}$ and each $A_{i}$ is nullable. To illustrate the procedure, let $G$ be given as: \[ \begin{eqnarray} S &\rightarrow& A\;B \nonumber \\ A &\rightarrow& A\;a\;A \mid \epsilon \nonumber \\ B &\rightarrow& B\;b\;B \mid \epsilon \nonumber \end{eqnarray} \] In this grammar all of $S$, $A$ and $B$ are nullable. The new grammar introduces a new start rule $S^{\prime} \rightarrow S$ and since $S$ is nullable we also add an $\epsilon$ alternative to conclude $S^{\prime} \rightarrow S \mid \epsilon$. Now, for each rule $A \rightarrow X_{1} X_{2} \dots X_{k}$ create rules, $A \rightarrow \alpha_{1}\alpha_{2}\cdots\alpha_{k}$ where \[ \begin{equation} \alpha_{i} = \begin{cases} X_{i} & \text{if $X_{i}$ is a terminal/non-nullable non-terminal} \\ X_{i}\; \text{or}\;\epsilon & \text{if $X_{i}$ is nullable} \end{cases} \end{equation} \] and not all $\alpha_{i}$ are nullable. Applying this procedure then, we get \[ \begin{eqnarray} & S^{\prime} &\rightarrow S \mid \epsilon \nonumber \\ & S &\rightarrow A\;B \mid A \mid B \nonumber \\ & A &\rightarrow A\;a\;A \mid a\;A \mid A\; a \mid a \nonumber \\ & B &\rightarrow B\;b\;B \mid b\;B \mid B\; b \mid b \nonumber \end{eqnarray} \] The net effect is that $\epsilon$-productions have been eliminated but for the $S^{\prime}$ production which does not appear on the right-hand-side of any other rule.

[1] Compilers Principles, Techniques, & Tools by Aho et. al. 2nd Ed. 2007.

Tuesday, September 27, 2016

The fixpoint combinator

Consider the following recursive definition of the factorial function. \[ FAC = \lambda n.\;IF \left(=\;n\;0\right)\;1\;\left(*\;n\;\left(FAC\;\left(-\;n\;1\right)\right)\right) \nonumber \] The definition relies on the ability to name a $\lambda$-abstraction and then to refer to this name inside the $\lambda$-abstraction itself. No such facility is provided by the $\lambda$-calculus. $\beta$-abstraction is applying $\beta$-reduction backwards to introduce new $\lambda$-abstractions, thus $+\;4\;1\leftarrow \left(\lambda x.\;+\;x\;1\right)\; 4$. By $\beta$-abstraction on $FAC$, its definition can be written \[ FAC = \left(\lambda fac.\;\left(\lambda n.\;IF\left(=\;n\;0\right)\;1\;\left(*\;n\;\left(fac\;\left(-\;n\;1\right)\right)\right)\right)\right) FAC \nonumber \] This definition has taken the form $FAC = g\;FAC$ where $g = \left(\lambda fac.\;\left(\lambda n.\;IF\left(=\;n\;0\right)\;1\;\left(*\;n\;\left(fac\;\left(-\;n\;1\right)\right)\right)\right)\right)$ is without recursion. We see also that $FAC$ is a fixed point ("fixpoint") of $g$. It is clear this fixed point can only depend on $g$ so supposing there were a function $Y$ which takes a function and delivers a fixpoint of the function as the result, we'd have $FAC = Y\;g = g\;(Y\;g)$. Under the assumption such a function exists, in order to build confidence this definition of $FAC$ works, we will try to compute $FAC\;1$. Recall \[ \begin{eqnarray} &FAC& = Y\;g \nonumber \\ &g& = \lambda fac.\;\left(\lambda n.\;IF\left(=\;n\;0\right)\;1\;\left(*\;n\;\left(fac\;\left(-\;n\;1\right)\right)\right)\right) \nonumber \end{eqnarray} \] So, \[ \begin{eqnarray} FAC\;1 &\rightarrow& (Y\;g)\; 1 \nonumber \\ &\rightarrow& (g\;(Y\;g))\;1 \nonumber \\ &\rightarrow& (\left(\lambda fac.\;\left(\lambda n.\;IF\left(=\;n\;0\right)\;1\;\left(*\;n\;\left(fac\;\left(-\;n\;1\right)\right)\right)\right)\right) (Y\;g))\; 1 \nonumber \\ &\rightarrow& \left(\lambda n.\;IF\left(=\;n\;0\right)\;1\;\left(*\;n\;\left(\left(Y\;g\right)\;\left(-\;n\;1\right)\right)\right)\right)\; 1 \nonumber \\ &\rightarrow& *\;1\;\left(\left(Y\;g\right)\;0\right) \nonumber \\ &\rightarrow& *\;1\;\left(\left(g\;\left(Y\;g\right)\right)\;0\right) \nonumber \\ &\rightarrow& *\;1\;\left(\left(\left(\lambda fac.\;\left(\lambda n.\;IF\left(=\;n\;0\right)\;1\;\left(*\;n\;\left(fac\;\left(-\;n\;1\right)\right)\right)\right)\right)\;\left(Y\;g\right)\right)\;0\right) \nonumber \\ &\rightarrow& *\;1\;\left(\left(\lambda n.\;IF\left(=\;n\;0\right)\;1\;\left(*\;n\;\left(\left(Y\;g\right)\;\left(-\;n\;1\right)\right)\right)\right)\;0\right) \nonumber \\ &\rightarrow& *\;1\;1 \nonumber \\ &=& 1 \nonumber \end{eqnarray} \]

The $Y$ combinator of the $\lambda$-calculus is defined as the $\lambda$-term $Y = \lambda f.\;\left(\lambda x.\;f\;\left(x\;x\right)\right)\left(\lambda x.\;f\;\left(x\;x\right)\right)$. $\beta$ reduction of this term applied to an arbitrary function $g$ proceeds like this: \[ \begin{eqnarray} Y\;g &\rightarrow& \left(\lambda f.\;\left(\lambda x.\;f\;\left(x\;x\right)\right) \left(\lambda x.\;f\;\left(x\;x\right)\right)\right)\;g \nonumber \\ &\rightarrow& \left(\lambda x.\;g\;\left(x\;x\right)\right) \left(\lambda x.\;g\;\left(x\;x\right)\right) \nonumber \\ &\rightarrow& g\;\left(\left(\lambda x.\;g\;\left(x\;x\right)\right)\;\left(\lambda x.\;g\;\left(x\;x\right)\right)\right) \nonumber \\ &=& g\;\left(Y\;g\right) \end{eqnarray} \] The application of this term has produced a fixpoint of $g$. That is, we are satisfied that this term will serve as a definition for $Y$ having the property we need and call it the "fixpoint combinator".

In the untyped $\lambda$-calculus, $Y$ can be defined and that is sufficient for expressing all the functions that can be computed without having to add a special construction to get recursive functions. In typed $\lambda$-calculus, $Y$ cannot be defined as the term $\lambda x.\;f\;(x\;x)$ does not have a finite type. Thus, when implementing recursion in a functional programming language it is usual to implement $Y$ as a built-in function with the reduction rule $Y\;g \rightarrow g\;(Y\;g)$ or, in a strict language, $(Y\; g)\;x \rightarrow (g\;(Y\;g))\;x$ to avoid infinite recursion.

For an OCaml like language, the idea then is to introduce a built-in constant $\mathbf{Y}$ and to denote the function defined by $\mathbf{let\;rec}\;f\;x = e$ as $\mathbf{Y}(\mathbf{fun}\;f\;x \rightarrow e)$. Intuitivly, $\mathbf{Y}$ is a fixpoint operator that associates a functional $F$ of type $\left(\alpha \rightarrow \beta\right) \rightarrow \alpha \rightarrow \beta$ with a fixpoint of type $\alpha \rightarrow \beta$, that is, a value having the property $\mathbf{Y}\;F = F\;\left(\mathbf{Y}\;F\right)$. The relevant deduction rules involving this constant are: \[ \begin{equation} \frac{\vdash f\;(Y\;f)\;x \Rightarrow v} {\vdash (Y\;f)\;x \Rightarrow v} \tag{App-rec} \end{equation} \] \[ \begin{equation} \frac{\vdash e_{2}\left[Y(\mathbf{fun}\;f\;x \rightarrow e_{1})/f\right] \Rightarrow v} {\vdash \mathbf{let\;rec}\;f\;x=e_{1}\;\mathbf{in}\;e_{2} \Rightarrow v} \nonumber \tag {Let-rec} \end{equation} \]

[1] The Implementation of Functional Programming Languages,Simon Peyton Jones, 1987.
[2] The Functional Approach to Programming, Guy Cousineau, Michel Mauny, 1998.

Tuesday, September 20, 2016

Custom operators in OCaml

If like me, you've always been a little hazy on the rules for defining OCaml operators then, this little post might help!

It is possible to "inject" user-defined operator syntax into OCaml programs. Here's how it works. First we define a set of characters called "symbol characters".

Symbol character (definition)

A character that is one of

! $ % & * + - . / : < = > ? @ ^ | ~

Prefix operators

The ! ("bang") prefix operator, has a predefined semantic as the operation of "de-referencing" a reference cell. A custom prefix operator can made by from a ! followed by one or more symbol characters.

So, to give some examples, one can define prefix operators like !!, !~ or even something as exotic as !::>. For example, one might write something like

let ( !+ ) x : int ref → unit = incr x
as a syntactic sugar equivalent to fun x → incr x

Additionally, prefix operators can begin with one of ~ and ? and, as in the case of !, must be followed by one or more symbol characters. So, in summary, a prefix operator begins with one of

! ~ ?
and is followed by one or more symbol characters.

For example let ( ~! ) x = incr x defines an alternative syntax equivalent to the !+ operator presented earlier.

Prefix operators have the highest possible precedence.

Infix operators

It is in fact possible to define operators in 5 different categories. What distinguish these categories from each other are their associativity and precedence properties.

Level 0

Level 0 operators are left associative with the same precedence as =. A level 0 operator starts with one of

= < > | & $
and is followed by zero or more symbol chars. For example, >>= is an operator much beloved by monadic programmers and |> (pipe operator) is a builtin equivalent to let ( |> ) x f = f x.

Level 1

Level 1 operators are right associative, have a precedence just above = and start with one of

@ ^
. That is, these operators are consistent with operations involving joining things. @@ (the "command" operator) of course has a predefined semantic as function application, that is, equivalent to the definition let ( @@ ) f x = f x.

Level 2

Level 2 operators are left associative have a precedence level shared with + and - and indeed, are defined with a leading (one of)

+ -
and, as usual, followed by a sequence of symbol characters. These operators are consistent for usage with operations generalizing addition or difference like operations. Some potential operators of this kind are +~, ++ and so on.

Level 3

Level 3 operators are also left associative and have a precedence level shared with * and /. Operators of this kind start with one of

* / %
followed by zero or more symbol characters and are evocative of operations akin to multiplication, division. For example, *~ might make a good companion for +~ of the previous section.

Level 4

Level 4 operators are right associative and have a precedence above *. The level 4 operators begin with

and are followed by zero or more symbol characters. The operation associated with ** is exponentiation (binds tight and associates to the right). The syntax **~ would fit nicely into the +~, *~ set of the earlier sections.

Saturday, August 27, 2016

Perfectly balanced binary search trees

The type of "association tables" (binary search trees).

type (α, β) t =
| Empty
| Node of (α , β) t * α * β * (α, β) t * int
There are two cases : a tree that is empty or, a node consisting of a left sub-tree, a key, the value associated with that key, a right sub-tree and, an integer representing the "height" of the tree (the number of nodes to traverse before reaching the most distant leaf).

The binary search tree invariant will be made to apply in that for any non empty tree $n$, every node in the left sub-tree is ordered less than $n$ and every node in the right sub-tree of $n$ is ordered greater than $n$ (in this program, ordering of keys is performed using the function).

This function, height, given a tree, extracts its height.

let height : (α, β) t -> int = function
  | Empty -> 0
  | Node (_, _, _, _, h) -> h

The value empty, is a constant, the empty tree.

let empty : (α, β) t = Empty

create l x d r creates a new non-empty tree with left sub-tree l, right sub-tree r and the binding of key x to the data d. The height of the tree created is computed from the heights of the two sub-trees.

let create (l : (α, β) t) (x : α) (d : β) (r : (α, β) t) : (α, β) t =
  let hl = height l and hr = height r in
  Node (l, x, d, r, (max hl hr) + 1)

This next function, balance is where all the action is at. Like the preceding function create, it is a factory function for interior nodes and so takes the same argument list as create. It has an additional duty though in that the tree that it produces takes balancing into consideration.

let balance (l : (α, β) t) (x : α) (d : β) (r : (α, β) t) : (α, β) t =
  let hl = height l and hr = height r in
  if hl > hr + 1 then
    match l with
In this branch of the program, it has determined that production of a node with the given left and right sub-trees (denoted $l$ and $r$ respectively) would be unbalanced because $h(l) > hr(1) + 1$ (where $h$ denotes the height function).

There are two possible reasons to account for this. They are considered in turn.

    (*Case 1*)
    | Node (ll, lv, ld, lr, _) when height ll >= height lr ->
      create ll lv ld (create lr x d r)
So here, we find that $h(l) > h(r) + 1$, because of the height of the left sub-tree of $l$.
    (*Case 2*)
    | Node (ll, lv, ld, Node (lrl, lrv, lrd, lrr, _), _) ->
      create (create ll lv ld lrl) lrv lrd (create lrr x d r)
In this case, $h(l) > h(r) + 1$ because of the height of the right sub-tree of $l$.
    | _ -> assert false
We assert false for all other patterns as we aim to admit by construction no further possibilities.

We now consider the case $h(r) > h(l) + 1$, that is the right sub-tree being "too long".

  else if hr > hl + 1 then
    match r with

There are two possible reasons.

    (*Case 3*)
    | Node (rl, rv, rd, rr, _) when height rr >= height rl ->
      create (create l x d rl) rv rd rr
Here $h(r) > h(l) + 1$ because of the right sub-tree of $r$.
    (*Case 4*)
    | Node (Node (rll, rlv, rld, rlr, _), rv, rd, rr, _) ->
      create (create l x d rll) rlv rld (create rlr rv rd rr)
Lastly, $h(r) > h(l) + 1$ because of the left sub-tree of $r$.
    | _ -> assert false
Again, all other patterns are (if we write this program correctly according to our intentions,) impossible and so, assert false as there are no further possibilities.

In the last case, neither $h(l) > h(r) + 1$ or $h(r) > h(l) + 1$ so no rotation is required.

    create l x d r

add x data t computes a new tree from t containing a binding of x to data. It resembles standard insertion into a binary search tree except that it propagates rotations through the tree to maintain balance after the insertion.

let rec add (x : α) (data : β) : (α, β) t -> (α, β) t = function
    | Empty -> Node (Empty, x, data, Empty, 1)
    | Node (l, v, d, r, h) ->
      let c = compare x v in
      if c = 0 then
        Node (l, x, data, r, h)
      else if c < 0 then
        balance (add x data l) v d r
        balance l v d (add x data r)

To implement removal of nodes from a tree, we'll find ourselves needing a function to "merge" two binary searchtrees $l$ and $r$ say where we can assume that all the elements of $l$ are ordered before the elements of $r$.

let rec merge (l : (α, β) t) (r : (α, β) t) : (α, β) t = 
  match (l, r) with
  | Empty, t -> t
  | t, Empty -> t
  | Node (l1, v1, d1, r1, h1), Node (l2, v2, d2, r2, h2) ->
    balance l1 v1 d1 (balance (merge r1 l2) v2 d2 r2)
Again, rotations are propagated through the tree to ensure the result of the merge results in a perfectly balanced tree.

With merge available, implementing remove becomes tractable.

let remove (id : α) (t : (α, β) t) : (α, β) t = 
  let rec remove_rec = function
    | Empty -> Empty
    | Node (l, k, d, r, _) ->
      let c = compare id k in
      if c = 0 then merge l r else
        if c < 0 then balance (remove_rec l) k d r
        else balance l k d (remove_rec r) in
  remove_rec t

The remaining algorithms below are "stock" algorithms for binary search trees with no particular consideration of balancing necessary and so we won't dwell on them here.

let rec find (x : α) : (α, β) t -> β = function
  | Empty ->  raise Not_found
  | Node (l, v, d, r, _) ->
    let c = compare x v in
    if c = 0 then d
    else find x (if c < 0 then l else r)

let rec mem (x : α) : (α, β) t -> bool = function
  | Empty -> false
  | Node (l, v, d, r, _) ->
    let c = compare x v in
    c = 0 || mem x (if c < 0 then l else r)
let rec iter (f : α -> β -> unit) : (α, β) t -> unit = function
  | Empty -> ()
  | Node (l, v, d, r, _) ->
    iter f l; f v d; iter f r

let rec map (f : α -> β -> γ) : (α, β) t -> (α, γ) t = function
  | Empty -> Empty
  | Node (l, k, d, r, h) -> 
    Node (map f l, k, f k d, map f r, h)

let rec fold (f : α -> β -> γ -> γ) (m : (α, β) t) (acc : γ) : γ =
  match m with
  | Empty -> acc
  | Node (l, k, d, r, _) -> fold f r (f k d (fold f l acc))

open Format

let print 
    (print_key : formatter -> α -> unit)
    (print_data : formatter -> β -> unit)
    (ppf : formatter)
    (tbl : (α, β) t) : unit =
  let print_tbl ppf tbl =
    iter (fun k d -> 
           fprintf ppf "@[<2>%a ->@ %a;@]@ " print_key k print_data d)
      tbl in
  fprintf ppf "@[[[%a]]@]" print_tbl tbl

The source code for this post can be found in the file 'ocaml/misc/' in the OCaml source distribution. More information on balanced binary search trees including similar but different implementation techniques and complexity analyses can be found in this Cornell lecture and this one.

Friday, August 19, 2016

Even Sillier C++

The C++ try...catch construct provides a facility for discrimination of exceptions based on their types. This is a primitive "match" construct. It turns out, this is enough to encode sum types.

The program to follow uses the above idea to implement an interpreter for the language of additive expressions using exception handling for case discrimination.

#include <iostream>
#include <cassert>
#include <exception>
#include <memory>

struct expr {
  virtual ~expr() {}

  virtual void throw_ () const = 0;

using expr_ptr = std::shared_ptr<expr const>;

class expr is an abstract base class, class int_ and class add derived classes corresponding to the two cases of expressions. Sub-expressions are represented as std::shared_ptr<expr> instances.

struct int_ : expr { 
  int val; 
  int_ (int val) : val{val}

  void throw_ () const { throw *this; } 

struct add : expr { 
  expr_ptr left; 
  expr_ptr right; 

  template <class U, class V>
  add (U const& left, V const& right) 
    : left {expr_ptr{new U{left}}}
    , right {expr_ptr{new V{right}}}

  void throw_ () const { throw *this; } 

With the above machinery in place, here then is the "interpreter". It is implemented as a pair of mutually recursive functions.

int eval_rec ();

int eval (expr const& xpr) {
  try {
    xpr.throw_ ();
  catch (...) {
    return eval_rec ();

int eval_rec () {
  assert (std::current_exception());

  try {
  catch (int_ const& i) {
    return i.val;
  catch (add const& op) {
    return eval (*op.left) +  eval (*op.right);

This little program exercises the interpreter on the expression $(1 + 2) + 3$.

int main () {

    // (1 + 2) + 3
    std::cout << eval (add{add{int_{1}, int_{2}}, int_{3}}) << std::endl;
  catch (...){
    std::cerr << "Unhandled exception\n";
  return 0;

Credit to Mathias Gaunard who pointed out using a virtual function for the throwing of an expression, removed the need for explicit dynamic_cast operations in an earlier version of this program.