Sunday, June 10, 2018

Bucket Sort

Bucket Sort

Bucket sort assumes input generated by a random process that distributes elements uniformly over the interval [0, 1).

The idea of bucket sort is to divide [0, 1) into n equal-sized subintervals or buckets, and then distribute the n input numbers into the buckets. To produce the output, sort the numbers in each bucket and then go through the buckets in order. Sorting a bucket can be done with insertion sort.

let rec insert x = function
  | [] -> [x]
  | h :: tl as ls ->
    if x < h then x :: ls else h :: insert x tl

let rec insertion_sort = function
  | [] | [_] as ls -> ls
  | h :: tl -> insert h (insertion_sort tl)

This code for bucket sort assumes the input is an n-element array a and that each element 0 ≤ a.(i) < 1. The code requires an auxillary array b.(0 .. n - 1) of lists (buckets).

let bucket_sort a =
  let n = Array.length a in
  let b = Array.make n [] in
    (fun x ->
       let i =
         int_of_float (
           floor (float_of_int n *. x)
         ) in
        Array.set b i (x :: Array.get b i)
      ) a;
    (fun i l ->
       Array.set b i (insertion_sort l)
    ) b;
  Array.fold_left (fun acc bucket -> acc @ bucket) [] b
bucket_sort [| 0.78; 0.17; 0.39; 0.26; 0.72; 0.94
             ; 0.21; 0.12; 0.23; 0.68|]
Bucket sort runs in linear time on the average.

[1] "Introduction to Algorithms" Section 9.4:Bucket Sort -- Cormen et. al. (Second ed.) 2001.

Sunday, May 20, 2018

Dijkstra's algorithm

Shortest Path

This article assumes familiarity with Dijkstra's shortest path algorithm. For a refresher, see [1]. The code assumes open Core is in effect and is online here.

The first part of the program organizes our thoughts about what we are setting out to compute. The signature summarizes the notion (for our purposes) of a graph definition in modular form. A module implementing this signature defines a type vertex_t for vertices, a type t for graphs and type extern_t : a representation of a t for interaction between an implemening module and its "outside world".

module type Graph_sig = sig
  type vertex_t [@@deriving sexp]
  type t [@@deriving sexp]
  type extern_t

  type load_error = [ `Duplicate_vertex of vertex_t ] [@@deriving sexp]
  exception Load_error of load_error [@@deriving sexp]

  val of_adjacency : extern_t -> [ `Ok of t | `Load_error of load_error ]
  val to_adjacency : t -> extern_t

  module Dijkstra : sig
    type state

    type error = [
      | `Relax of vertex_t
    ] [@@deriving sexp]
    exception Error of error [@@deriving sexp]

    val dijkstra : vertex_t -> t -> [ `Ok of state | `Error of error ]
    val d : state -> (vertex_t * float) list
    val shortest_paths : state -> (vertex_t * vertex_t list) list

A realization of Graph_sig provides "conversion" functions of_adjacency/to_adjacency between the types extern_t and t and nests a module Dijkstra. The signature of the sub-module Dijkstra requires concrete modules provide a type state and an implementation of Dijkstra's algorithm in terms of the function signature val dijkstra : vertex_t -> t -> [ `Ok of state | `Error of error ].

For reusability, the strategy for implementing graphs will be generic programming via functors over modules implementing s vertex type.

An implementation of the module type GRAPH defines a module type VERT which is required to provide a comparable type t. It further defines a module type S that is exactly module type Graph_sig above. Lastly, modules of type GRAPH provide a functor Make that maps any module of type VERT to new module of type S fixing extern_t to an adjacency list representation in terms of the native OCaml type 'a list and float to represent weights on edges.

module type GRAPH = sig
  module type VERT = sig
    type t[@@deriving sexp]
    include Comparable.S with type t := t

  module type S = sig
    include Graph_sig

  module Make : functor (V : VERT) ->
    S with type vertex_t = V.t
       and type extern_t = (V.t * (V.t * float) list) list
The two module types Graph_sig and GRAPH together provide the specification for the program. module Graph in the next section implements this specification.

Implementation of module Graph is in outline this.

module Graph : GRAPH = struct
  module type VERT = sig
    type t[@@deriving sexp]
    include Comparable.S with type t := t

  module type S = sig
    include Graph_sig

  module Make : functor (V : VERT) ->
    S with type vertex_t = V.t
       and type extern_t = (V.t * (V.t * float) list) list

    functor (V : VERT) -> struct
As per the requirements of GRAPH the module types VERT and S are provided as is the functor Make. It is the code that is ellided by the ... above in the definition of Make that is now the focus.

Modules produced by applications of Make satisfy S. This requires suitable definitions of types vertext_t, t and extern_t. The modules Map and Set are available due to modules of type VERT being comparable in their type t.

      module Map = V.Map
      module Set = V.Set

      type vertex_t = V.t [@@deriving sexp]
      type t = (vertex_t * float) list Map.t [@@deriving sexp]
      type extern_t = (vertex_t * (vertex_t * float) list) list
      type load_error = [ `Duplicate_vertex of vertex_t ] [@@deriving sexp]
      exception Load_error of load_error [@@deriving sexp]

While the external representation extern_t of graphs is chosen to be an adjacency list representation in terms of association lists, the internal representation t is a vertex map of adjacency lists providing logarithmic loookup complexity. The conversion functions between the two representations "come for free" via module Map.

      let to_adjacency g = Map.to_alist g

      let of_adjacency_exn l =  match Map.of_alist l with
        | `Ok t -> t
        | `Duplicate_key c -> raise (Load_error (`Duplicate_vertex c))

      let of_adjacency l =
          `Ok (of_adjacency_exn l)
        | Load_error err -> `Load_error err

At this point the "scaffolding" for Dijkstra's algorithm, that part of GRAPH dealing with the representation of graphs is implemented.

The interpretation of Dijkstra's algorithm we adopt is functional : the idea is we loop over vertices relaxing their edges until all shortest paths are known. What we know on any recursive iteration of the loop is a current "state" (of the computation) and each iteration produces a new state. This next definition is the formal definition of type state.

      module Dijkstra = struct

        type state = {
          src    :                  vertex_t
        ; g      :                         t
        ; d      :               float Map.t
        ; pred   :            vertex_t Map.t
        ; s      :                     Set.t
        ; v_s    : (vertex_t * float) Heap.t
The fields of this record are:
  • src : vertex_t, the source vertex;
  • g : t, G the graph;
  • d : float Map.t, d the shortest path weight estimates;
  • pre : vertex_t Map.t, π the predecessor relation;
  • s : Set.t, the set S of nodes for which the lower bound shortest path weight is known;
  • v_s : (vertex_t * float) Heap.t, V - {S}, , the set of nodes of g for which the lower bound of the shortest path weight is not yet known ordered on their estimates.

Function invocation init src g compuates an initial state for the graph g containing the source node src. In the initial state, d is everywhere except for src which is 0. Set S (i.e. s) and the predecessor relation π (i.e. pred) are empty and the set V - {S} (i.e. v_s) contains all nodes.

        let init src g =
          let init x = match V.equal src x with
            | true -> 0.0 | false -> Float.infinity in
          let d = List.fold (Map.keys g) ~init:Map.empty
              ~f:(fun acc x -> Map.set acc ~key:x ~data:(init x)) in
          ; g
          ; s = Set.empty
          ; d
          ; pred = Map.empty
          ; v_s = Heap.of_list (Map.to_alist d)
                ~cmp:(fun (_, e1) (_, e2) -> e1 e2)

Relaxing an edge (u, v) with weight w (u, v) tests whether the shortest path to v so far can be improved by going through u and if so, updating d (v) and π (v) accordingly.

        type error = [
          | `Relax of vertex_t
        ] [@@deriving sexp]
        exception Error of error [@@deriving sexp]

        let relax state (u, v, w) =
          let {d; pred; v_s; _} = state in
          let dv = match Map.find d v with
            | Some dv -> dv
            | None -> raise (Error (`Relax v)) in
          let du = match Map.find d u with
            | Some du -> du
            | None -> raise (Error (`Relax u)) in
          if dv > du +. w then
            let dv = du +. w in
            (match Heap.find_elt v_s ~f:(fun (n, _) -> V.equal n v) with
            | Some tok -> ignore (Heap.update v_s tok (v, dv))
            | None -> raise (Error (`Relax v))
            { state with
              d = Map.change d v
                      | Some _ -> Some dv
                      | None -> raise (Error (`Relax v))
            ; pred = Map.set (Map.remove pred v) ~key:v ~data:u
          else state
Here, relaxation can result in a linear heap update operation. A better implementation might seek to avoid that.

One iteration of the body of the loop of Dijkstra's algorithm consists of the node in V - {S} with the least shortest path weight estimate being moved to S and its edges relaxed.

        let dijkstra_exn src g =
          let rec loop ({s; v_s; _} as state) =
            match Heap.is_empty v_s with
            | true -> state
            | false ->
              let u = fst (Heap.pop_exn v_s) in
              loop (
                List.fold (Map.find_exn g u)
                  ~init:{ state with s = Set.add s u }
                  ~f:(fun state (v, w) -> relax state (u, v, w))
          in loop (init src g)

        let dijkstra src g =
            `Ok (dijkstra_exn src g)
          | Error err -> `Error err

The shortest path estimates contained by a value of state is given by the projection d.

        let d state = Map.to_alist (state.d)

The shortest paths themselves are easily computed as,

   let path state n =
          let rec loop acc x =
            (match V.equal x state.src with
            | true -> x :: acc
            | false -> loop (x :: acc) (Map.find_exn state.pred x)
            ) in
          loop [] n

        let shortest_paths state =
 (Map.keys state.g) ~f:(fun n -> (n, path state n))
which completes the implementation of Make.

The following program produces a concrete instance of the shortest path problem (with some evaluation output from the top-level).

module G : Graph.S with
  type vertex_t = char and type extern_t = (char * (char * float) list) list
  Graph.Make (Char)

let g : G.t =
  match G.of_adjacency
          [ 's', ['u',  3.0; 'x', 5.0]
          ; 'u', ['x',  2.0; 'v', 6.0]
          ; 'x', ['v',  4.0; 'y', 6.0; 'u', 1.0]
          ; 'v', ['y',  2.0]
          ; 'y', ['v',  7.0]
  | `Ok g -> g
  | `Load_error e -> failwiths "Graph load error : %s" e G.sexp_of_load_error
let s = match (G.Dijkstra.dijkstra 's' g) with
  | `Ok s -> s
  | `Error e -> failwiths "Error : %s" e G.Dijkstra.sexp_of_error

;; G.Dijkstra.d s
- : (char * float) list =
[('s', 0.); ('u', 3.); ('v', 9.); ('x', 5.); ('y', 11.)]

;; G.Dijkstra.shortest_paths s
- : (char * char list) list =
[('s', ['s']); ('u', ['s'; 'u']); ('v', ['s'; 'u'; 'v']); ('x', ['s'; 'x']);
 ('y', ['s'; 'x'; 'y'])]

[1] "Introduction to Algorithms" Section 24.3:Dijkstra's algorithm -- Cormen et. al. (Second ed.) 2001.

Saturday, December 9, 2017

How to migrate your ppx to OCaml migrate parsetree

OCaml migrate parse tree

OCaml migrate parse tree

Earlier this year, this blog post [2] explored the implementation of a small preprocessor extension (ppx).

The code of the above article worked well enough at the time but as written, exhibits a problem : new releases of the OCaml compiler are generally accompanied by evolutions of the OCaml parse tree. The effect of this is, a ppx written against a specific version of the compiler will "break" in the presence of later releases of the compiler. As pointed out in [3], the use of ppx's in the OCaml eco-system these days is ubiquitous. If each new release of the OCaml compiler required sychronized updates of each and every ppx in opam, getting new releases of the compiler out would soon become a near impossibilty.

Mitigation of the above problem is provided by the ocaml-migrate-parsetree library. The library provides the means to convert parsetrees from one OCaml version to another. This allows the ppx rewriter to write against a specific version of the parsetree and lets the library take care of rolling parsetrees backwards and forwards in versions as necessary. In this way, the resulting ppx is "forward compatible" with newer OCaml versions without requiring ppx code updates.

To get the ppx_id_of code from the earlier blog post usable with ocaml-migrate-parsetree required a couple of small tweaks to make it OCaml 4.02.0 compatible. The changes from the original code were slight and not of significant enough interest to be worth presenting here. What is worth looking at is what it then took to switch the code to use ocaml-migrate-parsetree. The answer is : very little!

open Migrate_parsetree
open OCaml_402.Ast

open Ast_mapper
open Ast_helper
open Asttypes
open Parsetree
open Longident

(* The original ppx as written before goes here!
   .                    .                   .
   .                    .                   .
   .                    .                   .

let () = Driver.register ~name:"id_of" (module OCaml_402) id_of_mapper
The complete code for this article is available online here and as a bonus, includes a minimal jbuilder build system demonstrating just how well the OCaml tool-chain comes together these days.

[1] "A Guide to Extension Points in OCaml" -- Whitequark (blog post 2014)
[2] "Preprocessor extensions for code generation" -- Shayne Fletcher (blog post 2017)
[3] "Extension Points - 3 Years Later" -- Rudi Grinberg (blog post 2017)

Saturday, November 11, 2017

Towers of Hanoi

Towers of Hanoi

The "towers of Hanoi" problem is stated like this. There are three pegs labelled a, b and c. On peg a there is a stack of n disks of increasing size, the largest at the bottom, each with a hole in the middle to accomodate the peg. The problem is to transfer the stack of disks to peg c, one disk at a time, in such a way as to ensure that no disk is ever placed on top of a smaller disk.

The problem is amenable to a divide and conquer strategy : "Move the top n - 1 disks from peg a to peg b, move the remaining largest disk from peg a to peg c then, move the n - 1 disks on peg b to peg c."

let rec towers n from to_ spare =
  if n > 0 then
      towers (n - 1) from spare to_;
               "Move the top disk from peg %c to peg %c\n" from to_;
      towers (n - 1) spare to_ from
For example, the invocation let () = towers 3 'a' 'c' 'b' will generate the recipie
Move the top disk from peg a to peg c
Move the top disk from peg a to peg b
Move the top disk from peg c to peg b
Move the top disk from peg a to peg c
Move the top disk from peg b to peg a
Move the top disk from peg b to peg c
Move the top disk from peg a to peg c

Let T(n) be the time complexity of towers (x, y, z), when the characteristic operation is the moving of a disk from one peg to another. The time complexity of towers(n - 1, x, y z) is T(n - 1) by definition and no further investigation is needed. T(0) = 0 because the test n > 0 fails and no disks are moved. For larger n, the expression towers (n - 1, from, spare, to_) is evaluated with cost T(n - 1) followed by Printf.printf "Move the top disk from peg %c to peg %c\n" from to_ with cost 1 and finally, towers(n - 1, spare, to_, from) again with cost T(n - 1).

Summing these contributions gives the recurrence relation T(n) = 2T(n - 1) + 1 where T(0) = 0.

Repeated substituition can be used to arrive at a closed form for T(n), since, T(n) = 2T(n - 1) + 1 = 2[2T(n - 2) + 1] + 1 = 2[2[2T(n - 3) +1] + 1] + 1 = 23T(n - 3) + 22 + 21 + 20 (provided n ≥ 3), expanding the brackets in a way that elucidates the emerging pattern. If this substitution is repeated i times then clearly the result is T(n) = 2iT(n - i) + 2i - 1 + 2i - 2 + ··· + 20 (n ≥ i). The largest possible value i can take is n and if i = n then T(n - i) = T(0) = 0 and so we arrive at T(n) = 2n0 + 2n - 1 + ··· + 20. This is the sum of a geometric series with the well known solution 2n - 1 (use induction to establish that last result or more directly, just compute 2T(n) - T(n)). And so, the time complexity (the number of disk moves needed) for n disks is T(n) = 2n - 1.

Algorithms and Data Structures Design, Correctness, Analysis by Jeffrey Kingston, 2nd ed. 1998

Friday, October 27, 2017

Nesting quoted strings in OCaml


According to the lexical conventions of OCaml, characters different from \ and " can be enclosed in single quotes and appear in strings. The special characters \ and " are represented in these contexts by their escape sequences. The escape sequence \\ denotes the character \ and \" denotes the character ".

Here we print the string "Hello world!". The quotes delimit the string and are not themselves part of the string.

utop[0]> Caml.Printf.printf "Hello world!";;
Hello world!- : unit = ()

To capture the quotes we need to write them into the string by their escape sequence.

utop[1]> Caml.Printf.printf "\"Hello world!\"";;
"Hello world!"- : unit = ()

What now if we wish to quote a string within a string?

utop[3]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested quoted string\\\"\"";;
"A quoted string with \"a nested quoted
string\""- : unit = ()

We see that in rendering the above string, printf has rendered the escape sequence \" as " and \\\" as \" as required. The pattern continues if we now wish to quote a string within a quoted string within a quoted string.

utop[4]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested \\\\\\\"nested\\\\\\\"
quoted string\\\"\"";;
"A quoted string with \"a nested \\\"nested\\\"
quoted string\""- : unit = ()

As you can see, things get crazy pretty quickly and you can easily drive yourself mad working out the correct escape sequences to get the desired nesting!

Here's a hack : If the string has k levels of quoting, then count how many occurences of \s precede the " at that level. Let that number be n say. To get the next level of quoting you need to concatenate a sequence of n + 1 \s to them to get a total of 2n + 1 \s. To illustrate, look again at the last example:

utop[4]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested \\\\\\\"nested\\\\\\\"
quoted string\\\"\"";;
"A quoted string with \"a nested \\\"nested\\\"
quoted string\""- : unit = ()
That's three level of quoting. At the third level we have the sequence \\\\\\\". That's 7 \s. To quote to the fourth level then we need 8 + 7 = 15 \s:
utop[5]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested \\\\\\\"nested
\\\\\\\\\\\\\\\"nested\\\\\\\\\\\\\\\" \\\\\\\" quoted string\\\"\"";;
"A quoted string with \"a nested \\\"nested
\\\\\\\"nested\\\\\\\" \\\" quoted string\""- : unit = ()

In general, the number of \s required for n levels of quoting is 2n - 1 (that is, an exponential function). The solution follows from the recurrence relation Q0 = 0 and Qn = 2Qn - 1 + 1 which in fact establishes a connection to the "Towers of Hanoi" problem.

Saturday, October 14, 2017

How to render trees like the Unix tree command

How to render trees like Unix 'tree'

The Unix tree utility produces a pretty rendering of a filesystem. Implementing an algorithm to produce output like tree is a little harder than one might expect! This short example program illustrates one way of doing it.

(* A type of non-empty trees of strings. *)
type tree = [
  |`Node of string * tree list

(* [print_tree tree] prints a rendering of [tree]. *)
let rec print_tree
          ?(pad : (string * string)= ("", ""))
          (tree : tree) : unit =
  let pd, pc = pad in
  match tree with
  | `Node (tag, cs) ->
     Printf.printf "%s%s\n" pd tag;
     let n = List.length cs - 1 in
     List.iteri (
         fun i c ->
         let pad =
           (pc ^ (if i = n then "`-- " else "|-- "),
            pc ^ (if i = n then "    " else "|   ")) in
         print_tree ~pad c
       ) cs

(* An example tree. *)
let tree =
  `Node ("."
        , [
            `Node ("S", [
                      `Node ("T", [
                                `Node ("U", [])]);
                      `Node ("V", [])])
          ;  `Node ("W", [])

(* Print the example tree. *)
let () =  print_tree tree

The output of the above looks like this:

|-- S
|   |-- T
|   |   `-- U
|   `-- V
`-- W

Saturday, August 12, 2017



If we are to represent a row of a matrix as a list of numbers, then a matrix can naturally be represented as a list of lists of numbers.

The transpose of a matrix $\mathbf{A}$ is a new matrix denoted $\mathbf{A^{T}}$. The traditional mathematical definition of $\mathbf{A^{T}}$ is expressed as saying the $i$ th row, $j$ th column element of $\mathbf{A^{T}}$ is the $j$ th row, $i$ th column element of $\mathbf{A}$:

$\left[\mathbf{A}\right]_{ij} = \left[\mathbf{A^{T}}\right]_{ji}$.

As definitions go, this isn't terribly helpful in explaining how to compute a transpose. A better equivalent definition for the functional programmer is : the matrix obtained by writing the columns of $\mathbf{A}$ as the rows of $\mathbf{A^{T}}$.

An elegant program for computing a transpose follows from a direct translation of that last definition.

let rec transpose (ls : 'a list list) : 'a list list =
  match ls with
  | [] | [] :: _ -> []
  | ls -> (List.hd) ls :: transpose ( ( ls)

It is not at all hard to understand how the program works when you've seen an example:

transpose [[1; 2]; [3; 4;]; [5; 6]]
  = [1; 3; 5] :: transpose [[2]; [4;]; [6]]
  = [1; 3; 5] :: [2; 4; 6] :: transpose [[]; []; []]
  = [1; 3; 5] :: [2; 4; 6] :: []
  = [[1; 3; 5]; [2; 4; 6]]

Being as pretty as it is, one is inclined to leave things be but, as a practical matter it should be rephrased to be tail-recursive.

let rec transpose (ls : 'a list list) : 'a list list  =
  let rec transpose_rec acc = function
  | [] | [] :: _ -> List.rev acc
  | ls -> transpose_rec ( (List.hd) ls :: acc) ( ( ls)
  in transpose_rec [] ls

"An Introduction to Functional Programming Systems Using Haskell" -- Davie A J T., 1992