Friday, October 27, 2017

Nesting quoted strings in OCaml

Quoting

According to the lexical conventions of OCaml, characters different from \ and " can be enclosed in single quotes and appear in strings. The special characters \ and " are represented in these contexts by their escape sequences. The escape sequence \\ denotes the character \ and \" denotes the character ".

Here we print the string "Hello world!". The quotes delimit the string and are not themselves part of the string.

utop[0]> Caml.Printf.printf "Hello world!";;
Hello world!- : unit = ()

To capture the quotes we need to write them into the string by their escape sequence.

utop[1]> Caml.Printf.printf "\"Hello world!\"";;
"Hello world!"- : unit = ()

What now if we wish to quote a string within a string?

utop[3]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested quoted string\\\"\"";;
"A quoted string with \"a nested quoted
string\""- : unit = ()

We see that in rendering the above string, printf has rendered the escape sequence \" as " and \\\" as \" as required. The pattern continues if we now wish to quote a string within a quoted string within a quoted string.

utop[4]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested \\\\\\\"nested\\\\\\\"
quoted string\\\"\"";;
"A quoted string with \"a nested \\\"nested\\\"
quoted string\""- : unit = ()

As you can see, things get crazy pretty quickly and you can easily drive yourself mad working out the correct escape sequences to get the desired nesting!

Here's a hack : If the string has k levels of quoting, then count how many occurences of \s precede the " at that level. Let that number be n say. To get the next level of quoting you need to concatenate a sequence of n + 1 \s to them to get a total of 2n + 1 \s. To illustrate, look again at the last example:

utop[4]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested \\\\\\\"nested\\\\\\\"
quoted string\\\"\"";;
"A quoted string with \"a nested \\\"nested\\\"
quoted string\""- : unit = ()
That's three level of quoting. At the third level we have the sequence \\\\\\\". That's 7 \s. To quote to the fourth level then we need 8 + 7 = 15 \s:
utop[5]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested \\\\\\\"nested
\\\\\\\\\\\\\\\"nested\\\\\\\\\\\\\\\" \\\\\\\" quoted string\\\"\"";;
"A quoted string with \"a nested \\\"nested
\\\\\\\"nested\\\\\\\" \\\" quoted string\""- : unit = ()

In general, the number of \s required for n levels of quoting is 2n - 1 (that is, an exponential function). The solution follows from the recurrence relation Q0 = 0 and Qn = 2Qn - 1 + 1 which in fact establishes a connection to the "Towers of Hanoi" problem.


Saturday, October 14, 2017

How to render trees like the Unix tree command

How to render trees like Unix 'tree'

The Unix tree utility produces a pretty rendering of a filesystem. Implementing an algorithm to produce output like tree is a little harder than one might expect! This short example program illustrates one way of doing it.

(* A type of non-empty trees of strings. *)
type tree = [
  |`Node of string * tree list
]
;;

(* [print_tree tree] prints a rendering of [tree]. *)
let rec print_tree
          ?(pad : (string * string)= ("", ""))
          (tree : tree) : unit =
  let pd, pc = pad in
  match tree with
  | `Node (tag, cs) ->
     Printf.printf "%s%s\n" pd tag;
     let n = List.length cs - 1 in
     List.iteri (
         fun i c ->
         let pad =
           (pc ^ (if i = n then "`-- " else "|-- "),
            pc ^ (if i = n then "    " else "|   ")) in
         print_tree ~pad c
       ) cs
;;

(* An example tree. *)
let tree =
  `Node ("."
        , [
            `Node ("S", [
                      `Node ("T", [
                                `Node ("U", [])]);
                      `Node ("V", [])])
          ;  `Node ("W", [])
          ])
;;

(* Print the example tree. *)
let () =  print_tree tree
;;

The output of the above looks like this:

.
|-- S
|   |-- T
|   |   `-- U
|   `-- V
`-- W