Friday, October 27, 2017

Nesting quoted strings in OCaml

Quoting

According to the lexical conventions of OCaml, characters different from \ and " can be enclosed in single quotes and appear in strings. The special characters \ and " are represented in these contexts by their escape sequences. The escape sequence \\ denotes the character \ and \" denotes the character ".

Here we print the string "Hello world!". The quotes delimit the string and are not themselves part of the string.

utop[0]> Caml.Printf.printf "Hello world!";;
Hello world!- : unit = ()

To capture the quotes we need to write them into the string by their escape sequence.

utop[1]> Caml.Printf.printf "\"Hello world!\"";;
"Hello world!"- : unit = ()

What now if we wish to quote a string within a string?

utop[3]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested quoted string\\\"\"";;
"A quoted string with \"a nested quoted
string\""- : unit = ()

We see that in rendering the above string, printf has rendered the escape sequence \" as " and \\\" as \" as required. The pattern continues if we now wish to quote a string within a quoted string within a quoted string.

utop[4]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested \\\\\\\"nested\\\\\\\"
quoted string\\\"\"";;
"A quoted string with \"a nested \\\"nested\\\"
quoted string\""- : unit = ()

As you can see, things get crazy pretty quickly and you can easily drive yourself mad working out the correct escape sequences to get the desired nesting!

Here's a hack : If the string has k levels of quoting, then count how many occurences of \s precede the " at that level. Let that number be n say. To get the next level of quoting you need to concatenate a sequence of n + 1 \s to them to get a total of 2n + 1 \s. To illustrate, look again at the last example:

utop[4]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested \\\\\\\"nested\\\\\\\"
quoted string\\\"\"";;
"A quoted string with \"a nested \\\"nested\\\"
quoted string\""- : unit = ()
That's three level of quoting. At the third level we have the sequence \\\\\\\". That's 7 \s. To quote to the fourth level then we need 8 + 7 = 15 \s:
utop[5]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested \\\\\\\"nested
\\\\\\\\\\\\\\\"nested\\\\\\\\\\\\\\\" \\\\\\\" quoted string\\\"\"";;
"A quoted string with \"a nested \\\"nested
\\\\\\\"nested\\\\\\\" \\\" quoted string\""- : unit = ()

In general, the number of \s required for n levels of quoting is 2n - 1 (that is, an exponential function). The solution follows from the recurrence relation Q0 = 0 and Qn = 2Qn - 1 + 1 which in fact establishes a connection to the "Towers of Hanoi" problem.