Sunday, October 19, 2014

Tail-recursion

Tail-recursion

Stack overflow refers to a condition in the execution of a computer program whereby the stack pointer exceeds the address space allocated for the stack. The usual result of "blowing the stack" is swift and brutal abnormal termination of the program.

The amount of memory allocated by the operating system for a given program's stack is finite and generally little can be done by the programmer to influence the amount that will be made available. The best the programmer can really do is to use what's given wisely.

We can get a sense of the limits of the stack in practical terms with a program like the following.

let rec range s e = 
  if s >= e then [] 
  else s :: range (s + 1) e

let rec loop i =
  let n = 2.0 ** (i |> float_of_int) |> int_of_float in
  try
    let _ = range 0 n in
    loop (i + 1)
  with
  | Stack_overflow -> 
    Printf.printf "Stack overflow at i = %d, n = %d\n" i n
    
let () = loop 0
range is a function that produces the half-open range $\left[s, e\right)$ - the ordered sequence $\left\{s, s + 1, s + 2, \dots, e - 2, e - 1\right\}$. Note that range is defined in terms of itself, that is, it is a recursive function. The idea is to use it in an unbounded loop to build sequences of increasing lengths of powers of $2$ : ${2^0, 2^1, 2^2, \dots}$. We set it off and when we encounter stack overflow, terminate the program gracefully reporting on the power of $2$ found to give rise to the condition. In my case I found that I was able to make $\approx 2^{19} = 524,288$ recursive calls to range before the stack limit was breached. That's very limiting. For realistic programs, one would hope to be able to produce sequences of lengths much greater than that!

What can be done? The answer lies in the definition of range and that thing called tail-recursion. Specifically, range is not tail-recursive. To be a tail-recursive function, the last thing the function needs do is to make a recursive call to itself. That's not the case for range as written as the recursive call to itself is the second to last thing it does before it returns (the last thing it does is to 'cons' a value onto the list returned by the recursive call).

Why being tail-recursive is helpful is that tail-calls can be implemented by the compiler without requiring the addition of a new "stack frame" to the stack. Instead, the current frame can be replaced in setting up the tail-call being modified as necessary and effectively the recursive call is made to be a simple jump. This is called tail-call elimination and its effect is to allow tail-recursive functions to circumvent stack overflow conditions.

Here's a new definition for range, this time implemented with tail-calls.
let range s e = 
  let rec aux acc s e = 
    if s >= e then acc
  else aux (s :: acc) (s + 1) e
  in List.rev (aux [] s e)
With this definition for range I find I can build sequences of length up to around $\approx 2^{26} = 67,108,864$ elements long without any sign of stack overflow which is a huge improvement! At around this point though, my sequence building capabilities start to be limited by the amount of physical memory present on my PC but that's a different story entirely.

Sunday, October 12, 2014

Dimensional analysis (and the units of the universal gas constant 'R')

Dimensional analysis

The problem at hand is to find by dimensional analysis, the SI units of the universal gas constant $R$ (forgive me - whilst this entry is not explicitly about computer programming - it is in fact one of my daughter's homework problems - the obvious relationship to type systems makes it seem to me at least tangentially relevant).

$R$ is defined by the Ideal Gas Law: $PV = nRT$ were $P$ is the absolute pressure of the gas, $V$ is the volume of the gas, $n$ is the amount of substance of gas (measured in moles), and $T$ is the absolute temperature of the gas.

The obvious dimensions are as follows :

$\left[P\right]$: $M \cdot L^{-1}\cdot T^{-2}$, $\left[V\right]$: $L^3$ and $\left[T\right]$: $\Theta$.

Now, one mole of a substance is defined to be $6.0221367\times 10^{23}$ atoms of that substance (Avogardro's number) but even dimensionless numbers can be part of a dimensioned system. The trick is to realize that if one quantity in an equation is "per mole" then so too must be any quantity added to it. Accordingly, if we define a (pseudo) dimension $\Lambda$ for the amount $n$ we can reason that $\left[R\right]$: $M \cdot L^{2} \cdot T^{-2} \cdot \Theta^{-1} \cdot \Lambda^{-1}$. This is enough for us to say the fundamental units for $R$ are

$kg \cdot m^{2} \cdot s^{-2} \cdot K^{-1} \cdot mol^{-1}$.

We can go a little further though. Since $work = force \times length$ we see that $M \cdot L \cdot T^{-2}$ can be expressed in units of energy and indeed $1J = kg \cdot m^{2} \cdot s^{-2}$. Thus we arrive at our final conclusion. $R$ can be written with units

$J \cdot K^{-1} \cdot mol^{-1}$.

The beautiful thing though is this. The physical interpretation of the ideal gas law is saying that, for an ideal gas, of any kind, "the energy per degree per mole" is a constant (that constant being $\approx 8.3144 \frac{^J/_K}{mol}$)!

Friday, October 3, 2014

Recursive list

Recursive list

Way, way back we looked at a function to estimate the value of the sin function utilizing its representation as a Taylor polynomial about x = 0. When one looks closely at that Taylor polynomial, one can observe a pattern in the coefficients involving the infinite sequence 0, 1, 0, -1, 0, 1, 0, -1, .... My buddy Marcelo in his formulation of that Taylor polynomial wrote this function that for a given n produces the first n values of this sequence as a list:

let rec sin_coeff n =
  match n with
  | 0 -> []
  | 1 -> [0.]
  | 2 -> [0.; 1.]
  | 3 -> [0.; 1.; 0. ]
  | _  -> 0. :: 1. :: 0. :: -1. :: sin_coeff (n-4)
He mused to me on the day that "surely there is a more elegant way to produce this list?". Oh, by golly Marcelo there certainly is!
let rec sin_coeff = 0 :: 1 :: 0 :: -1 :: sin_coeff
This little construction builds the list value sin_coeff recursively (that is, in terms of itself). Now, of course it's not impossible to define structures like this in C or C++ but good luck matching the brevity and elegance afforded to us in OCaml for this sort of thing! We still need a little function of course that can pull off this endless list a list containing just the first n elements. The ubiquitous take function will do for this purpose and I show it here just for completeness.
let rec list_take k l =
  if k <= 0 then []
  else
    match l with
    | [] -> []
    | (x :: xs)  -> x :: (list_take (k-1) xs)