Sunday, September 21, 2014

Correlation Coefficient

Correlation coefficient

It's been a while since we looked at anything from the domain of statistics so here's another little bite-sized piece - a function to compute Pearson's "product moment correlation coefficient".

It's a measure of dependence between two data sets. We'll express it in terms of unbiased standard deviation which I didn't write out before so I'll include that function too.

let unbiased_standard_deviation t =
  (*http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation

    In statistics and in particular statistical theory, unbiased
    estimation of a standard deviation is the calculation from a
    statistical sample of an estimated value of the standard deviation
    (a measure of statistical dispersion) of a population of values,
    in such a way that the expected value of the calculation equals
    the true value.

  *)
 let av = arithmetic_mean t in
 let squared_diffs =
   List.fold_left (fun acc xi -> ((xi -. av) *. (xi -. av)) :: acc) [] t
 in sqrt ((sum squared_diffs)/.((float_of_int (List.length t)) -. 1.0))

let correlation_coefficient x y =
  (*http://en.wikipedia.org/wiki/Correlation_and_dependence

    The most familiar measure of dependence between two quantities is
    the Pearson product-moment correlation coefficient, or "Pearson's
    correlation coefficient", commonly called simply "the correlation
    coefficient". It is obtained by dividing the covariance of the two
    variables by the product of their standard deviations.  
  *) 

  let x_b = arithmetic_mean x in
  let y_b = arithmetic_mean y in
  let s_x = unbiased_standard_deviation x in
  let s_y = unbiased_standard_deviation y in

  if s_x = 0. || s_y = 0. then 0.
  else
    let f acc x_i y_i =
      acc +. ((x_i -. x_b) *. (y_i -. y_b)) in
    let n = float_of_int (List.length x) in
    let s = List.fold_left2 f 0.0 x y  in
    s/.((n -. 1.) *. s_x *. s_y)

Saturday, September 6, 2014

Concatenation of a list of strings

Concatenation of a list of strings

Here's another fun (but probably silly) exercise. Its value I posit, is in highlighting the fundamental similarities that exist between the C++ and OCaml languages (that emerge when one "peeks" beyond the apparent dissimilarities on the surface). Maybe this sort of comparison aids in "lowering the barrier to entry" for the C++ programmer embarking on a journey into OCaml? Anyway, here we go.

The OCaml String module contains a function concat which concatenates a list of strings whilst inserting a separator between each of the elements. Prior to OCaml 4.02 at least, it's implementation went as follows:

let concat sep l =
  match l with
    [] -> ""
  | hd :: tl ->
      let num = ref 0 and len = ref 0 in
      List.iter (fun s -> incr num; len := !len + length s) l;
      let r = create (!len + length sep * (!num - 1)) in
      unsafe_blit hd 0 r 0 (length hd);
      let pos = ref(length hd) in
      List.iter
        (fun s ->
          unsafe_blit sep 0 r !pos (length sep);
          pos := !pos + length sep;
          unsafe_blit s 0 r !pos (length s);
          pos := !pos + length s)
        tl;
      r

So, a faithful translation of this program into C++ (unsafe_blit 'n all), yields this:

#include <boost/range.hpp>

#include <string>
#include <numeric>
#include <cstring>

namespace string_util /*In honor of Stefano of http://spacifico.org/ :)*/ 
{
  template <class RgT>
  std::string concat (std::string const& sep, RgT lst)
  {
    if (boost::empty (lst)) return "";

    std::size_t num = 0, len = 0;
    std::accumulate (
      boost::begin (lst), boost::end (lst), 0,
      [&](int _, std::string const& s) -> 
      int { ++num, len += s.size(); return _; } );
    std::string r(len + sep.size () * (num - 1), '\0');
    std::string const& hd = *(boost::begin (lst));
    std::memcpy ((void*)(r.data ()), (void*)(hd.data ()), hd.size());
    std::size_t pos = hd.size();
    std::accumulate (
      boost::next (boost::begin (lst)), boost::end (lst), 0,
      [&](int _, std::string const& s) -> 
      int {
        std::memcpy((void*)(r.data()+pos),(void*)(sep.data()),sep.size());
        pos += sep.size ();
        std::memcpy ((void*)(r.data()+pos),(void*)(s.data()),s.size());
        pos += s.size ();
        return _; });
  
    return r;
  }
}//namespace<string_util>
For example, this fragment
  #include <boost/assign/list_of.hpp>

  // ...

  std::list  lst = boost::assign::list_of ("foo")("bar")("baz");
  std::string r = string_util::concat (",", lst);
will produce the string "foo,bar,baz".

So there it is... As usual, a little more verbosity required on the C++ side but otherwise, not much between them IMHO. Agree?