Wednesday, October 26, 2016

Haskell type-classes in OCaml and C++

Haskell type-classes in OCaml and C++

This article examines the emulation of Haskell like type-classes in OCaml and C++. It follows [1] closely (recommended for further reading), extending on some of the example code given there to include C++.

First stop, a simplified version of the Show type-class with a couple of simple instances.

    class Show a where
      show :: a -> string

    instance Show Int where
      show x = Prelude.show x -- internal

    instance Show Bool where
      str True = "True"
      str False = "False"
    
The OCaml equivalent shown here uses the "dictionary passing" technique for implementation. The type-class declaration Show in Haskell translates to a data-type declaration for a polymorphic record α show in OCaml.
    type α show = {
      show : α → string
    }

    let show_bool : bool show = {
      show = function | true → "True" | false → "False"
    }

    let show_int : int show = {
      show = string_of_int
    }
    
In C++ we can use a template class to represent the type-class and specializations to represent the instances.
      template <class A> struct Show {};

      template <>
      struct Show<int> {
        static std::string (*show)(int);
      };
      std::string(*Show<int>::show)(int) = &std::to_string;

      template <>
      struct Show<bool> {
        static std::string show (bool);
      };
      std::string Show<bool>::show (bool b) { return b ? "true" : "false"; }
    

Next up print, a parametrically polymorphic function.

      print :: Show a => a -> IO ()
      print x = putStrLn$ show x
    
According to our dictionary passing scheme in OCaml, this renders as the following.
      let print : α show → α → unit = 
        fun {show} → fun x → print_endline@@ show x
    
The key point to note here is that in OCaml, evidence of the α value's membership in the Show class must be produced explicitly by the programmer. In C++, like Haskell, no evidence of the argument's membership is required, the compiler keeps track of that implicitly.
    template <class A>
    void print (A const& a) {
      std::cout << Show<A>::show (a) << std::endl;
    }
    

This next simplified type-class shows a different pattern of overloading : the function fromInt is overloaded on the result type and the (+) function is binary.

    class Num a where
      fromInt :: Int -> a
      (+)     :: a -> a -> a

    sum :: Num a => [a] -> a
    sum ls = foldr (+) (fromInt 0) ls
    
Translation into OCaml is as in the following.
    type α num = {
      from_int : int → α;
      add      : α → α → α;
    }

    let sum : α num → α list → α = 
      fun {from_int; add= ( + )} → 
        fun ls →
          List.fold_right ( + ) ls (from_int 0)
    
Translation into C++, reasonably mechanical. One slight disappointment is that it doesn't seem possible to get the operator '+' syntax as observed in both the Haskell and OCaml versions.
    template <class A>
    struct Num {};
    
    namespace detail {
      template <class F, class A, class ItT>
      A fold_right (F f, A z, ItT begin, ItT end) {
        if (begin == end) return z;
        return f (fold_right (f, z, std::next (begin), end), *begin);
      }
    }//namespace<detail> 
    
    template <class ItT>
    typename std::iterator_traits<ItT>::value_type 
    sum (ItT begin, ItT end) {
      using A = typename std::iterator_traits<ItT>::value_type;
      auto add = Num<A>::add;
      auto from_int = Num<A>::from_int;
      return detail::fold_right (add, from_int (0), begin, end);
    }
    
In Haskell, Int is made a member of Num with this declaration.
     instance Num Int where
       fromInt x = x
       (+)       = (Prelude.+)
   
Returning to OCaml, we can define a couple of instances including the one above like this.
    let int_num  : int num  = { 
      from_int = (fun x → x); 
      add      = Pervasives.( + ); 
    }
    
    let bool_num : bool num = {
      from_int = (function | 0 → false | _ → true);
      add = function | true → fun _ → true | false → fun x → x
    }
   
The code defining those above instances in C++ follows.
    template <>
    struct Num<int> {
      static int from_int (int);
      static int add (int, int);
    };
    
    int Num<int>::from_int (int i) { return i; }
    int Num<int>::add (int x, int y) { return x + y; }
    
    template <>
    struct Num<bool> {
      static bool from_int (int);
      static bool add (bool, bool);
    };
    
    bool Num<bool>::from_int (int i) { return i != 0; }
    bool Num<bool>::add (bool x, bool y) { if (x) return true; return y; }
   
Here now is a function with two type-class constraints.
    print_incr :: (Show a, Num a) => a -> IO ()
    print_incr x = print$ x + fromInt 1
   
In OCaml this can be written like so.
    let print_incr : (α show * α num) → α → unit = 
      fun (show, {from_int; add= ( + )}) → 
        fun x → print show (x + (from_int 1))
   
In C++, this is said as you see below.
    template <class A>
    void print_incr (A x) {
      print (Num<A>::add (x, Num<A>::from_int (1)));
    }
   
Naturally, the above function will only be defined for types A that are members of both the Show and Num classes and will yield compile errors for types that are not.

Moving on, we now look at another common pattern, an instance with a constraint : a Show instance for all list types [a] when the element instance is a member of Show.

    instance Show a => Show [a] where
      show xs = "[" ++ go True xs
        where
          go _ [] = "]"
          go first (h:t) =
            (if first then "" else ", ") ++ show h ++ go False t
   

In OCaml, this takes the form of a function. The idea is, given evidence of a type α's membership in Show the function produces evidence that the type α list is also in Show.
    let show_list : α show → α list show =
      fun {show} →
        {show = fun xs →
          let rec go first = function
            | [] → "]"
            | h :: t →
              (if (first) then "" else ", ") ^ show h ^ go false t in
          "[" ^ go true xs
        }
   
It might be possible to do better than the following partial specialization over vector<> in C++ (that is, to write something generic, just once, that works for a wider set ofsequence types) using some advanced meta-programming "hackery", I don't really know. I suspect finding out might be a bit of a rabbit hole best avoided for now.
    template <class A>
    struct Show<std::vector<A>> {
      static std::string show (std::vector<A> const& ls);
    };
    
    template <class A>
    std::string Show<std::vector<A>>::show (std::vector<A> const& ls) {
      bool first=true;
      typename std::vector<A>::const_iterator begin=ls.begin (), end=ls.end ();
      std::string s="[";
      while (begin != end) {
        if (first) first = false;
        else s += ", ";
        //A compile time error will result here if if there is no
        //evidence that `A` is in `Show`
        s += Show<A>::show (*begin++);
      }
      s += "]";
    
      return s;
    }
   

In this next example, we need a type-class describing types that can be compared for equality, Eq. That property and the Num class can be combined to produce a type-class with a super-class and a default.

    class Eq where
      (==) :: a -> a -> bool
      (/=) :: a -> a -> bool
  
    deriving instance Eq Bool
    deriving instance Eq Int

    class (Eq a, Num a) => Mul a where
      (*) :: a -> a -> a
      x * _ | x == fromInt 0 = fromInt 0
      x * y | x == fromInt 1 = y
      x * y | y + (x + (fromInt (-1))) * y

    dot :: Mul a => [a] -> [a] -> a
    dot xs ys = sum$ zipWith ( * ) xs ys
   
Modeling the above in OCaml is done with a dictionary for the Mul type-class and a function to generate instances from super-class instances.
    type α mul = {
      mul_super : α eq * α num;
      mul : α → α → α
    }
    
    let mul_default : α eq * α num → α mul = 
      fun (({eq}, {from_int; add = ( + )}) as super) → 
        {
          mul_super = super;
          mul = let rec loop x y = begin match () with
          | () when eq x (from_int 0) → from_int 0
          | () when eq x (from_int 1) → y
          | () → y + loop (x + (from_int (-1))) y 
          end in loop
        }
    
    let bool_mul : bool mul = 
      mul_default (bool_eq, bool_num)
    
    let int_mul : int mul = {
      mul_super = (int_eq, int_num);
      mul = Pervasives.( * )
    }

    let dot : α mul → α list → α list → α = 
      fun {mul_super = (eq, num); mul} →
        fun xs ys → sum num@@ List.map2 mul xs ys
   
As one would expect, expressing the base class/derived class relationships in C++ is playing to its strengths.
    template <class A> struct Eq {};
    
    template <>
    struct Eq<bool> {
      static bool eq (bool, bool);
      static bool neq (bool, bool);
    };
    
    bool Eq<bool>::eq (bool s, bool t) { return s == t; }
    bool Eq<bool>::neq (bool s, bool t) { return s != t; }
    
    template <>
    struct Eq<int> {
      static int eq (int, int);
      static int neq (int, int);
    };
    
    int Eq<int>::eq (int s, int t) { return s == t; }
    int Eq<int>::neq (int s, int t) { return s != t; }

    template <class A>
    struct Mul : Eq<A>, Num <A> {
      using Eq<A>::eq;
      using Num<A>::add;
      using Num<A>::from_int;
    
      static A mul (A x, A y);
    };
    
    template <class A>
    A Mul<A>::mul (A x, A y) {
      if (eq (x, from_int (0))) return from_int (0);
      if (eq (x, from_int (1))) return y;
      return add (y, mul ((add (x, from_int (-1))), y));
    }
    
    template struct Mul<bool>;
    template <> int Mul<int>::mul (int x, int y) { return x * y; }
    
    namespace detail{
    
      template <class F, class It, class Acc>
      Acc map2 (F f
       , It xs_begin, It xs_end, It ys_begin, It ys_end, Acc acc) {
        if ((xs_begin == xs_end) || (ys_begin == ys_end)) return acc;
        return map2 (f
              , std::next (xs_begin)
              , xs_end
              , std::next (ys_begin)
              , ys_end
              , *acc++ = f (*xs_begin, *ys_begin));
      }
    
    }//namespace detail
    
    template <class A>
    A dot (std::vector<A> const& xs, std::vector<A> const& ys) {
      std::vector<A> buf;
      detail::map2 (
         Mul<A>::mul
       , xs.begin (), xs.end()
       , ys.begin (), ys.end ()
       , std::back_inserter(buf));
      return sum (buf.begin (), buf.end ());
    }
   

This very last example is in polymorphic recursion. The Haskell reads as follows.

   print_nested :: Show a => Int -> a -> IO ()
   print_nested 0 x = print x
   print_nested n x = print_nested (n - 1) (replicate n x)

   test_nested = do
     n <- getLine
     print_nested (read n) (5::Int)
   

Those two functions are very interesting! Translating it to OCaml yields the following.
    let rec replicate : int → α → α list = 
      fun n x → if n >= 0 then [] else x :: replicate (n - 1) x
    
    let rec print_nested : α. α show → int → α → unit =
      fun show_dict → function
      | 0 →
          fun x →
            print show_dict x
      | n → 
          fun x →
            print_nested (show_list show_dict) (n - 1) (replicate n x)

    let test_nested =
      let n = read_int () in
      print_nested show_int n 5
   
Now if you examine the output of the above if '4' (say) was entered, you'll see something like this:
   [[[[5, 5, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5]], [[5, 5, 5, 5], [5, 5,
   5, 5], [5, 5, 5, 5]]]]
   
You can see, looking at this, that the type of the printed list is not determinable at compile-time. It is dependent on a runtime parameter! It follows that the evidence that the type is in the Show class can not be produced statically. It has to be computed dynamically which is what you see there in the application of show_list to the current show_dict in the n <> 0 branch of the print_nested function. Note also the requirement for the universal quantifier in the function signature. It's mandatory.

OK, so how about the above code in C++? Well a naive transliteration gives the following.

    namespace detail {
      template<class A, class ItT>
      ItT replicate (int n, A x, ItT dst) {
        if (n <= 0) return dst;
        return replicate ((n - 1), x, *dst++ = x);
      }
    
    }//namespace detail
    
    template <class A>
    void print_nested (int n, A const& x) {
      if (n == 0)
        print (x);
      else {
        std::vector<A> buf;
        detail::replicate(n, x, std::back_inserter(buf));
        print_nested (n - 1, buf);
      }
    }
    
    void test_nested () {
      int n;
      std::cin >> n;
      print_nested (n, 5);
    }
   
Unfortunately though, this program though exhibits unbounded compile time recursion (compilation doesn't terminate).


References:
[1] Implementing, and Understanding Type Classes -- Oleg Kiselyov