Shayne Fletcher

Cabal package macros (MIN_VERSION_xyz)

2023-01-14T13:27:00.000-05:00

Cabal package macros (`MIN_VERSION_xyz`)

cabal build ... generates cabal_macros.h containing, e.g. for hlint version 3.5.0, a definition like this:

/* package hlint-3.5 */
#ifndef VERSION_hlint
#define VERSION_hlint "3.5"
#endif /* VERSION_hlint */
#ifndef MIN_VERSION_hlint
#define MIN_VERSION_hlint(x,y,z) (\
  (x) <  3 || \
  (x) == 3 && (y) <  5 || \
  (x) == 3 && (y) == 5 && (z) <= 0)
#endif /* MIN_VERSION_hlint */

This macro is a compile-time predicate. Use it to test that the configured hlint package version is at least x.y.z.

We might for example test if the configured package version is at least 3.6.0 by

MIN_VERSION_hlint(3,6,0)

By substitution into the macro body we arrive at

3 < 3 || 3 == 3 && 6 < 5 || 3 == 3 && 6 == 5 && 0 <= 0

to conclude that no, it is not.
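The substitution can also be checked mechanically. What follows is a small sketch in plain Haskell that models the generated predicate as an ordinary function. The name minVersion and its tuple arguments are made up for this illustration; they are not something Cabal provides.

```haskell
-- A model of the predicate that cabal_macros.h defines for a package
-- configured at version (a, b, c). 'minVersion' is a hypothetical name
-- used only for this illustration.
minVersion :: (Int, Int, Int) -> (Int, Int, Int) -> Bool
minVersion (a, b, c) (x, y, z) =
     x <  a
  || x == a && y <  b
  || x == a && y == b && z <= c

main :: IO ()
main = do
  -- hlint is configured at 3.5.0, as in the definition above.
  print (minVersion (3, 5, 0) (3, 6, 0))  -- is it at least 3.6.0? False
  print (minVersion (3, 5, 0) (3, 5, 0))  -- is it at least 3.5.0? True
  print (minVersion (3, 5, 0) (3, 4, 9))  -- is it at least 3.4.9? True
```

The predicate is just a lexicographic comparison, (x, y, z) <= (a, b, c). In real code the macro is consumed by the C preprocessor, typically as #if MIN_VERSION_hlint(3,5,0) in a module with the CPP extension enabled.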

Testing a new stack resolver

2022-08-07T17:16:00.003-04:00

Testing a new stack resolver

When there’s been a new GHC release, it can take a little while for there to be a stack resolver for it. The following procedure can be used for some local testing if you want to get ahead of the game¹.

Where stack looks for resolvers

If you execute the command,

    $ stack setup --resolver ghc-x.y.z

you’ll see something like

    No setup information found for ghc-x.y.z on your platform.

if there isn’t yet a resolver for ghc-x.y.z.

The default set of resolvers stack “knows about” are those enumerated in the file stack-setup-2.yaml in the commercialhaskell/stackage-content repository.

Create a local stack setup file

Start by downloading the release binary package tarball of interest from www.haskell.org and note its

size (in bytes):

       stat -f%z ghc-x.y.z-x86_64-apple-darwin.tar.xz

SHA1 hash:

       shasum -a 1 ghc-x.y.z-x86_64-apple-darwin.tar.xz

SHA256 hash:

       shasum -a 256 ghc-x.y.z-x86_64-apple-darwin.tar.xz

Now, create stack-setup-2-ghc-x.y.z.yaml with contents along the lines of

        ghc:
          macosx:
            x.y.z:
              url: http://downloads.haskell.org/~ghc/x.y.z/ghc-x.y.z-x86_64-apple-darwin.tar.xz
              content-length: 177152992
              sha1: 2dbd726860ed2c0ea04c7aca29c22df20b952ee1
              sha256: f2e8366fd3754dd9388510792aba2d2abecb1c2f7f1e5555f6065c3c5e2ffec4

Update: Here’s an even easier way.

    ghc:
      macosx:
        x.y.z:
          url: /path/to/ghc-x.y.z-x86_64-apple-darwin.tar.xz

Install GHC version `x.y.z` via the setup file

The following stack setup command will use the setup file created above to download and install ghc-x.y.z:

  stack --setup-info-yaml stack-setup-2-ghc-x.y.z.yaml --resolver ghc-x.y.z setup

Once stack has GHC installed, there’s no further need to pass a setup-info-yaml argument to subsequent stack commands. It’s ready to go!

¹ Kindly explained to me by Mike Pilgrem in this ticket. This note is biased towards my needs on macOS. See the linked issue for further details, especially for Windows installations.

Annotations in GHC

2021-05-19T00:41:00.011-04:00

annotations

Annotations in GHC

Starting with ghc-9.2.1, parse trees contain “annotations” (these are, for example, comments and the locations of keywords). This represents a non-trivial upgrade of GHC parse trees. If you work with GHC ASTs in your project, there will be no avoiding getting to know about them. This note is a summary overview of annotations: the where and how of their representations.

In-tree annotations enable exact-printing of GHC ASTs. This feature and the reformulation of the GHC AST with in-tree annotations to support it was conceived of and implemented by Alan Zimmerman (@alan_zimm). The achievement is of truly epic scale.

Annotations on syntactic elements

An EpaLocation is a span giving the exact location of a keyword in parsed source.

data EpaLocation = EpaSpan RealSrcSpan | EpaDelta DeltaPos
data DeltaPos = ...

The parser only inserts EpaSpans.

A DotFieldOcc arises in expressions like (.e) (field-selector) or a.e (field-selection) when OverloadedRecordDot is enabled. A DotFieldOcc value in the parse phase is associated with an AnnFieldLabel in its extension field (annotations in ghc-9.2.1 lean heavily on the facilities afforded by TTG). The AnnFieldLabel contains the location of the ‘.’. AnnFieldLabel is an “annotation type”. You’ll recognize annotation types (there are many) by the convention that their names are prefixed Ann.

-- GHC.Hs.Expr
data AnnFieldLabel
  = AnnFieldLabel {
      afDot :: Maybe EpaLocation
      }
type instance XCDotFieldOcc (GhcPass _) = EpAnn AnnFieldLabel

-- Language.Haskell.Syntax.Expr
data DotFieldOcc p
  = DotFieldOcc
    { dfoExt   :: XCDotFieldOcc p
    , dfoLabel :: XRec p FieldLabelString
    }
  | XDotFieldOcc !(XXDotFieldOcc p)

(What XRec p FieldLabelString means will be explained in the next section.)

Note that the extension field dfoExt doesn’t contain a “raw” AnnFieldLabel, rather, it contains an EpAnn AnnFieldLabel.

EpAnn envelopes an annotation. It associates a base location for the start of the syntactic element containing the annotation along with any comments enclosed in the source span of the element to which the EpAnn is attached. EpAnnNotUsed is used when an annotation is required but there’s no annotation available to envelope (e.g. one obvious case being in generated code).

data EpAnn ann
  = EpAnn { entry :: Anchor
          , anns :: ann
          , comments :: EpAnnComments }
  | EpAnnNotUsed

data EpAnnComments = ...

It’s the Anchor type where the base location is held.

data Anchor = Anchor { anchor :: RealSrcSpan, anchor_op :: AnchorOperation }

data AnchorOperation = ...

Annotations on source spans

Annotations don’t just get attached to syntactic elements, they frequently get attached to source spans too.

data SrcSpanAnn' a = SrcSpanAnn { ann :: a, locA :: SrcSpan }

Usually SrcSpanAnn' is used with EpAnn and that combination is named a SrcAnn.

type SrcAnn ann = SrcSpanAnn' (EpAnn ann)

There are many annotation types. The most ubiquitous are AnnListItem, NameAnn, AnnList, AnnPragma and AnnContext. Their use is common enough that names are given to their SrcAnn types (which you recall, wrap them in EpAnn and associate them with a SrcSpan).

type SrcSpanAnnA = SrcAnn AnnListItem
type SrcSpanAnnN = SrcAnn NameAnn

type SrcSpanAnnL = SrcAnn AnnList
type SrcSpanAnnP = SrcAnn AnnPragma
type SrcSpanAnnC = SrcAnn AnnContext

Of these, SrcSpanAnnA is used as a sort of “default” annotation.

What do you do with generalized SrcSpan types like these? You locate things with them.

type LocatedA = GenLocated SrcSpanAnnA
type LocatedN = GenLocated SrcSpanAnnN

type LocatedL = GenLocated SrcSpanAnnL
type LocatedP = GenLocated SrcSpanAnnP
type LocatedC = GenLocated SrcSpanAnnC

These type synonyms are only for the most commonly used annotation types. The general case is LocatedAn an.

type LocatedAn an = GenLocated (SrcAnn an)

To recap, a LocatedAn an is a GenLocated (SrcAnn an) which is a GenLocated (SrcSpanAnn' (EpAnn an)).
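To make the layering concrete, here is a toy model of these wrappers. It is only a sketch: the types below are simplified stand-ins I've written for illustration (the real SrcSpan, EpAnn and NameAnn carry much more), though the shapes and the name getLocA mirror what is described above.

```haskell
-- Simplified stand-ins for the GHC types discussed above. Everything here
-- is a toy model; the real definitions hold far more information.
data SrcSpan = SrcSpan Int Int deriving (Eq, Show)   -- toy: start and end column

data EpAnn ann = EpAnn { anns :: ann } | EpAnnNotUsed deriving Show

data SrcSpanAnn' a = SrcSpanAnn { ann :: a, locA :: SrcSpan } deriving Show
type SrcAnn ann = SrcSpanAnn' (EpAnn ann)

data GenLocated l e = L l e deriving Show
type LocatedAn an = GenLocated (SrcAnn an)

data NameAnn = NameAnn deriving Show                 -- toy annotation type
type LocatedN = LocatedAn NameAnn

-- Recover the plain source span from any annotated located thing.
getLocA :: GenLocated (SrcSpanAnn' a) e -> SrcSpan
getLocA (L l _) = locA l

-- A located name: the payload "foo", wrapped in SrcSpanAnn' (EpAnn NameAnn).
name :: LocatedN String
name = L (SrcSpanAnn (EpAnn NameAnn) (SrcSpan 1 4)) "foo"

main :: IO ()
main = print (getLocA name)  -- prints: SrcSpan 1 4
```

Reading the value of name from the outside in follows the recap exactly: GenLocated, then SrcSpanAnn', then EpAnn, then the annotation itself.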

Abstracting over locations

Syntax definitions are generalized with respect to location information. That is, rather than hard-coding SrcSpan into syntax type definitions as we used to, type families are used in their place so that the structure of the syntax including locations can be described without fixing concrete types for the locations where you’d once have had a source span type.

It works like this. In Language.Haskell.Syntax.Extension there is this definition:

type family XRec p a = r | r -> a

Locations are specified in terms of XRecs. For example in Language.Haskell.Syntax.Expr we have this:

type LHsExpr p = XRec p (HsExpr p)

How XRec p (HsExpr p) is mapped onto a specific type in GHC is achieved in the following way. First in Language.Haskell.Syntax.Extension there is the following definition:

type family Anno a = b

Then, in GHC.Hs.Extension this definition:

type instance XRec (GhcPass p) a = GenLocated (Anno a) a

Specific choices for each syntactic element can then be made for GHC’s use of the parse tree and phase. For example, in GHC.Hs.Expr we have the following.

type instance Anno (HsExpr (GhcPass pass)) = SrcSpanAnnA

To see how this works, consider what that means for the located expression type LHsExpr GhcPs in GHC. We have LHsExpr GhcPs is XRec GhcPs (HsExpr GhcPs) which is GenLocated (Anno (HsExpr GhcPs)) (HsExpr GhcPs) or GenLocated SrcSpanAnnA (HsExpr GhcPs) (or LocatedA (HsExpr GhcPs), if you like).

Expanding further we have GenLocated SrcSpanAnnA (HsExpr GhcPs) is GenLocated (SrcAnn AnnListItem) (HsExpr GhcPs). So ultimately, LHsExpr GhcPs is GenLocated (SrcSpanAnn' (EpAnn AnnListItem)) (HsExpr GhcPs).
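The same chain of reductions can be played out in miniature. The sketch below is a toy model, not GHC's code: it has a single pass index GhcPs rather than GhcPass p, a two-constructor HsExpr, a made-up SrcSpanAnnA holding a pair of columns, and it drops the injectivity annotation on XRec to keep the extensions down to TypeFamilies.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Toy model: one pass index, one syntax type, one annotation type.
data GhcPs                                    -- stand-in for the parser pass
data GenLocated l e = L l e
newtype SrcSpanAnnA = SrcSpanAnnA (Int, Int)  -- toy annotated source span

type family XRec p a   -- how a thing of type 'a' is wrapped in pass 'p'
type family Anno a     -- which annotation type goes with 'a'

data HsExpr p = HsLit Int | HsNeg (LHsExpr p)
type LHsExpr p = XRec p (HsExpr p)

type instance XRec GhcPs a = GenLocated (Anno a) a
type instance Anno (HsExpr GhcPs) = SrcSpanAnnA

-- This definition type checks only because LHsExpr GhcPs reduces to
-- GenLocated SrcSpanAnnA (HsExpr GhcPs).
example :: LHsExpr GhcPs
example = L (SrcSpanAnnA (1, 3)) (HsNeg (L (SrcSpanAnnA (2, 3)) (HsLit 1)))

-- Count nested negations, matching through the location wrapper.
depth :: LHsExpr GhcPs -> Int
depth (L _ (HsLit _)) = 0
depth (L _ (HsNeg e)) = 1 + depth e

main :: IO ()
main = print (depth example)  -- prints: 1
```

Note that HsExpr itself never mentions GenLocated or SrcSpanAnnA; those only appear through the two type instances, which is the point of the abstraction.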

arith-cxx-tagless-final

2021-04-09T17:33:00.003-04:00

arith-cxx-final-tagless

arith-cxx-tagless-final

The aim is to prove out the idea of an interpreter where the “front end” (lexical analysis) is in Rust and the “back end” (interpretation) is in C++.

We’ll parse and evaluate a simple language of arithmetic expressions:
- The parser is implemented using nom (a Rust parser combinator library);
- The “tagless-final” idiom is used to split the front and back ends;
- The interop between C++ and Rust is expressed using the CXX library.

Parser

Additive expression syntax we’ll define by this grammar:

    expr := term ('+' term)*
    term := lit | '-' term | '(' expr ')'
    lit  := digits

The key idea of the program is this: Don’t define an abstract syntax tree type, values of which are produced by parsing. Rather, as parsing unfolds, call functions defined by the following trait.

pub trait ExprSyn: Clone {
    fn lit(n: i64) -> Self;
    fn neg(t: Self) -> Self;
    fn add(u: Self, v: Self) -> Self;
}

With that understood, we implement the parser as a Rust library in the following way.

pub mod parse {
    use nom::{
        branch::alt,
        bytes::complete::tag,
        character::complete::char,
        character::complete::{digit1 as digit, space0 as space},
        combinator::{map, map_res},
        multi::fold_many0,
        sequence::{delimited, pair, preceded},
        IResult,
    };

    use super::ExprSyn;
    use std::str::FromStr;

    type ParseResult<'a, E> = IResult<&'a str, E>;

    fn lit<E: ExprSyn>(i: &str) -> ParseResult<E> {
        map_res(delimited(space, digit, space), |x| {
            FromStr::from_str(x).map(E::lit)
        })(i)
    }

    fn neg<E: ExprSyn>(i: &str) -> ParseResult<E> {
        map(delimited(space, preceded(char('-'), term), space), E::neg)(i)
    }

    fn par<E: ExprSyn>(i: &str) -> ParseResult<E> {
        delimited(space, delimited(tag("("), expr, tag(")")), space)(i)
    }

    fn term<E: ExprSyn>(i: &str) -> ParseResult<E> {
        alt((lit, neg, par))(i)
    }

    pub fn expr<E: ExprSyn>(i: &str) -> ParseResult<E> {
        let (i, init) = term(i)?;
        fold_many0(pair(char('+'), term), init, |acc, (_, val): (char, E)| {
            E::add(acc, val)
        })(i)
    }
}

To wire that up to C++ we need to express a CXX “bridge”.

use cxx::SharedPtr;

#[cxx::bridge]
pub mod ffi {
    unsafe extern "C++" {
        include!("arith_final_tagless/include/cpp_repr.hpp");
        type Cpp_repr;

        fn lit(i: i64) -> SharedPtr<Cpp_repr>;
        fn neg(e: SharedPtr<Cpp_repr>) -> SharedPtr<Cpp_repr>;
        fn add(l: SharedPtr<Cpp_repr>, r: SharedPtr<Cpp_repr>) -> SharedPtr<Cpp_repr>;
    }

    extern "Rust" {
        fn parse(s: String) -> Result<SharedPtr<Cpp_repr>>;
    }
}

The header file cpp_repr.hpp referenced by the bridge contains these prototypes.

#pragma once

#include <memory>
#include <cstdint>

struct Cpp_repr {
  int64_t expr;
};

using cpp_repr_t = std::shared_ptr<Cpp_repr>;

cpp_repr_t lit(int64_t i);
cpp_repr_t neg(cpp_repr_t t);
cpp_repr_t add(cpp_repr_t t, cpp_repr_t u);

The existence of that header is enough to finish off the Rust library.

#[allow(non_camel_case_types)]
pub type CppRepr_t = SharedPtr<ffi::Cpp_repr>;

impl ExprSyn for CppRepr_t {
    fn lit(i: i64) -> CppRepr_t {
        ffi::lit(i)
    }
    fn neg(t: CppRepr_t) -> CppRepr_t {
        ffi::neg(t)
    }
    fn add(t1: CppRepr_t, t2: CppRepr_t) -> CppRepr_t {
        ffi::add(t1, t2)
    }
}

pub fn parse(s: String) -> Result<CppRepr_t, String> {
    match parse::expr::<CppRepr_t>(s.as_str()) {
        Ok((_s, rep)) => Ok(rep),
        Err(e) => Err(format!("{}", e)),
    }
}

Evaluator

We put the C++ part of the implementation in cpp_repr.cpp in a separate library.

#include <iostream>

#include "arith_final_tagless/include/cpp_repr.hpp"

cpp_repr_t lit(int64_t i) {
  return std::shared_ptr<Cpp_repr>{new Cpp_repr{i}};
}

cpp_repr_t neg(cpp_repr_t t) {
  return std::shared_ptr<Cpp_repr>{new Cpp_repr{-t->expr}};
}

cpp_repr_t add(cpp_repr_t t, cpp_repr_t u) {
  return std::shared_ptr<Cpp_repr>{new Cpp_repr{t->expr + u->expr}};
}

Interpreter

The REPL in main.cpp brings the Rust and C++ libraries together into an executable.

#include <iostream>
#include <string>

#include "rust/cxx.h" // 'rust::Error'
#include "arith_final_tagless/src/arith_final_tagless.rs.h" // 'parse'

int main() {
  char const* prompt = "\n% ";
  std::cout << "Additive expression evaluator (type CTRL+D to exit)" << prompt;

  std::string line;
  while(std::getline(std::cin, line)) {
    try {
      if (auto p = parse(rust::String{line})) {
        std::cout << line << " = " << p->expr << prompt;
      }
    } catch (rust::Error const& e) {
      std::cerr << e.what() << prompt;
    }
  }

  return 0;
}

Source and build scripts and so on for this project here.

Configuring Cabal Build Flags

2021-02-07T13:17:00.003-05:00

Configuring Cabal build flags

It’s always an emergency when I go looking for this information!

Suppose you are building HLint.

mkdir ~/tmp && cd ~/tmp
curl -o hlint-3.2.7.tar.gz https://hackage.haskell.org/package/hlint-3.2.7/hlint-3.2.7.tar.gz
gunzip  hlint-3.2.7.tar.gz && tar xvf  hlint-3.2.7.tar

Left to their own devices, HLint and ghc-lib-parser-ex will default to auto mode, meaning they will decide for themselves whether to depend on ghc-lib-parser or the native ghc libraries.

Sometimes it’s desirable to force the situation, though, and explicitly make them do one or the other. How do you do that? The answer is, of course, Cabal package flags. There are two scenarios: building with stack or building with cabal.

stack.yaml

Force link with ghc-lib-parser

  flags:
    hlint:
      ghc-lib: true
    ghc-lib-parser-ex:
      auto: false
      no-ghc-lib: false

Force link with native ghc

  flags:
    hlint:
      ghc-lib: false
    ghc-lib-parser-ex:
      auto: false
      no-ghc-lib: true

cabal.project

Force link with ghc-lib-parser

  packages: hlint-3.2.7
  package hlint
    flags: +ghc-lib
  package ghc-lib-parser-ex
    flags: -auto -no-ghc-lib

Force link with native ghc

  packages: hlint-3.2.7
  package hlint
    flags: -ghc-lib
  package ghc-lib-parser-ex
    flags: -auto +no-ghc-lib

When working with Cabal in the HLint repository one can configure with command line constraints like this:

cabal new-build --constraint "hlint -ghc-lib" --constraint "ghc-lib-parser-ex -auto +no-ghc-lib"

Two things in Rust

2021-01-18T20:04:00.002-05:00

Two things in Rust

Two things I needed to learn before Rust made sense to me.

1. Pattern binding modes

I don’t remember reading about this in the book. Default binding modes come into play when non-reference patterns are encountered.

Example


let x = Some(3);
let y: &Option<i32> = &x;
match y {
    Some(a) => {
    // `y` is deref'd and `a` is bound as `ref a`
    }
    None => {}
}

The default binding mode starts as move. Each time a reference is matched using a non-reference pattern, the compiler automatically dereferences the value and updates the default binding mode:
- If the reference is &val, set the default binding mode to ref;
- If the reference is &mut val: if the current default binding mode is ref, it should remain ref. Otherwise, set the current binding mode to ref mut.

Full details are given in the 2005-match-ergonomics rustlang RFC.

Example

match (&Some(3)) {
    Some(p) => {
        // This pattern is a "non-reference pattern".
        // Dereference the `&` and shift the default binding
        // mode to `ref`. `p` is read as `ref p` and given type `&i32`.
    }
    x => {
        // In this arm, we are still in `move` mode by default, so `x`
        // has type `&Option<i32>`
    }
}

Desugared

    match(&Some(3)) {
      &Some(ref p) => {
         ...
      },
      x => {
         ...
      },
    }

2. Implicit deref coercions with functions and methods

This is another ergonomics feature that saves on explicit &s and *s when writing function and method calls. When we pass a reference as an argument to a function or method call, the compiler dereferences it implicitly as needed to coerce it to the parameter’s target type.

Example:

fn hello(name: &str) {
    println!("Hello, {}", name);
}

fn main() {
    let m = MyBox::new(String::from("Rust"));
    hello(&m);
}

Example:

use std::ops::Deref;

pub struct Point {
    x: Vec<i32>,
    y: Vec<i32>,
}

impl Point {
    pub fn x(&self) -> &Vec<i32> {
        match self {
            &Point { ref x, .. } => x,
        }
    }
}

fn use_i32(_: &i32) -> () {}

fn use_vi32(_: &Vec<i32>) -> () {}

fn use_str(_: &str) -> () {}

fn use_strr(_: &&String) -> () {}

fn main() {
    let p: Point = Point {
        x: vec![],
        y: vec![],
    };

    let rp: &Point = &p;
    let rrp: &&Point = &rp;

    println!("p.x = {:?}", rrp.x());

    let _s: &str = "foo";
    let s: String = String::from(_s);
    let bs: Box<String> = Box::new(s.clone());
    let bsr: Box<&String> = Box::new(&s);
    let bi32: Box<Vec<i32>> = Box::new(vec![]);

    use_i32(&&&&1i32);

    use_str(bs.deref());
    use_str(&s);
    use_strr(bsr.deref());
    let r: &&String = &bsr;
    let r2: &String = r.deref();

    use_str(r2);
    use_str(&bsr);
    use_vi32(&bi32);

    let p = Point {
        x: vec![1, 2, 3],
        y: vec![4, 5, 6],
    };
    match &p {
        Point { x, y } => {
            use_vi32(x);
            use_vi32(y);
            println!("{:?}, {:?}", x, y);
        }
    }
    println!("{:?}, {:?}", p.x, p.y);
}

ghc-lib-parser module count

2020-10-30T14:51:00.011-04:00

ghc-lib-parser module count

When a user installs a program like HLint via Cabal, there's a good chance they'll pay the cost of building ghc-lib-parser. The build time for ghc-lib-parser is proportional to the number of modules that it contains: the fewer there are, the faster it builds and the better the user experience of installing HLint.

Back in September last year there was a bit of a scare. At that time, ghc-lib-parser consisted of around 165 modules (and ghc-lib had roughly 300). An MR landed that unintentionally resulted in ghc-lib-parser needing 543 modules (and ghc-lib getting just 25). Fortunately a quick refactoring sorted that out (thanks @sgraf1337!). As @mpickering_ correctly pointed out at that time, "any fix should ensure adding a test to make sure it doesn't happen again". To that end, @cocreature and I managed to work out the details of one, which we can have a look at in a minute.

Since a year has now elapsed since the test was introduced, it's interesting to see how the module count has changed over time. TL;DR it's not too bad: the number of modules has increased from around 165 a year ago to about 230 today.

So how does the test work? To be comfortable in the knowledge that the number of modules needed in ghc-lib-parser will not significantly change without anyone noticing, it's enough to have a program that counts them. That program, when inserted into GHC's continuous integration system, alerts the committer to a limit breach. How do we count them? We use the GHC API to compute the transitive closure of the dependencies of GHC.Parser; the count is the cardinality of that. The code is only a few lines. The key to it is the function hscGetModuleInterface. You can read the source here.
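The counting idea itself is easy to sketch without the GHC API. The following is an illustration only, not the real test: the dependency graph is a hand-written map with made-up module names (apart from the root), the limit is an arbitrary number, and closure is a hypothetical helper. In the real test the dependencies come from module interfaces via hscGetModuleInterface.

```haskell
import Control.Monad (when)
import qualified Data.Map as Map
import qualified Data.Set as Set

type Module = String

-- A made-up dependency graph; only the root name is taken from the text.
deps :: Map.Map Module [Module]
deps = Map.fromList
  [ ("GHC.Parser", ["Dep.A", "Dep.B"])
  , ("Dep.A", ["Dep.C"])
  , ("Dep.B", ["Dep.C"])
  , ("Dep.C", [])
  , ("Unrelated", ["Dep.B"])
  ]

-- Transitive closure of the dependencies of 'root', including 'root' itself.
closure :: Map.Map Module [Module] -> Module -> Set.Set Module
closure g root = go Set.empty [root]
  where
    go seen [] = seen
    go seen (m : ms)
      | m `Set.member` seen = go seen ms
      | otherwise = go (Set.insert m seen) (Map.findWithDefault [] m g ++ ms)

main :: IO ()
main = do
  let count = Set.size (closure deps "GHC.Parser")
      limit = 10  -- hypothetical threshold
  putStrLn ("module count: " ++ show count)  -- prints: module count: 4
  when (count > limit) $ error "module count limit exceeded"
```

A CI job only needs the non-zero exit status from the last line to flag the breach.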

Syntactic ambiguity resolution in the GHC parser

2020-04-17T17:48:00.005-04:00

Syntactic ambiguity resolution in the GHC parser

There are places in the Haskell grammar where it's not known a priori whether it's an expression, a command or a pattern that is being parsed. This used to be handled by picking a parse (e.g. as an expression say) and if that choice later turned out to be wrong, "rejigging it" (transform the constructed parse tree to its analog in the pattern language). The problem with that approach is that it meant having conflated sub-languages meaning, for example, HsExpr had to have pattern related constructors e.g. EWildPat, EAsPat (and further, these propagated into other compiler phases like the renamer and typechecker). This was the case until roughly a year ago, before extraordinary work by Vladislav Zavialov, who solved the ambiguity resolution issue by parsing into an abstraction with an overloaded representation:

class DisambECP b where ...
newtype ECP = ECP { unECP :: forall b. DisambECP b => PV (Located b) }

This innovation might be considered to have come at a cost for developers familiar with the "old" parser however. That is, dealing with understanding the apparent complexity introduced by the ambiguity resolution system. This post attempts to provide some intuition about how the system works and hopefully will lead to the realization that it's not that hard to understand after all!

Because this post is about building intuition, there are details that are glossed over or omitted entirely: the reader is encouraged to read Vlad's detailed explanatory comments in RdrHsSyn.hs when necessary to address that.

We start with something familiar - the GHC parser monad:

newtype P a = P { unP :: PState -> ParseResult a }

This fundamentally is a wrapper over a function PState -> ParseResult a.

The (let's call it the) "ECP system" introduces a new (and as we'll see, very related) concept. The parser validator monad:

newtype PV a = PV { unPV :: PV_Context -> PV_Accum -> PV_Result a }

So a parser validator is a function similar in spirit to a parser where:

data PV_Context: The type of essentially a wrapper around the lexer ParserFlags value;
data PV_Accum: The type of state accumulated during parsing validation (like errors and warnings, comments, annotations);
data PV_Result: The parser validator function's result type that is, data PV_Result a = PV_Ok PV_Accum a | PV_Failed PV_Accum.

Of critical interest is how this type is made a monad.

instance Functor PV where
  fmap = liftM

instance Applicative PV where
  pure a = a `seq` PV (\_ acc -> PV_Ok acc a)
  (<*>) = ap

The above reveals that an expression like return e, where e is of type Located b, constructs a function that, given arguments ctx and acc, returns e (under PV_Ok, together with the unchanged acc). The moral equivalent of const.

instance Monad PV where
  m >>= f = PV $ \ctx acc ->
    case unPV m ctx acc of
      PV_Ok acc' a -> unPV (f a) ctx acc'
      PV_Failed acc' -> PV_Failed acc'

The bind operation composes PV actions threading context and accumulators through the application of their contained functions: given an m :: PV a and a function f :: a -> PV b, then m >>= f constructs a PV b that wraps a function that composes f with the function in m.
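To see the threading in action, here is a cut-down model of PV that can be run on its own. It is a sketch under simplifying assumptions: the context is just a hint string, the accumulator is just a list of error messages, and addError and failPV are toy helpers written for this example rather than GHC's definitions. The instances are the ones shown above.

```haskell
import Control.Monad (ap, liftM)

type Ctx = String     -- toy context: a hint appended to every error
type Acc = [String]   -- toy accumulator: the errors recorded so far

data PV_Result a = PV_Ok Acc a | PV_Failed Acc deriving (Eq, Show)
newtype PV a = PV { unPV :: Ctx -> Acc -> PV_Result a }

instance Functor PV where
  fmap = liftM

instance Applicative PV where
  pure a = PV (\_ acc -> PV_Ok acc a)
  (<*>) = ap

instance Monad PV where
  m >>= f = PV $ \ctx acc ->
    case unPV m ctx acc of
      PV_Ok acc' a   -> unPV (f a) ctx acc'
      PV_Failed acc' -> PV_Failed acc'

-- Record a non-fatal error and carry on (toy version).
addError :: String -> PV ()
addError msg = PV $ \ctx acc -> PV_Ok (acc ++ [msg ++ " (" ++ ctx ++ ")"]) ()

-- Stop validation, keeping what has been accumulated so far (toy version).
failPV :: PV a
failPV = PV $ \_ acc -> PV_Failed acc

main :: IO ()
main = do
  print (unPV (addError "e1" >> addError "e2" >> pure (42 :: Int)) "hint" [])
  print (unPV (addError "e1" >> failPV >> addError "e2" >> pure (42 :: Int)) "hint" [])
```

The first line prints PV_Ok ["e1 (hint)","e2 (hint)"] 42: both errors are accumulated and the result still comes through. The second prints PV_Failed ["e1 (hint)"]: bind stops at the failure, so "e2" is never recorded.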

PV is a bit more than a monad, it also satisfies the MonadP class for monads that support parsing-related operations providing the ability to query for active language extensions, store warnings, errors, comments and annotations.

instance MonadP PV where
  addError srcspan msg = ....
    PV $ \ctx acc@PV_Accum{pv_messages=m} ->
      let msg' = msg $$ pv_hint ctx in
      PV_Ok acc{pv_messages=appendError srcspan msg' m} ()
  addWarning option srcspan warning = ...
  addFatalError srcspan msg =...
  getBit ext =
    PV $ \ctx acc ->
      let b = ext `xtest` pExtsBitmap (pv_options ctx) in
      PV_Ok acc $! b
  addAnnotation (RealSrcSpan l _) a (RealSrcSpan v _) = ...
  ...

The function runPV is the interpreter of a PV a. To run a PV a through this function is to produce a P a.

runPV :: PV a -> P a

That is, given a PV a construct a function PState -> ParseResult a.

runPV m =
  P $ \s ->
    let
      pv_ctx = PV_Context {...} -- init context from parse state 's'
      pv_acc = PV_Accum {...} -- init local state from parse state 's'
      -- Define a function that builds a parse state from local state
      mkPState acc' =
        s { messages = pv_messages acc'
          , annotations = pv_annotations acc'
          , comment_q = pv_comment_q acc'
          , annotations_comments = pv_annotations_comments acc' }
    in
      -- Invoke the function in m with context and state, harvest its revised state and
      -- turn its outcome into a ParseResult.
      case unPV m pv_ctx pv_acc of
        PV_Ok acc' a -> POk (mkPState acc') a
        PV_Failed acc' -> PFailed (mkPState acc')

It is often the case that a production (or set of productions) might result in different ASTs depending on the context. Ideally, we just want to write the productions once and reuse them across these different sub-languages (e.g. expressions vs. commands vs. patterns). For example, the production for a parenthesized "thing" is

'(' texp ')'

In the context of a pattern we expect an AST with a ParPat _ p node whereas in the context of an expression we want an AST with an HsPar _ e node. To this end the DisambECP class embodies an abstract set of operations for parse tree construction.

class DisambECP b where
  ...

  -- | Return a command without ambiguity, or fail in a non-command context.
  ecpFromCmd' :: LHsCmd GhcPs -> PV (Located b)
  -- | Return an expression without ambiguity, or fail in a non-expression context.
  ecpFromExp' :: LHsExpr GhcPs -> PV (Located b)

  -- ... lots of operations like this
  mkHsOpAppPV :: SrcSpan -> Located b -> Located (InfixOp b) -> Located b -> PV (Located b)
  mkHsVarPV :: Located RdrName -> PV (Located b)

  ...

The idea is that in the semantic actions of the grammar we construct and compose parser validators in terms of these abstract functions. Running the PVs produces parsers and at the point of execution of parsers we know the context (the nature of the AST we expect to receive) and the concrete choices for each of the abstract functions are thereby fixed (and then, on evaluation, we get the parse result).

The only wrinkle is in the return type of productions that produce parser validators. In general, they will have the form forall b. DisambECP b => PV (Located b). If they were monadic productions though we would be led to P (forall b. DisambECP b => PV (Located b)) and that dog don't hunt for GHC's lack of support for impredicative types. There is a standard work-around that can be employed though. This newtype is how impredicative types in monadic productions are avoided:

newtype ECP = ECP { unECP :: forall b. DisambECP b => PV (Located b) }

So here, ECP is a wrapper around a PV (Located b) value where b can be of any type that satisfies the constraints of class DisambECP. So, in a production that looks like

| ... {% return (ECP ...)}

we are dealing with P ECP whereas without a newtype we would be dealing with P (forall b. DisambECP b => PV (Located b)).

Now to produce a P (Located b) from the PV (Located b) in an ECP we can use this expression (of type DisambECP b => ECP -> P (Located b)):

runPV (unECP p)

It takes an ECP value, projects out the parser validator contained therein and "runs" it to produce a function from PState -> ParseResult a (a parser action).

From the DisambECP instance for HsExpr GhcPs, here's ecpFromCmd':

  ecpFromCmd' (L l c) = do
    addError l $ vcat
      [ text "Arrow command found where an expression was expected:",
        nest 2 (ppr c) ]
    return (L l hsHoleExpr)

Makes perfect sense - you get a parser validator that when evaluated will store a (non-fatal) error and returns an expression "hole" (unbound variable called _) so that parsing can continue.

Continuing, the definition of ecpFromExp':

  ecpFromExp' = return

Also sensible. Simply calculate a function that returns its provided acc argument together with the given constant expression under a PV_Ok result (see the definition of pure in the Applicative instance for PV given above).

Parenthesizing an expression for this DisambECP instance means wrapping a HsPar around the given e:

  mkHsParPV l e = return $ L l (HsPar noExtField e)

And so on. You get the idea.

So how does this all fit together? Consider again the production of parenthesized things:

        | '(' texp ')'  { ECP $
                            unECP $2 >>= \ $2 ->
                            amms (mkHsParPV (comb2 $1 $>) $2) [mop $1,mcp $3] }

We note that the texp production calculates an ECP. Stripping away for simplicity the annotation and source code location calculations in the semantic action, in essence we are left with this.

ECP $ unECP $2 >>= \ $2 -> mkHsParPV $2

The effect of unECP is to project out the forall b. DisambECP b => PV (Located b) value from the result of texp. Recalling that unPV projects out the function that the PV wrapper shields, and by substitution of the definition of bind, we obtain roughly:

  ECP $ PV $ \ctx acc ->
                case unPV (unECP $2) ctx acc of
                  PV_Ok acc' a -> unPV (mkHsParPV a) ctx acc'
                  PV_Failed acc' -> PV_Failed acc'

The net effect is that we construct a new parser validator (function) from the parser validator (function) returned from the texp production, one that puts parentheses around whatever that function produces when evaluated. If used in a context where texp generates an LPat GhcPs that'll be a ParPat node; if an LHsExpr GhcPs, then an HsPar node.
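The "write the production once, pick the AST later" trick can be demonstrated in a few lines. This is a toy analogue only: Disamb stands in for DisambECP, the ECP here wraps a bare polymorphic value rather than a PV (Located b), and Expr, Pat, mkVar, mkPar and par are all names invented for the example.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Two toy sub-languages.
data Expr = EVar String | EPar Expr deriving (Eq, Show)
data Pat  = PVar String | PPar Pat  deriving (Eq, Show)

-- Toy analogue of DisambECP: abstract operations for tree construction.
class Disamb b where
  mkVar :: String -> b
  mkPar :: b -> b

instance Disamb Expr where { mkVar = EVar; mkPar = EPar }
instance Disamb Pat  where { mkVar = PVar; mkPar = PPar }

-- Toy analogue of ECP; the real one wraps PV (Located b).
newtype ECP = ECP { unECP :: forall b. Disamb b => b }

-- A "production": parenthesize whatever the inner production builds.
par :: ECP -> ECP
par p = ECP (mkPar (unECP p))

thing :: ECP
thing = par (ECP (mkVar "x"))

main :: IO ()
main = do
  print (unECP thing :: Expr)  -- prints: EPar (EVar "x")
  print (unECP thing :: Pat)   -- prints: PPar (PVar "x")
```

The single value thing yields an expression tree or a pattern tree depending only on the type demanded at the point of use, which is the role the context plays in the real parser.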

GHC: How whitespace sensitive operator lexing works

2020-04-05T15:18:00.001-04:00

How whitespace sensitive operator lexing works

In GHC, Haskell operator occurrences get classified into one of four categories. For example, the occurrence of ⊕ in a ⊕ b is "loose infix", in a⊕b is "tight infix", in a ⊕b is "prefix" and in a⊕ b, "suffix".

The point of this is that certain operators can be ascribed different meanings depending on the classification of their occurrence and language extensions that may be in effect. For example, ! when encountered will lex as a strictness annotation (token type ITbang) if its occurrence is prefix (e.g. f !x = rhs) or an ordinary operator (token type ITvarsym) if not (e.g. xs ! 3). Another ready example is provided by operator @ which, according to whitespace considerations, may be a type application (prefix), an as-pattern (tight infix), an ordinary operator (loose infix) or a parse error (suffix).

The implementation of this categorization relies upon two functions: followedByOpeningToken and precededByClosingToken. To explain further:

Identifiers, literals and opening brackets (, (#, [|, [||, [p|, [t|, { are considered "opening tokens";
Identifiers, literals and closing brackets ), #), ], |], } are considered "closing tokens";
Other tokens and whitespace are considered neither opening nor closing.

The classification algorithm is defined by the following rules:

`precededByClosingToken`	`followedByOpeningToken`	occurrence
`False`	`True`	prefix
`True`	`False`	suffix
`True`	`True`	tight infix
`False`	`False`	loose infix
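The table reads directly as a function of two booleans. Here is a sketch of it, with a crude single-character approximation of the two predicates so it can be tried on the four ⊕ examples. The names Occurrence, classify, isClosing, isOpening and classifyChars are mine, and the character tests deliberately ignore details such as block comment delimiters and multi-character brackets.

```haskell
import Data.Char (isAlphaNum)

data Occurrence = Prefix | Suffix | TightInfix | LooseInfix
  deriving (Eq, Show)

-- The classification table: first argument is "preceded by a closing
-- token", second is "followed by an opening token".
classify :: Bool -> Bool -> Occurrence
classify False True  = Prefix
classify True  False = Suffix
classify True  True  = TightInfix
classify False False = LooseInfix

-- Crude one-character approximations of the two predicates.
isClosing, isOpening :: Char -> Bool
isClosing c = c `elem` ")]}\"'_" || isAlphaNum c
isOpening c = c `elem` "([{\"'_" || isAlphaNum c

-- Classify an operator from the characters just before and just after it.
classifyChars :: Char -> Char -> Occurrence
classifyChars before after = classify (isClosing before) (isOpening after)

main :: IO ()
main = mapM_ print
  [ classifyChars ' ' ' '  -- a ⊕ b : LooseInfix
  , classifyChars 'a' 'b'  -- a⊕b   : TightInfix
  , classifyChars ' ' 'b'  -- a ⊕b  : Prefix
  , classifyChars 'a' ' '  -- a⊕ b  : Suffix
  ]
```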

The implementation of precededByClosingToken is very straightforward: look backwards one character in the lexing buffer.

precededByClosingToken :: AlexAccPred ExtsBitmap
precededByClosingToken _ (AI _ buf) _ _ =
  case prevChar buf '\n' of
    '}' -> decodePrevNChars 1 buf /= "-"
    ')' -> True
    ']' -> True
    '\"' -> True
    '\'' -> True
    '_' -> True
    c -> isAlphaNum c

Similarly, followedByOpeningToken: look forwards one character in the lexing buffer.

followedByOpeningToken :: AlexAccPred ExtsBitmap
followedByOpeningToken _ _ _ (AI _ buf)
  | atEnd buf = False
  | otherwise =
      case nextChar buf of
        ('{', buf') -> nextCharIsNot buf' (== '-')
        ('(', _) -> True
        ('[', _) -> True
        ('\"', _) -> True
        ('\'', _) -> True
        ('_', _) -> True
        (c, _) -> isAlphaNum c

Armed with these rules, the lexing of operators looks like this:

<0> {
  @varsym / { precededByClosingToken `alexAndPred` followedByOpeningToken } { varsym_tight_infix }
  @varsym / { followedByOpeningToken }  { varsym_prefix }
  @varsym / { precededByClosingToken }  { varsym_suffix }
  @varsym                               { varsym_loose_infix }
}

The actions varsym_tight_infix, varsym_prefix, varsym_suffix and varsym_loose_infix are "fed" the operator and allow for language extension specific issuance of tokens (as opposed to issuance of general ITvarsym tokens). For example, varsym_prefix:

varsym_prefix :: Action
varsym_prefix = sym $ \exts s ->
  if | TypeApplicationsBit `xtest` exts, s == fsLit "@"
     -> return ITtypeApp
     |  ...
     | otherwise -> return (ITvarsym s)

GHC Haskell Pats and LPats

2020-03-01T07:34:00.000-05:00

GHC Haskell Pats and LPats

The Trees that Grow paper explains that GHC has a single data type, HsSyn, that crosses several compiler phases, and a second data type, TH.Syntax, for Template Haskell, and that other Haskell libraries (e.g. haskell-src-exts) define yet others. Ideally, HsSyn would be reused in Template Haskell and in these third-party libraries; this is what motivates the flexibility offered by the TTG (Trees That Grow) techniques.

Before GHC 8.8, patterns and located patterns were related in the following way:

type LPat p = Located (Pat p)
data Pat p
  = ...
  | LazyPat (XLazyPat p) (LPat p)
  | ...

That is, patterns with locations are represented by values of type LPat and patterns themselves as values of type Pat. Note that LPat values contain Pat values which in turn can contain LPat values hence the name "ping pong style" being given to this idiom.

Since location annotations may (e.g. GHC native) or may not (e.g. Template Haskell) be present for a given application, it was realized that "baking" locations into HsSyn is undesirable. For this reason, in 8.8 attempts were made to make their presence a strictly GHC "thing" in the following way:

type LPat p = Pat p
data Pat p
  = ...
  | LazyPat (XLazyPat p) (LPat p)
  | ...
  | XPat (XXPat p)
type instance XXPat (GhcPass p) = Located (Pat (GhcPass p))

That is, in GHC under this approach, locations are stored in the extension constructor - patterns with locations are wrapped in XPat e.g. XPat noExt (L _ (VarPat noExt _)). Of course, now, to get at the location you have to go through an indirection through XPat. For this, the functions cL and dL (and the bi-directional pattern synonym LL) were provided. Applications that don't want locations in the parse tree just don't make use of the XPat constructor.

It turned out that the 8.8 approach wasn't as good an idea as it seemed; it was a bit more complicated than it needed to be and had some unexpected implications for the existing GHC source code base. It was realized that this following alternative approach yields the same benefits and is what we find in 8.10 and beyond:

type family XRec p (f :: * -> *) = r | r -> p f
type instance XRec (GhcPass p) f = Located (f (GhcPass p))

type LPat p = XRec p Pat
data Pat p
  = ...
  | LazyPat (XLazyPat p) (LPat p)
  | ...
  | XPat (XXPat p)
type instance XXPat   (GhcPass _) = NoExtCon

Thus for GHC, ping-pong style is restored and applications other than GHC can define the XRec instance as simply f p so that locations are absent.
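
To make the two choices of XRec instance concrete, here is a cut-down sketch. It is not GHC's code: the pass indices GhcLike and Plain, the pair-of-Ints SrcSpan and the two-constructor Pat are all stand-ins invented for illustration, and the type family is left non-injective to keep it short.

```haskell
{-# LANGUAGE TypeFamilies, KindSignatures #-}
import Data.Kind (Type)

-- Hypothetical stand-ins for GHC's location types.
type SrcSpan = (Int, Int)
data Located a = L SrcSpan a

-- Hypothetical pass indices: one keeps locations, one does not.
data GhcLike
data Plain

-- XRec decides how recursive occurrences of a syntax type are wrapped.
type family XRec p (f :: Type -> Type) :: Type
type instance XRec GhcLike f = Located (f GhcLike) -- ping-pong style
type instance XRec Plain   f = f Plain             -- no locations at all

type LPat p = XRec p Pat

-- A two-constructor toy pattern type.
data Pat p
  = VarPat String
  | LazyPat (LPat p)

-- With locations, we must look through L to reach the nested pattern.
locatedVars :: Pat GhcLike -> [String]
locatedVars (VarPat v)        = [v]
locatedVars (LazyPat (L _ p)) = locatedVars p

-- Without locations, the nested pattern is stored directly.
plainVars :: Pat Plain -> [String]
plainVars (VarPat v)  = [v]
plainVars (LazyPat p) = plainVars p

locatedExample :: Pat GhcLike
locatedExample = LazyPat (L (1, 2) (VarPat "x"))

plainExample :: Pat Plain
plainExample = LazyPat (VarPat "x")
```

The same Pat declaration serves both clients; only the XRec instance differs.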

In practical terms, going from 8.8 to 8.10, LL becomes L, dL -> is removed and cL is just L.

Partitions of a set

2019-08-11T11:43:00.000-04:00

Calculating the partitions of a set

Having "solved" a bunch of these divide & conquer problems, I'm the first to admit to having been lulled into a false sense of security. At first glance, the problem of this post seemed deceptively simple and consequently I struggled with it, sort of "hand-waving", not really engaging my brain and getting more and more frustrated at how the dang thing wouldn't yield to my experience! I think the moral of the story is math doesn't care about your previous successes and so don't let your past practice trick you into laziness. Be guided by your experience but fully apply yourself to the problem at hand!

Suppose a set of two elements {2, 3}. There are only two ways it can be partitioned: (23), (3)(2). For meaning, you might think of these two partitions like this : in the first partition, there is a connection between the elements 2 and 3, in the second, 2 and 3 are isolated from each other.

Suppose a set of elements {1, 2, 3}. There are five partitions of this set : (123), (23)(1), (13)(2), (3)(21), (3)(2)(1) (I've carefully written them out this way to help with the elucidation). Maybe you want to break here and see if you can write an algorithm for calculating them before reading on?

Observe that we can get the partitions of {1, 2, 3} from knowledge of the partitions of {2, 3} by looking at each partition of {2, 3} in turn and considering the partitions that would result by inclusion of the element 1. So, for example, the partition (23) gives rise to the partitions (123) and (23)(1). Similarly, the partition (3)(2) gives rise to the partitions (13)(2), (3)(21) and (3)(2)(1). We might characterize this process of computing new partitions of {1, 2, 3} from a partition p of {2, 3} as "extending" p.

Suppose then we write a function extend x p to capture the above idea. Let's start with the signature of extend. What would it be? Taking (23)(1) as an exemplar, we see that a component of a partition can be represented as [a] and so a partition itself then as [[a]]. We know that extend takes an element and a partition and returns a list of (new) partitions so it must have signature extend :: a -> [[a]] -> [[[a]]] (yes, lists of lists of lists are somehow easy to get confused about).

Now for writing the body of extend. The base case is the easiest of course - extending the empty partition:

extend x [] = [[[x]]]

That is, a singleton list of partitions where that one partition has one component. The inductive case is the partition obtained by "pushing" x into the first component of p together with the extensions that leave the first component of p alone.

extend x (h : tl) = ((x : h) : tl) : map (h :) (extend x tl)

We can now phrase the function partition with signature partition :: [a] -> [[[a]]] like this:

partition [] = [[]]
partition (h : tl) = concatMap (extend h) (partition tl)

The base case says the only partition of the empty set is the empty partition.

Wrapping it all up, the algorithm in entirety is

partition :: [a] -> [[[a]]]
partition [] = [[]]
partition (h : tl) = concatMap (extend h) (partition tl)
  where
    extend :: a -> [[a]] -> [[[a]]]
    extend x [] = [[[x]]]
    extend x (h : tl) = ((x : h) : tl) : map (h :) (extend x tl)
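
As a quick check, the definition can be repeated in a file of its own and evaluated on the set {1, 2, 3} from the discussion. The helper bell is an addition for illustration only; it counts the partitions of an n-element set.

```haskell
-- The algorithm from the post, repeated so this file stands alone.
partition :: [a] -> [[[a]]]
partition [] = [[]]
partition (h : tl) = concatMap (extend h) (partition tl)
  where
    extend :: a -> [[a]] -> [[[a]]]
    extend x [] = [[[x]]]
    extend x (c : cs) = ((x : c) : cs) : map (c :) (extend x cs)

-- Hypothetical helper: the number of partitions of {1, .., n}.
bell :: Int -> Int
bell n = length (partition [1 .. n])

-- partition [1, 2, 3]
--   == [ [[1,2,3]], [[2,3],[1]], [[1,3],[2]], [[3],[1,2]], [[3],[2],[1]] ]
-- bell 4 == 15
```

The five results appear in the same order as the hand-written list (123), (23)(1), (13)(2), (3)(21), (3)(2)(1).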

Build GHC with stack and hadrian

2019-06-29T14:40:00.002-04:00

Building GHC with stack and hadrian

By far the easiest way I know of to get a build of GHC is via the tools 'stack' and 'hadrian'*. The procedures below set out commands that I know first hand work** with machines provisioned by the CI systems Azure, Travis and Appveyor.

Setup

Ubuntu:

curl -sSL https://get.haskellstack.org/ | sh
stack setup

macOS:

/usr/bin/ruby -e \
  "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install autoconf automake gmp
curl -sSL https://get.haskellstack.org/ | sh
stack setup

Windows:

curl -sSL https://get.haskellstack.org/ | sh
stack setup
stack exec -- pacman -S autoconf automake-wrapper make patch python tar \
      --noconfirm

Build

Ubuntu & macOS:

git clone --recursive https://gitlab.haskell.org/ghc/ghc.git
cd ghc
hadrian/build.stack.sh --configure --flavour=quickest -j

Windows:

git clone --recursive https://gitlab.haskell.org/ghc/ghc.git
cd ghc
hadrian/build.stack.bat --configure --flavour=quickest -j

[*] The simplicity and uniformity of these commands make me an advocate of these tools and in particular, the hadrian --configure flag.

[**] Well, that is to say mostly work. The above is the ideal and has worked for me reliably for the last year. Recently though, for one reason or another, there seem to have been a lot of breakages. Your mileage may vary.

Harvesting annotations from the GHC parser

2019-06-28T13:58:00.002-04:00

Harvesting annotations from the GHC parser

My last post on parsing in the presence of dynamic pragmas left us with this outline for calling the GHC parser.

      flags <-
        parsePragmasIntoDynFlags
          (defaultDynFlags fakeSettings fakeLlvmConfig) file s
      whenJust flags $ \flags ->
         case parse file flags s of
            PFailed s ->
              report flags $ snd (getMessages s flags)
            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m

Now, it's a fact that you'll not find in a GHC parse tree certain things like comments and the location of keywords (e.g. let, in and so on). Certainly, if you're writing refactoring tools (think programs like Neil Mitchell's awesome hlint for example), access to these things is critical!

So, how does one go about getting these program "annotations"? You guessed it... there's an API for that.

If we assume the existence of a function analyzeModule :: DynFlags -> Located (HsModule GhcPs) -> ApiAnns -> IO () then, here's the gist of the code that exercises it:

            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m (harvestAnns s)

Here harvestAnns is defined as

    harvestAnns pst =
      ( Map.fromListWith (++) $ annotations pst
      , Map.fromList ((noSrcSpan, comment_q pst) : annotations_comments pst)
      )

The type ApiAnns is a pair of maps : the first map contains keyword and punctuation locations, the second maps locations of comments to their values.
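
The first component relies on Map.fromListWith (++) to merge entries that share a key. Here is a small sketch of that behaviour on its own, using the containers package. The Span and Key types are simplified stand-ins invented for illustration, not the GHC API's types.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-ins: a span is a (line, column) pair and a key
-- pairs an enclosing span with a keyword name.
type Span = (Int, Int)
type Key = (Span, String)

-- Entries with the same key have their lists concatenated rather than
-- the later entry silently replacing the earlier one.
groupByKey :: [(Key, [Span])] -> Map.Map Key [Span]
groupByKey = Map.fromListWith (++)

example :: Map.Map Key [Span]
example =
  groupByKey
    [ (((1, 1), "let"), [(1, 1)])
    , (((1, 1), "in"),  [(2, 3)])
    , (((1, 1), "let"), [(4, 1)])
    ]

-- Map.lookup ((1, 1), "let") example == Just [(4, 1), (1, 1)]
```

Note that later entries end up at the front of the merged list, because fromListWith applies the combining function to the new value first.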

You might think that's the end of this story but there's one twist left : the GHC lexer won't harvest comments by default - you have to tell it to do so by means of the Opt_KeepRawTokenStream (general) flag (see the GHC wiki for details)!

Taking the above into account, to parse with comments, the outline now becomes:

      flags <-
        parsePragmasIntoDynFlags
          (defaultDynFlags fakeSettings fakeLlvmConfig) file s
      whenJust flags $ \flags ->
         case parse file (flags `gopt_set` Opt_KeepRawTokenStream) s of
            PFailed s ->
              report flags $ snd (getMessages s flags)
            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m (harvestAnns s)

For a complete program demonstrating all of this see this example in the ghc-lib repo.

Have GHC parsing respect dynamic pragmas

2019-06-02T11:57:00.000-04:00

Have GHC parsing respect dynamic pragmas

This post about Handling GHC parse errors shows that using qualified in postpositive position is a syntax error unless the ImportQualifiedPost language extension is enabled. In that post, it is explained that the program

module M where
import Data.List qualified

is invalid whereas,

{-# LANGUAGE ImportQualifiedPost #-}
module M where
import Data.List qualified

which enables the extension via a "dynamic pragma", is legit.

Perhaps surprisingly, running the second of these programs through the parsing code presented in that post continues to generate the error

     Found `qualified' in postpositive position.
     To allow this, enable language extension 'ImportQualifiedPost'

Evidently, our parse-fu needs an upgrade to respect dynamic pragmas and that's what this post provides.

This code exercises the GHC API to parse a module.

parse :: String -> DynFlags -> String -> ParseResult (Located (HsModule GhcPs))
parse filename flags str =
  unP Parser.parseModule parseState
  where
    location = mkRealSrcLoc (mkFastString filename) 1 1
    buffer = stringToStringBuffer str
    parseState = mkPState flags buffer location

Note in the above, the second argument flags :: DynFlags. In order for parse to take into account extensions enabled by pragmas in the source argument str, then flags must be set up to do so a priori. That is, before jumping into parse, a "first pass" must be made to sniff out flags. There is a GHC API for that. It's called parseDynamicFilePragma.

Here's a function to harvest flags from pragmas that makes that call to parseDynamicFilePragma.

parsePragmasIntoDynFlags :: DynFlags -> FilePath -> String -> IO (Maybe DynFlags)
parsePragmasIntoDynFlags flags filepath str =
  catchErrors $ do
    let opts = getOptions flags (stringToStringBuffer str) filepath
    (flags, _, _) <- parseDynamicFilePragma flags opts
    return $ Just flags
  where
    catchErrors :: IO (Maybe DynFlags) -> IO (Maybe DynFlags)
    catchErrors act = handleGhcException reportErr
                        (handleSourceError reportErr act)
    reportErr e = do putStrLn $ "error : " ++ show e; return Nothing

The main contribution of this function is to account for the complication that parseDynamicFilePragma can throw two kinds of exceptions : GhcException and SourceError. The GHC API functions handleGhcException and handleSourceError are the means to achieve that.
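
The nesting of two handlers can be tried out without the GHC API. The sketch below uses only Control.Exception from base, with ArithException and ErrorCall standing in for GhcException and SourceError; catchBoth is a made-up name.

```haskell
import Control.Exception
  ( ArithException (..), ErrorCall (..), handle, throwIO )

-- Run an action, turning either of two exception types into Nothing.
-- The inner handler deals with ErrorCall, the outer one with ArithException.
catchBoth :: IO (Maybe a) -> IO (Maybe a)
catchBoth act = handle onArith (handle onErrorCall act)
  where
    onArith :: ArithException -> IO (Maybe b)
    onArith e = do putStrLn ("error : " ++ show e); return Nothing

    onErrorCall :: ErrorCall -> IO (Maybe b)
    onErrorCall e = do putStrLn ("error : " ++ show e); return Nothing

-- Three runs: one success and one failure of each kind.
demo :: IO [Maybe Int]
demo =
  mapM catchBoth
    [ return (Just 1)
    , throwIO DivideByZero
    , throwIO (ErrorCall "boom")
    ]
```

Any exception of a third type would still propagate, which is the same property the original function has.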

Putting it all together then, here's an outline of how to parse in the presence of dynamic pragmas.

      s <- readFile' file
      flags <-
        parsePragmasIntoDynFlags
          (defaultDynFlags fakeSettings fakeLlvmConfig) file s
      whenJust flags $ \flags ->
         case parse file flags s of
            PFailed s ->
              report flags $ snd (getMessages s flags)
            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m

For a complete working program that utilizes this function, see this example in the ghc-lib repo.

Handling GHC parser errors right

2019-05-10T22:03:00.000-04:00

Handling GHC parser errors right

Did you know, a POk parse result from the GHC parser doesn't necessarily mean the parse was OK? This blog explains what's up with that. The source code below is from this example in the ghc-lib repo.

Here is code that tries to make a parse tree of a Haskell module.

parse :: String -> DynFlags -> String -> ParseResult (Located (HsModule GhcPs))
parse filename flags str =
  unP Parser.parseModule parseState
  where
    location = mkRealSrcLoc (mkFastString filename) 1 1
    buffer = stringToStringBuffer str
    parseState = mkPState flags buffer location

The way to call the above code is like this.

case parse file flags s of
  PFailed s ->
    report flags $ snd (getMessages s flags)
  POk s m -> do
    report flags $ fst (getMessages s flags)
    analyzeModule flags m

In the PFailed s case (where s is the parse state), the expression snd (getMessages s flags) retrieves the errors and we report them. In the POk case, we report warnings and do whatever it is we wanted to do with the parse tree m, right?

Not quite. The problem is that the parser produces two sorts of errors : "fatal" and "non-fatal". Thus far, we have only considered the "fatal" ones.

Fatal errors are such that production of a parse tree is impossible. Non-fatal parse errors are those that don't prevent construction of a parse tree. A parse that generates non-fatal errors still yields a parse tree, but one that in some way doesn't conform to the Haskell language specification.

The right way to write the POk case is like this.

POk s m -> do
  let (warns, errs) = getMessages s flags
  report flags warns
  report flags errs
  when (null errs) $ analyzeModule flags m

The key point is analyzeModule is called only if there are absolutely no parse errors at all.
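
The rule can be stated as a small pure function over a toy model of parse results. The ParseResult type below is invented for illustration and is much simpler than GHC's: it carries its messages directly as lists of strings.

```haskell
-- Hypothetical toy model, not GHC's ParseResult.
data ParseResult a
  = PFailed [String]        -- fatal errors; no tree was produced
  | POk [String] [String] a -- warnings, non-fatal errors, and a tree

-- A tree is usable only when there are no errors of either kind.
usableTree :: ParseResult a -> Either [String] a
usableTree (PFailed errs) = Left errs
usableTree (POk _warns errs tree)
  | null errs = Right tree
  | otherwise = Left errs
```

Warnings never block use of the tree; a POk with a non-empty error list is treated exactly like PFailed.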

A non-fatal error example is provided by the ImportQualifiedPost language extension (see this post for how to add a GHC language extension). Specifically, it is only legal to write import M qualified if the extension is in effect via pragma or the option -XImportQualifiedPost. In the event this syntax is used when the extension is not in effect, the user should see an error like

 test/MiniHlintTest_non_fatal_error.hs:6:18: error:
     Found `qualified' in postpositive position.
     To allow this, enable language extension 'ImportQualifiedPost'

and further analysis of the parse abandoned.

Announcing ghc-lib 0.20190404

2019-04-07T09:20:00.000-04:00

Announcing ghc-lib 0.20190404

On behalf of Digital Asset I am excited to share with you the latest release of ghc-lib.

As described in detail in the ghc-lib README, the ghc-lib project lets you use the GHC API for parsing and analyzing Haskell code without tying you to a specific GHC version.

What's new

The GHC source code in this release is updated to GHC HEAD as of April 4th, 2019. Accordingly, the mini-hlint example in the ghc-lib repo was adjusted to accommodate GHC API changes to the ParseResult datatype and parser error handling.

By far the biggest change though is this : the ghc-lib project now provides two packages, ghc-lib-parser and ghc-lib. The packages are released on Hackage, and can be installed as usual e.g. cabal install ghc-lib.

Some projects don't require the ability to compile Haskell to GHC's Core language. If lexical analysis alone is sufficient for your project's needs, then the ghc-lib-parser package alone will do for that. The build time for ghc-lib-parser is roughly half of the combined build times of ghc-lib-parser and ghc-lib. That is, in this case, switching to the new release will decrease the build time for your project. Note that if your project does require compiling Haskell to Core, then your project will now depend on both the ghc-lib-parser and ghc-lib packages.

The ghc-lib package "re-exports" the modules of the ghc-lib-parser package. So, if you depend upon the ghc-lib package, you'll get the ghc-lib-parser modules "for free". Sadly though, at this time, package import syntax (and we do recommend using package import syntax for these packages) doesn't quite work like you'd expect: if you write import "ghc-lib" DynFlags, for example, this will fail because DynFlags is in fact in the ghc-lib-parser package. In this case, you'd write import "ghc-lib-parser" DynFlags and all would be well. The mini-compile example in the ghc-lib repo demonstrates mixing modules from both packages.

Digital Asset make extensive use of the ghc-lib packages in the DAML smart contract language compiler and hope you continue to benefit from this project too!

Bush fixing Travis and GitLab

2019-03-20T13:42:00.000-04:00

Bush fixing Travis and CI

Ever had one of those days?

You are not alone!

This Saturday 9th March 2019, the GHC devs are going to announce that git://git.haskell.org/ghc.git has been decommissioned. The new official upstream GHC will be https://gitlab.haskell.org/ghc/ghc.

Sadly (for us) this broke ghc-lib CI's Travis linux configuration.

What does our CI do? The ghc-lib CI script pulls down the latest GHC sources and builds and tests them as a ghc-lib. The details of the problem are that Travis gives you a broken Ubuntu where cloning the official URL fails with a TLS “handshake error”. More generally, any Travis job that tries to git clone over the https protocol from a GitLab remote will fail the same way.

This .travis.yml shows a workaround. The idea is to spin up a container before install that doesn’t have this problem and clone from there. The essential bits are:

services:
- docker

# [Why we git clone on linux here]
# At this time, `git clone https://gitlab.haskell.org/ghc/ghc.git`
# from within `CI.hs` does not work on linux. This appears to be a
# known Travis/ubuntu SSL verification issue. We've tried many less
# drastic workarounds. This grand hack is the only way we've found so
# far that can be made to work.
before_install:
- |
    if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then
      docker pull alpine/git
      docker run -ti --rm -v ${HOME}:/root -v $(pwd):/git \
        alpine/git clone https://gitlab.haskell.org/ghc/ghc.git /git/ghc --recursive
    fi

Note, MacOS docker services aren’t supported but that’s OK! The TLS handshake problem doesn’t exhibit in that configuration.

Update : It turns out that while this issue exists in Ubuntu 14.04 which Travis uses by default, it is “fixed” in Ubuntu 16.04. So by writing dist: xenial in your .travis.yml file, the above workaround can be avoided.

Adding a GHC Language Extension

2019-02-23T14:28:00.000-05:00

Adding a GHC Language Extension

This note summarizes the essential mechanics of adding a new language extension to GHC. The example code will illustrate adding a Foo extension.

Implementing the extension

The first step is to add a Foo constructor to the Extension type in libraries/ghc-boot-th/GHC/LanguageExtensions/Type.hs.

data Extension
    = Cpp
    | OverlappingInstances
    ...
    | Foo

The next job is to extend xFlagsDeps in compiler/main/DynFlags.hs.

xFlagsDeps = [
  flagSpec "AllowAmbiguousTypes" LangExt.AllowAmbiguousTypes,
  ...
  flagSpec "Foo"                 LangExt.Foo
]

That's all it takes. With these two changes, it is now possible to enable Foo in Haskell source files by writing {-# LANGUAGE Foo #-} or from a compile command by passing the argument -XFoo.

Testing for the extension

Lexer

In compiler/parser/Lexer.x, locate data ExtBits and add a constructor for Foo.

data ExtBits
  = FfiBit
  | ...
  | FooBit

Next, extend the where clause of function mkParserFlags' with a case for Foo.

langExtBits =
        FfiBit `xoptBit` LangExt.ForeignFunctionInterface
    .|. InterruptibleFfiBit `xoptBit` LangExt.InterruptibleFFI

    ...

    .|. FooBit `xoptBit` LangExt.Foo

The function xtest is then the basic building block for testing if Foo is enabled. For example, this specific function tests a bitmap for the on/off status of the Foo bit.

fooEnabled :: ExtsBitmap -> Bool
fooEnabled = xtest FooBit
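
For intuition about what such a test does, here is a self-contained sketch built on Data.Bits. It borrows the names from this post, but the definitions are assumptions made for illustration rather than a copy of GHC's Lexer.x, and ExtBits is cut down to three constructors.

```haskell
import Data.Bits (bit, testBit, (.|.))
import Data.Word (Word64)

-- A cut-down ExtBits; the position of a constructor is its bit index.
data ExtBits = FfiBit | InterruptibleFfiBit | FooBit
  deriving (Enum, Show)

type ExtsBitmap = Word64

-- A bitmap with just the bit for one extension set.
xbit :: ExtBits -> ExtsBitmap
xbit = bit . fromEnum

-- Is the bit for this extension set in the bitmap?
xtest :: ExtBits -> ExtsBitmap -> Bool
xtest ext bitmap = testBit bitmap (fromEnum ext)

fooEnabled :: ExtsBitmap -> Bool
fooEnabled = xtest FooBit

-- fooEnabled (xbit FfiBit .|. xbit FooBit) == True
-- fooEnabled (xbit FfiBit)                 == False
```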

In practice, testing for a language extension in the lexer is called from a function computing a lexer action. Suppose foo to be such a function and the action it computes depends somehow on whether the Foo language extension is in effect. Putting it all together, schematically it will have the following form.

foo :: (FastString -> Token) -> Action
foo con span buf len = do
    exts <- getExts
    if FooBit `xtest` exts then
       ...
    else
       ...

Parser

This utility computes a monadic expression testing for the on/off state of a bit in a parser state monad.

extension :: (ExtsBitmap -> Bool) -> P Bool
extension p = P $ \s -> POk s (p $! (pExtsBitmap . options) s)

An expression of this kind can be evaluated in the semantic action of a parse rule in compiler/parser/Parser.y. Here's an example of how one might be used.

foo :: { () }
  : 'foo'  {}
  | {- empty -}    {% do
                         foo_required <- extension fooEnabled
                         when foo_required $ do
                           loc <- fileSrcSpan
                           parseErrorSDoc loc $ text "Missing foo"
                    }

Renaming, type-checking and de-sugaring

All of renaming, typechecking and desugaring occur in the contexts of TcRnIf _ _ monads. Function xoptM :: Extension -> TcRnIf gbl lcl Bool is provided for extension testing in such contexts. Here's a schematic of how such a test might be used in a renaming function.

import GHC.LanguageExtensions

updateFoos :: [AvailInfo] -> RnM (TcGblEnv, TcLclEnv)
updateFoos info = do
  (globals, locals) <- getEnvs
  opt_Foo <- xoptM Foo
  if not opt_Foo then
    return (globals, locals)
  else do
    let globals' = ...
        locals' = ...
    return (globals', locals')

Bucket Sort

2018-06-10T14:29:00.000-04:00

Bucket Sort

Bucket sort assumes input generated by a random process that distributes elements uniformly over the interval [0, 1).

The idea of bucket sort is to divide [0, 1) into n equal-sized subintervals or buckets, and then distribute the n input numbers into the buckets. To produce the output, sort the numbers in each bucket and then go through the buckets in order. Sorting a bucket can be done with insertion sort.

let rec insert x = function
  | [] -> [x]
  | h :: tl as ls ->
    if x < h then x :: ls else h :: insert x tl

let rec insertion_sort = function
  | [] | [_] as ls -> ls
  | h :: tl -> insert h (insertion_sort tl)

This code for bucket sort assumes the input is an n-element array a and that each element 0 ≤ a.(i) < 1. The code requires an auxiliary array b.(0 .. n - 1) of lists (buckets).

let bucket_sort a =
  let n = Array.length a in
  let b = Array.make n [] in
  Array.iter
    (fun x ->
       let i =
         int_of_float (
           floor (float_of_int n *. x)
         ) in
        Array.set b i (x :: Array.get b i)
      ) a;
  Array.iteri
    (fun i l ->
       Array.set b i (insertion_sort l)
    ) b;
  Array.fold_left (fun acc bucket -> acc @ bucket) [] b
;;
bucket_sort [| 0.78; 0.17; 0.39; 0.26; 0.72; 0.94
             ; 0.21; 0.12; 0.23; 0.68|]

Bucket sort runs in linear time on the average.
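
For readers following the Haskell posts on this blog, the same idea can be sketched in Haskell. This is a loose transliteration and not the code above: it swaps the auxiliary array for a Data.Map keyed on bucket index, and it keeps each bucket sorted as elements arrive by using Data.List.insert. The name bucketSort is chosen here for illustration.

```haskell
import Data.List (insert)
import qualified Data.Map.Strict as Map

-- Assumes every element x satisfies 0 <= x < 1.
bucketSort :: [Double] -> [Double]
bucketSort xs = concat (Map.elems buckets)
  where
    n = length xs

    -- Index of the bucket that x falls into.
    bucketOf :: Double -> Int
    bucketOf x = floor (fromIntegral n * x)

    -- Insert each element into its bucket, keeping the bucket sorted.
    buckets :: Map.Map Int [Double]
    buckets = foldr place Map.empty xs

    place :: Double -> Map.Map Int [Double] -> Map.Map Int [Double]
    place x = Map.insertWith (\_ old -> insert x old) (bucketOf x) [x]

sample :: [Double]
sample = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]

-- bucketSort sample
--   == [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]
```

Map.elems returns the buckets in ascending key order, which plays the role of walking the array from left to right.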

References:
[1] "Introduction to Algorithms" Section 9.4: Bucket Sort -- Cormen et al. (Second ed.) 2001.

Dijkstra's algorithm

2018-05-20T14:26:00.000-04:00

Shortest Path

This article assumes familiarity with Dijkstra's shortest path algorithm. For a refresher, see [1]. The code assumes open Core is in effect and is online here.

The first part of the program organizes our thoughts about what we are setting out to compute. The signature summarizes the notion (for our purposes) of a graph definition in modular form. A module implementing this signature defines a type vertex_t for vertices, a type t for graphs and type extern_t : a representation of a t for interaction between an implementing module and its "outside world".

module type Graph_sig = sig
  type vertex_t [@@deriving sexp]
  type t [@@deriving sexp]
  type extern_t

  type load_error = [ `Duplicate_vertex of vertex_t ] [@@deriving sexp]
  exception Load_error of load_error [@@deriving sexp]

  val of_adjacency : extern_t -> [ `Ok of t | `Load_error of load_error ]
  val to_adjacency : t -> extern_t

  module Dijkstra : sig
    type state

    type error = [
      | `Relax of vertex_t
    ] [@@deriving sexp]
    exception Error of error [@@deriving sexp]

    val dijkstra : vertex_t -> t -> [ `Ok of state | `Error of error ]
    val d : state -> (vertex_t * float) list
    val shortest_paths : state -> (vertex_t * vertex_t list) list
  end

end

A realization of Graph_sig provides "conversion" functions of_adjacency/to_adjacency between the types extern_t and t and nests a module Dijkstra. The signature of the sub-module Dijkstra requires concrete modules provide a type state and an implementation of Dijkstra's algorithm in terms of the function signature val dijkstra : vertex_t -> t -> [ `Ok of state | `Error of error ].

For reusability, the strategy for implementing graphs will be generic programming via functors over modules implementing a vertex type.

An implementation of the module type GRAPH defines a module type VERT which is required to provide a comparable type t. It further defines a module type S that is exactly module type Graph_sig above. Lastly, modules of type GRAPH provide a functor Make that maps any module of type VERT to a new module of type S, fixing extern_t to an adjacency list representation in terms of the native OCaml type 'a list and using float to represent weights on edges.

module type GRAPH = sig
  module type VERT = sig
    type t[@@deriving sexp]
    include Comparable.S with type t := t
  end

  module type S = sig
    include Graph_sig
  end

  module Make : functor (V : VERT) ->
    S with type vertex_t = V.t
       and type extern_t = (V.t * (V.t * float) list) list
end

The two module types Graph_sig and GRAPH together provide the specification for the program. module Graph in the next section implements this specification.

Implementation of module Graph is in outline this.

module Graph : GRAPH = struct
  module type VERT = sig
    type t[@@deriving sexp]
    include Comparable.S with type t := t
  end

  module type S = sig
    include Graph_sig
  end

  module Make : functor (V : VERT) ->
    S with type vertex_t = V.t
       and type extern_t = (V.t * (V.t * float) list) list
    =

    functor (V : VERT) -> struct
       ...
    end
end

As per the requirements of GRAPH the module types VERT and S are provided as is the functor Make. It is the code that is elided by the ... above in the definition of Make that is now the focus.

Modules produced by applications of Make satisfy S. This requires suitable definitions of types vertex_t, t and extern_t. The modules Map and Set are available due to modules of type VERT being comparable in their type t.

      module Map = V.Map
      module Set = V.Set

      type vertex_t = V.t [@@deriving sexp]
      type t = (vertex_t * float) list Map.t [@@deriving sexp]
      type extern_t = (vertex_t * (vertex_t * float) list) list
      type load_error = [ `Duplicate_vertex of vertex_t ] [@@deriving sexp]
      exception Load_error of load_error [@@deriving sexp]

While the external representation extern_t of graphs is chosen to be an adjacency list representation in terms of association lists, the internal representation t is a vertex map of adjacency lists providing logarithmic lookup complexity. The conversion functions between the two representations "come for free" via module Map.

      let to_adjacency g = Map.to_alist g

      let of_adjacency_exn l =  match Map.of_alist l with
        | `Ok t -> t
        | `Duplicate_key c -> raise (Load_error (`Duplicate_vertex c))

      let of_adjacency l =
        try
          `Ok (of_adjacency_exn l)
        with
        | Load_error err -> `Load_error err

At this point the "scaffolding" for Dijkstra's algorithm, that part of GRAPH dealing with the representation of graphs, is implemented.

The interpretation of Dijkstra's algorithm we adopt is functional : the idea is we loop over vertices relaxing their edges until all shortest paths are known. What we know on any recursive iteration of the loop is a current "state" (of the computation) and each iteration produces a new state. This next definition is the formal definition of type state.

      module Dijkstra = struct

        type state = {
          src    :                  vertex_t
        ; g      :                         t
        ; d      :               float Map.t
        ; pred   :            vertex_t Map.t
        ; s      :                     Set.t
        ; v_s    : (vertex_t * float) Heap.t
        }

The fields of this record are:

src : vertex_t, the source vertex;
g : t, G the graph;
d : float Map.t, d the shortest path weight estimates;
pred : vertex_t Map.t, π the predecessor relation;
s : Set.t, the set S of nodes for which the lower bound shortest path weight is known;
v_s : (vertex_t * float) Heap.t, V - {S}, the set of nodes of g for which the lower bound of the shortest path weight is not yet known, ordered on their estimates.

Function invocation init src g computes an initial state for the graph g containing the source node src. In the initial state, d is everywhere ∞ except for src which is 0. Set S (i.e. s) and the predecessor relation π (i.e. pred) are empty and the set V - {S} (i.e. v_s) contains all nodes.

        let init src g =
          let init x = match V.equal src x with
            | true -> 0.0 | false -> Float.infinity in
          let d = List.fold (Map.keys g) ~init:Map.empty
              ~f:(fun acc x -> Map.set acc ~key:x ~data:(init x)) in
          {
            src
          ; g
          ; s = Set.empty
          ; d
          ; pred = Map.empty
          ; v_s = Heap.of_list (Map.to_alist d)
                ~cmp:(fun (_, e1) (_, e2) -> Float.compare e1 e2)
          }

Relaxing an edge (u, v) with weight w (u, v) tests whether the shortest path to v so far can be improved by going through u and, if so, updates d (v) and π (v) accordingly.

        type error = [
          | `Relax of vertex_t
        ] [@@deriving sexp]
        exception Error of error [@@deriving sexp]

        let relax state (u, v, w) =
          let {d; pred; v_s; _} = state in
          let dv = match Map.find d v with
            | Some dv -> dv
            | None -> raise (Error (`Relax v)) in
          let du = match Map.find d u with
            | Some du -> du
            | None -> raise (Error (`Relax u)) in
          if dv > du +. w then
            let dv = du +. w in
            (match Heap.find_elt v_s ~f:(fun (n, _) -> V.equal n v) with
            | Some tok -> ignore (Heap.update v_s tok (v, dv))
            | None -> raise (Error (`Relax v))
            );
            { state with
              d = Map.change d v
                  ~f:(function
                      | Some _ -> Some dv
                      | None -> raise (Error (`Relax v))
                    )
            ; pred = Map.set (Map.remove pred v) ~key:v ~data:u
            }
          else state

Here, relaxation can result in a linear heap update operation. A better implementation might seek to avoid that.
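
One way to avoid the linear search is sketched below. It is a standalone illustration that uses only the standard library rather than Core and sits outside the Make functor. The names PQ, CMap and shortest are made up for this sketch, and it assumes non-negative weights and that every vertex appears as a key of the adjacency list. The idea is to keep (estimate, vertex) pairs in an ordered set, so that a "decrease key" is a logarithmic remove followed by a logarithmic add.

```ocaml
(* An ordered set of (estimate, vertex) pairs stands in for the heap. *)
module PQ = Set.Make (struct
  type t = float * char
  let compare = compare
end)

module CMap = Map.Make (Char)

(* [shortest adj src] maps each vertex of [adj] to its shortest path
   weight from [src]. Assumes every vertex is a key of [adj]. *)
let shortest (adj : (char * (char * float) list) list) (src : char)
  : float CMap.t =
  let d0 =
    List.fold_left
      (fun m (v, _) -> CMap.add v (if v = src then 0.0 else infinity) m)
      CMap.empty adj in
  let q0 = CMap.fold (fun v dv q -> PQ.add (dv, v) q) d0 PQ.empty in
  let rec loop d q =
    match PQ.min_elt_opt q with
    | None -> d
    | Some ((du, u) as top) ->
      let q = PQ.remove top q in
      let relax (d, q) (v, w) =
        let dv = CMap.find v d in
        if dv > du +. w then
          (* Decrease key: drop the stale pair, add the improved one. *)
          (CMap.add v (du +. w) d, PQ.add (du +. w, v) (PQ.remove (dv, v) q))
        else (d, q) in
      let d, q = List.fold_left relax (d, q) (List.assoc u adj) in
      loop d q
  in
  loop d0 q0
```

The sketch computes only the estimates d. Carrying the predecessor relation along would be a matter of threading one more map through relax.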

One iteration of the body of the loop of Dijkstra's algorithm consists of the node in V - {S} with the least shortest path weight estimate being moved to S and its edges relaxed.

        let dijkstra_exn src g =
          let rec loop ({s; v_s; _} as state) =
            match Heap.is_empty v_s with
            | true -> state
            | false ->
              let u = fst (Heap.pop_exn v_s) in
              loop (
                List.fold (Map.find_exn g u)
                  ~init:{ state with s = Set.add s u }
                  ~f:(fun state (v, w) -> relax state (u, v, w))
              )
          in loop (init src g)

        let dijkstra src g =
          try
            `Ok (dijkstra_exn src g)
          with
          | Error err -> `Error err

The shortest path estimates contained by a value of state are given by the projection d.

        let d state = Map.to_alist (state.d)

The shortest paths themselves are easily computed as,

        let path state n =
          let rec loop acc x =
            (match V.equal x state.src with
            | true -> x :: acc
            | false -> loop (x :: acc) (Map.find_exn state.pred x)
            ) in
          loop [] n

        let shortest_paths state =
          List.map (Map.keys state.g) ~f:(fun n -> (n, path state n))
      end
    end

which completes the implementation of Make.

The following program produces a concrete instance of the shortest path problem (with some evaluation output from the top-level).

module G : Graph.S with
  type vertex_t = char and type extern_t = (char * (char * float) list) list
  =
  Graph.Make (Char)

let g : G.t =
  match G.of_adjacency
          [ 's', ['u',  3.0; 'x', 5.0]
          ; 'u', ['x',  2.0; 'v', 6.0]
          ; 'x', ['v',  4.0; 'y', 6.0; 'u', 1.0]
          ; 'v', ['y',  2.0]
          ; 'y', ['v',  7.0]
          ]
  with
  | `Ok g -> g
  | `Load_error e -> failwiths "Graph load error : %s" e G.sexp_of_load_error
;;
let s = match (G.Dijkstra.dijkstra 's' g) with
  | `Ok s -> s
  | `Error e -> failwiths "Error : %s" e G.Dijkstra.sexp_of_error

;; G.Dijkstra.d s
- : (char * float) list =
[('s', 0.); ('u', 3.); ('v', 9.); ('x', 5.); ('y', 11.)]

;; G.Dijkstra.shortest_paths s
- : (char * char list) list =
[('s', ['s']); ('u', ['s'; 'u']); ('v', ['s'; 'u'; 'v']); ('x', ['s'; 'x']);
 ('y', ['s'; 'x'; 'y'])]

References:
[1] "Introduction to Algorithms" Section 24.3: Dijkstra's algorithm -- Cormen et al. (Second ed.) 2001.

How to migrate your ppx to OCaml migrate parsetree

2017-12-09T10:40:00.000-05:00

OCaml migrate parse tree

Earlier this year, this blog post [2] explored the implementation of a small preprocessor extension (ppx).

The code of the above article worked well enough at the time but, as written, exhibits a problem : new releases of the OCaml compiler are generally accompanied by evolutions of the OCaml parse tree. The effect of this is that a ppx written against a specific version of the compiler will "break" in the presence of later releases of the compiler. As pointed out in [3], the use of ppx's in the OCaml eco-system these days is ubiquitous. If each new release of the OCaml compiler required synchronized updates of each and every ppx in opam, getting new releases of the compiler out would soon become a near impossibility.

Mitigation of the above problem is provided by the ocaml-migrate-parsetree library. The library provides the means to convert parsetrees from one OCaml version to another. This allows the ppx rewriter to write against a specific version of the parsetree and lets the library take care of rolling parsetrees backwards and forwards in versions as necessary. In this way, the resulting ppx is "forward compatible" with newer OCaml versions without requiring ppx code updates.

To get the ppx_id_of code from the earlier blog post usable with ocaml-migrate-parsetree required a couple of small tweaks to make it OCaml 4.02.0 compatible. The changes from the original code were slight and not of significant enough interest to be worth presenting here. What is worth looking at is what it then took to switch the code to use ocaml-migrate-parsetree. The answer is : very little!

open Migrate_parsetree
open OCaml_402.Ast

open Ast_mapper
open Ast_helper
open Asttypes
open Parsetree
open Longident

(* The original ppx as written before goes here!
   .                    .                   .
   .                    .                   .
   .                    .                   .
*)

let () = Driver.register ~name:"id_of" (module OCaml_402) id_of_mapper

The complete code for this article is available online here and as a bonus, includes a minimal jbuilder build system demonstrating just how well the OCaml tool-chain comes together these days.

References:
[1] "A Guide to Extension Points in OCaml" -- Whitequark (blog post 2014)
[2] "Preprocessor extensions for code generation" -- Shayne Fletcher (blog post 2017)
[3] "Extension Points - 3 Years Later" -- Rudi Grinberg (blog post 2017)

Towers of Hanoi

2017-11-11T09:32:00.001-05:00

Towers of Hanoi

The "towers of Hanoi" problem is stated like this. There are three pegs labelled a, b and c. On peg a there is a stack of n disks of increasing size, the largest at the bottom, each with a hole in the middle to accommodate the peg. The problem is to transfer the stack of disks to peg c, one disk at a time, in such a way as to ensure that no disk is ever placed on top of a smaller disk.

The problem is amenable to a divide and conquer strategy : "Move the top n - 1 disks from peg a to peg b, move the remaining largest disk from peg a to peg c then, move the n - 1 disks on peg b to peg c."

let rec towers n from to_ spare =
  if n > 0 then
    begin
      towers (n - 1) from spare to_;
      Printf.printf
        "Move the top disk from peg %c to peg %c\n" from to_;
      towers (n - 1) spare to_ from
    end
  else
    ()
;;

For example, the invocation

let () =
  towers 3 'a' 'c' 'b'

will generate the recipe

Move the top disk from peg a to peg c
Move the top disk from peg a to peg b
Move the top disk from peg c to peg b
Move the top disk from peg a to peg c
Move the top disk from peg b to peg a
Move the top disk from peg b to peg c
Move the top disk from peg a to peg c

Let T(n) be the time complexity of towers (n, x, y, z), when the characteristic operation is the moving of a disk from one peg to another. The time complexity of towers (n - 1, x, y, z) is T(n - 1) by definition and no further investigation is needed. T(0) = 0 because the test n > 0 fails and no disks are moved. For larger n, the expression towers (n - 1, from, spare, to_) is evaluated with cost T(n - 1), followed by Printf.printf "Move the top disk from peg %c to peg %c\n" from to_ with cost 1 and finally, towers (n - 1, spare, to_, from), again with cost T(n - 1).

Summing these contributions gives the recurrence relation T(n) = 2T(n - 1) + 1 where T(0) = 0.

Repeated substitution can be used to arrive at a closed form for T(n), since, T(n) = 2T(n - 1) + 1 = 2[2T(n - 2) + 1] + 1 = 2[2[2T(n - 3) + 1] + 1] + 1 = 2³T(n - 3) + 2² + 2¹ + 2⁰ (provided n ≥ 3), expanding the brackets in a way that elucidates the emerging pattern. If this substitution is repeated i times then clearly the result is T(n) = 2ⁱT(n - i) + 2^{i - 1} + 2^{i - 2} + ··· + 2⁰ (n ≥ i). The largest possible value i can take is n and if i = n then T(n - i) = T(0) = 0 and so we arrive at T(n) = 2ⁿ·0 + 2^{n - 1} + ··· + 2⁰. This is the sum of a geometric series with the well known solution 2ⁿ - 1 (use induction to establish that last result or more directly, just compute 2T(n) - T(n)). And so, the time complexity (the number of disk moves needed) for n disks is T(n) = 2ⁿ - 1.
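
The closed form can be checked empirically. The following sketch uses a hypothetical function count, which is not part of the original program. It has the same recursive shape as towers but returns the number of moves rather than printing them, so it can be compared against 2ⁿ - 1 for small n.

```ocaml
(* [count n from to_ spare] mirrors [towers] but returns the number of
   disk moves instead of printing them. *)
let rec count n from to_ spare =
  if n > 0 then
    count (n - 1) from spare to_ + 1 + count (n - 1) spare to_ from
  else 0

(* Tabulate T(n) next to 2^n - 1 for n = 0 .. 10. *)
let () =
  for n = 0 to 10 do
    Printf.printf "T(%d) = %d, 2^%d - 1 = %d\n"
      n (count n 'a' 'c' 'b') n ((1 lsl n) - 1)
  done
```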

References:
Algorithms and Data Structures Design, Correctness, Analysis by Jeffrey Kingston, 2nd ed. 1998

Nesting quoted strings in OCaml

2017-10-27T20:53:00.001-04:00

Quoting

According to the lexical conventions of OCaml, characters different from \ and " can be enclosed in single quotes and appear in strings. The special characters \ and " are represented in these contexts by their escape sequences. The escape sequence \\ denotes the character \ and \" denotes the character ".
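
As a small illustration (the names backslash and quote are chosen only for this example), each of the two escape sequences denotes a single character:

```ocaml
(* Each literal below is two characters of source text denoting a
   string of length one. *)
let backslash = "\\"
let quote = "\""

let () =
  Printf.printf "%d %d\n" (String.length backslash) (String.length quote)
```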

Here we print the string "Hello world!". The quotes delimit the string and are not themselves part of the string.

utop[0]> Caml.Printf.printf "Hello world!";;
Hello world!- : unit = ()

To capture the quotes we need to write them into the string by their escape sequence.

utop[1]> Caml.Printf.printf "\"Hello world!\"";;
"Hello world!"- : unit = ()

What now if we wish to quote a string within a string?

utop[3]> Caml.Printf.printf
"\"A quoted string with \\\"a nested quoted string\\\"\"";;
"A quoted string with \"a nested quoted string\""- : unit = ()

We see that in rendering the above string, printf has rendered the escape sequence \" as " and \\\" as \" as required. The pattern continues if we now wish to quote a string within a quoted string within a quoted string.

utop[4]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested \\\\\\\"nested\\\\\\\"
quoted string\\\"\"";;
"A quoted string with \"a nested \\\"nested\\\"
quoted string\""- : unit = ()

As you can see, things get crazy pretty quickly and you can easily drive yourself mad working out the correct escape sequences to get the desired nesting!

Here's a hack : If the string has k levels of quoting, then count how many occurrences of \s precede the " at that level. Let that number be n say. To get the next level of quoting you need to concatenate a sequence of n + 1 \s to them to get a total of 2n + 1 \s. To illustrate, look again at the last example:

utop[4]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested \\\\\\\"nested\\\\\\\"
quoted string\\\"\"";;
"A quoted string with \"a nested \\\"nested\\\"
quoted string\""- : unit = ()

That's three levels of quoting. At the third level we have the sequence \\\\\\\". That's 7 \s. To quote to the fourth level then we need 8 + 7 = 15 \s:

utop[5]> Caml.Printf.printf 
"\"A quoted string with \\\"a nested \\\\\\\"nested
\\\\\\\\\\\\\\\"nested\\\\\\\\\\\\\\\" \\\\\\\" quoted string\\\"\"";;
"A quoted string with \"a nested \\\"nested
\\\\\\\"nested\\\\\\\" \\\" quoted string\""- : unit = ()

In general, the number of \s required for n levels of quoting is 2ⁿ - 1 (that is, an exponential function). The solution follows from the recurrence relation Q₀ = 0 and Q_n = 2Q_{n - 1} + 1 which in fact establishes a connection to the "Towers of Hanoi" problem.
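
Rather than counting by hand, the escaping can be left to the standard library. In the sketch below, wrap, nest and max_run are hypothetical helpers introduced only for illustration. wrap produces the source text of a literal denoting its argument using String.escaped, and the longest run of \s in the result is then compared with 2ⁿ - 1.

```ocaml
(* [wrap s] is the text of an OCaml string literal denoting [s]. *)
let wrap s = "\"" ^ String.escaped s ^ "\""

(* [nest k s] applies [wrap] to [s], [k] times. *)
let rec nest k s = if k = 0 then s else nest (k - 1) (wrap s)

(* [max_run s] is the length of the longest run of backslashes in [s]. *)
let max_run s =
  let best = ref 0 and cur = ref 0 in
  String.iter
    (fun c ->
       if c = '\\' then begin
         incr cur;
         if !cur > !best then best := !cur
       end
       else cur := 0)
    s;
  !best

(* The literal for a string quoted to [n] levels is [nest (n + 1) s]:
   [n] wraps build the quoted string, one more gives its source text. *)
let () =
  for n = 1 to 5 do
    Printf.printf "levels = %d, backslashes = %d\n"
      n (max_run (nest (n + 1) "x"))
  done
```

Each application of wrap doubles every existing \ and adds one more in front of each ", which is exactly the step Q_n = 2Q_{n - 1} + 1.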

How to render trees like the Unix tree command

2017-10-14T15:20:00.001-04:00

How to render trees like Unix 'tree'

The Unix tree utility produces a pretty rendering of a filesystem. Implementing an algorithm to produce output like tree is a little harder than one might expect! This short example program illustrates one way of doing it.

(* A type of non-empty trees of strings. *)
type tree = [
  |`Node of string * tree list
]
;;

(* [print_tree tree] prints a rendering of [tree]. *)
let rec print_tree
          ?(pad : (string * string)= ("", ""))
          (tree : tree) : unit =
  let pd, pc = pad in
  match tree with
  | `Node (tag, cs) ->
     Printf.printf "%s%s\n" pd tag;
     let n = List.length cs - 1 in
     List.iteri (
         fun i c ->
         let pad =
           (pc ^ (if i = n then "`-- " else "|-- "),
            pc ^ (if i = n then "    " else "|   ")) in
         print_tree ~pad c
       ) cs
;;

(* An example tree. *)
let tree =
  `Node ("."
        , [
            `Node ("S", [
                      `Node ("T", [
                                `Node ("U", [])]);
                      `Node ("V", [])])
          ;  `Node ("W", [])
          ])
;;

(* Print the example tree. *)
let () =  print_tree tree
;;

The output of the above looks like this:

.
|-- S
|   |-- T
|   |   `-- U
|   `-- V
`-- W
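
A variation on the same idea, sketched here under the assumption that a list of lines is more convenient than printing directly (for testing, say), returns the rendering instead. The name lines_of_tree is invented for this sketch. The padding logic is unchanged from print_tree.

```ocaml
(* The same type of non-empty trees of strings. *)
type tree = [ `Node of string * tree list ]

(* [lines_of_tree tree] is the rendering of [tree], one string per line. *)
let lines_of_tree (tree : tree) : string list =
  let rec go (pd, pc) (`Node (tag, cs) : tree) =
    let n = List.length cs - 1 in
    (pd ^ tag)
    :: List.concat
         (List.mapi
            (fun i c ->
               go
                 ( pc ^ (if i = n then "`-- " else "|-- ")
                 , pc ^ (if i = n then "    " else "|   ") )
                 c)
            cs)
  in
  go ("", "") tree

(* Printing the lines reproduces the output of [print_tree]. *)
let print_lines tree = List.iter print_endline (lines_of_tree tree)
```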

Transpose

2017-08-12T15:38:00.000-04:00

Transpose

If we are to represent a row of a matrix as a list of numbers, then a matrix can naturally be represented as a list of lists of numbers.

The transpose of a matrix $\mathbf{A}$ is a new matrix denoted $\mathbf{A^{T}}$. The traditional mathematical definition of $\mathbf{A^{T}}$ is expressed as saying the $i$ th row, $j$ th column element of $\mathbf{A^{T}}$ is the $j$ th row, $i$ th column element of $\mathbf{A}$:

$\left[\mathbf{A}\right]_{ij} = \left[\mathbf{A^{T}}\right]_{ji}$.

As definitions go, this isn't terribly helpful in explaining how to compute a transpose. A better equivalent definition for the functional programmer is : the matrix obtained by writing the columns of $\mathbf{A}$ as the rows of $\mathbf{A^{T}}$.

An elegant program for computing a transpose follows from a direct translation of that last definition.

let rec transpose (ls : 'a list list) : 'a list list =
  match ls with
  | [] | [] :: _ -> []
  | ls -> List.map (List.hd) ls :: transpose (List.map (List.tl) ls)

It is not at all hard to understand how the program works when you've seen an example:

transpose [[1; 2]; [3; 4;]; [5; 6]]
  = [1; 3; 5] :: transpose [[2]; [4;]; [6]]
  = [1; 3; 5] :: [2; 4; 6] :: transpose [[]; []; []]
  = [1; 3; 5] :: [2; 4; 6] :: []
  = [[1; 3; 5]; [2; 4; 6]]

Being as pretty as it is, one is inclined to leave things be but, as a practical matter it should be rephrased to be tail-recursive.

let transpose (ls : 'a list list) : 'a list list =
  let rec transpose_rec acc = function
  | [] | [] :: _ -> List.rev acc
  | ls -> transpose_rec (List.map (List.hd) ls :: acc) (List.map (List.tl) ls)
  in transpose_rec [] ls
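
A quick sanity check of the accumulator version follows. It assumes rectangular input (all rows the same length), since ragged input is not handled by either version. It confirms the worked example above and that transposing twice gives back the original matrix.

```ocaml
(* The accumulator-based transpose, repeated so the check stands alone. *)
let transpose (ls : 'a list list) : 'a list list =
  let rec transpose_rec acc = function
    | [] | [] :: _ -> List.rev acc
    | ls -> transpose_rec (List.map List.hd ls :: acc) (List.map List.tl ls)
  in transpose_rec [] ls

let () =
  let m = [[1; 2]; [3; 4]; [5; 6]] in
  (* Columns of [m] become rows. *)
  assert (transpose m = [[1; 3; 5]; [2; 4; 6]]);
  (* Transposing twice is the identity on rectangular matrices. *)
  assert (transpose (transpose m) = m)
```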

References:
"An Introduction to Functional Programming Systems Using Haskell" -- Davie A J T., 1992
