Saturday, January 14, 2023

Cabal package macros (MIN_VERSION_xyz)

cabal build ... generates cabal_macros.h containing, e.g. for hlint version 3.5, definitions like:

/* package hlint-3.5 */
#ifndef VERSION_hlint
#define VERSION_hlint "3.5"
#endif /* VERSION_hlint */
#ifndef MIN_VERSION_hlint
#define MIN_VERSION_hlint(x,y,z) (\
  (x) <  3 || \
  (x) == 3 && (y) <  5 || \
  (x) == 3 && (y) == 5 && (z) <= 0)
#endif /* MIN_VERSION_hlint */

This macro is a compile-time predicate: use it to test whether the configured hlint package version is at least x.y.z (it holds exactly when (x, y, z) is lexicographically at most (3, 5, 0)).

We might, for example, test whether the configured package version is at least 3.6.0 with

MIN_VERSION_hlint(3,6,0)

By substitution into the macro body we arrive at

3 < 3 || 3 == 3 && 6 < 5 || 3 == 3 && 6 == 5 && 0 <= 0

which evaluates to false: no, it is not.
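
In practice these macros guard conditional compilation. A minimal sketch (assuming a module with the CPP extension enabled in a package that depends on hlint):

{-# LANGUAGE CPP #-}
module Main (main) where

main :: IO ()
main =
#if MIN_VERSION_hlint(3,6,0)
  putStrLn "configured against hlint >= 3.6.0"
#else
  putStrLn "configured against hlint < 3.6.0"
#endif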

Sunday, August 7, 2022

Testing a new stack resolver

When there’s been a new GHC release, it can take a little while for there to be a stack resolver for it. The following procedure can be used for some local testing if you want to get ahead of the game¹.

Where stack looks for resolvers

If you execute the command,

    $ stack setup --resolver ghc-x.y.z

you’ll see something like

    No setup information found for ghc-x.y.z on your platform.

if there isn’t yet a resolver for ghc-x.y.z.

The default set of resolvers stack “knows about” is enumerated in the file stack-setup-2.yaml in the commercialhaskell/stackage-content repository.

Create a local stack setup file

  • Start by downloading the release binary package tarball of interest from www.haskell.org and note its
    • size (in bytes):
           stat -f%z ghc-x.y.z-x86_64-apple-darwin.tar.xz
    • SHA1 hash:
           shasum -a 1 ghc-x.y.z-x86_64-apple-darwin.tar.xz
    • SHA256 hash:
           shasum -a 256 ghc-x.y.z-x86_64-apple-darwin.tar.xz
  • Now, create stack-setup-2-ghc-x.y.z.yaml with contents along the lines of
        ghc:
          macosx:
            x.y.z:
              url: http://downloads.haskell.org/~ghc/x.y.z/ghc-x.y.z-x86_64-apple-darwin.tar.xz
              content-length: 177152992
              sha1: 2dbd726860ed2c0ea04c7aca29c22df20b952ee1
              sha256: f2e8366fd3754dd9388510792aba2d2abecb1c2f7f1e5555f6065c3c5e2ffec4

Update: Here’s an even easier way.

    ghc:
      macosx:
        x.y.z:
          url: /path/to/ghc-x.y.z-x86_64-apple-darwin.tar.xz

Install GHC version x.y.z via the setup file

The following stack setup command will use the setup file created above to download and install ghc-x.y.z:

  stack --setup-info-yaml stack-setup-2-ghc-x.y.z.yaml --resolver ghc-x.y.z setup

Once stack has GHC installed, there’s no further need to pass a --setup-info-yaml argument to subsequent stack commands; it’s ready to go!
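
As a quick sanity check, you can ask the freshly installed compiler for its version (output illustrative):

    $ stack --resolver ghc-x.y.z ghc -- --version
    The Glorious Glasgow Haskell Compilation System, version x.y.z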


  1. Kindly explained to me by Mike Pilgrem in this ticket. This note is biased towards my needs on macOS. See the linked issue for further details, especially for Windows installations.

Wednesday, May 19, 2021

Annotations in GHC

Starting with ghc-9.2.1, parse trees contain “annotations” (these are, for example, comments and the locations of keywords). This represents a non-trivial upgrade of GHC parse trees. If you work with GHC ASTs in your project, there will be no avoiding getting to know about them. This note is a summary overview of annotations: the where and how of their representations.

In-tree annotations enable exact-printing of GHC ASTs. This feature and the reformulation of the GHC AST with in-tree annotations to support it was conceived of and implemented by Alan Zimmerman (@alan_zimm). The achievement is of truly epic scale.

Annotations on syntactic elements

An EpaLocation is a span giving the exact location of a keyword in parsed source.

data EpaLocation = EpaSpan RealSrcSpan | EpaDelta DeltaPos
data DeltaPos = ...

The parser only inserts EpaSpans.

A DotFieldOcc arises in expressions like (.e) (field-selector) or a.e (field-selection) when OverloadedRecordDot is enabled. A DotFieldOcc value in the parse phase is associated with an AnnFieldLabel in its extension field (annotations in ghc-9.2.1 lean heavily on the facilities afforded by TTG, “Trees That Grow”). The AnnFieldLabel contains the location of the ‘.’. AnnFieldLabel is an “annotation type”. You’ll recognize annotation types (there are many) by the convention that their names are prefixed with Ann.

-- GHC.Hs.Expr
data AnnFieldLabel
  = AnnFieldLabel {
      afDot :: Maybe EpaLocation
      }
type instance XCDotFieldOcc (GhcPass _) = EpAnn AnnFieldLabel

-- Language.Haskell.Syntax.Expr
data DotFieldOcc p
  = DotFieldOcc
    { dfoExt   :: XCDotFieldOcc p
    , dfoLabel :: XRec p FieldLabelString
    }
  | XDotFieldOcc !(XXDotFieldOcc p)

(What XRec p FieldLabelString means will be explained in the next section.)

Note that the extension field dfoExt doesn’t contain a “raw” AnnFieldLabel, rather, it contains an EpAnn AnnFieldLabel.

EpAnn envelops an annotation. It associates a base location for the start of the syntactic element containing the annotation, along with any comments enclosed in the source span of the element to which the EpAnn is attached. EpAnnNotUsed is used when an annotation is required but there’s no annotation available to envelope (one obvious case being generated code).

data EpAnn ann
  = EpAnn { entry :: Anchor
          , anns :: ann
          , comments :: EpAnnComments }
  | EpAnnNotUsed

data EpAnnComments = ...

It’s the Anchor type where the base location is held.

data Anchor = Anchor { anchor :: RealSrcSpan, anchor_op :: AnchorOperation }

data AnchorOperation = ...

Annotations on source spans

Annotations don’t just get attached to syntactic elements, they frequently get attached to source spans too.

data SrcSpanAnn' a = SrcSpanAnn { ann :: a, locA :: SrcSpan }

Usually SrcSpanAnn' is used with EpAnn and that combination is named a SrcAnn.

type SrcAnn ann = SrcSpanAnn' (EpAnn ann)

There are many annotation types. The most ubiquitous are AnnListItem, NameAnn, AnnList, AnnPragma and AnnContext. Their use is common enough that names are given to their SrcAnn types (which, you’ll recall, wrap them in EpAnn and associate them with a SrcSpan).

type SrcSpanAnnA = SrcAnn AnnListItem
type SrcSpanAnnN = SrcAnn NameAnn

type SrcSpanAnnL = SrcAnn AnnList
type SrcSpanAnnP = SrcAnn AnnPragma
type SrcSpanAnnC = SrcAnn AnnContext

Of these, SrcSpanAnnA is used as a sort of “default” annotation.

What do you do with generalized SrcSpan types like these? You locate things with them.

type LocatedA = GenLocated SrcSpanAnnA
type LocatedN = GenLocated SrcSpanAnnN

type LocatedL = GenLocated SrcSpanAnnL
type LocatedP = GenLocated SrcSpanAnnP
type LocatedC = GenLocated SrcSpanAnnC

These type synonyms are only for the most commonly used annotation types. The general case is LocatedAn an.

type LocatedAn an = GenLocated (SrcAnn an)

To recap, a LocatedAn an is a GenLocated (SrcAnn an) which is a GenLocated (SrcSpanAnn' (EpAnn an)).

Abstracting over locations

Syntax definitions are generalized with respect to location information. That is, rather than hard-coding SrcSpan into syntax type definitions as we used to, type families are used in their place. The structure of the syntax, locations included, can then be described without fixing concrete types for the locations where you’d once have had a source span type.

It works like this. In Language.Haskell.Syntax.Extension there is this definition:

type family XRec p a = r | r -> a

Locations are specified in terms of XRecs. For example in Language.Haskell.Syntax.Expr we have this:

type LHsExpr p = XRec p (HsExpr p)

How XRec p (HsExpr p) is mapped onto a specific type in GHC is achieved in the following way. First in Language.Haskell.Syntax.Extension there is the following definition:

type family Anno a = b

Then, in GHC.Hs.Extension this definition:

type instance XRec (GhcPass p) a = GenLocated (Anno a) a

Specific choices for each syntactic element can then be made for GHC’s use of the parse tree at each phase. For example, in GHC.Hs.Expr we have the following.

type instance Anno (HsExpr (GhcPass pass)) = SrcSpanAnnA

To see how this works, consider what that means for the located expression type LHsExpr GhcPs in GHC. We have that LHsExpr GhcPs is XRec GhcPs (HsExpr GhcPs), which is GenLocated (Anno (HsExpr GhcPs)) (HsExpr GhcPs), or GenLocated SrcSpanAnnA (HsExpr GhcPs) (or, if you like, LocatedA (HsExpr GhcPs)).

Expanding further we have GenLocated SrcSpanAnnA (HsExpr GhcPs) is GenLocated (SrcAnn AnnListItem) (HsExpr GhcPs). So ultimately, LHsExpr GhcPs is GenLocated (SrcSpanAnn' (EpAnn AnnListItem)) (HsExpr GhcPs).
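
To make that layering concrete, here’s a minimal sketch (module names per the ghc-9.2 layout; GenLocated’s constructor is L) of how one might dig the plain SrcSpan and the annotation back out of a parsed expression:

import GHC.Hs (GhcPs, LHsExpr)
import GHC.Parser.Annotation (AnnListItem, EpAnn, SrcSpanAnn' (..))
import GHC.Types.SrcLoc (GenLocated (..), SrcSpan)

-- One pattern match exposes both layers: the annotation and the
-- underlying SrcSpan.
exprLocAndAnn :: LHsExpr GhcPs -> (SrcSpan, EpAnn AnnListItem)
exprLocAndAnn (L (SrcSpanAnn ann l) _) = (l, ann)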

Friday, April 9, 2021

arith-cxx-tagless-final

The aim is to prove out the idea of an interpreter where the “front end” (lexical analysis) is in Rust and the “back end” (interpretation) is in C++.

  • We’ll parse and evaluate a simple language of arithmetic expressions;
    • The parser is implemented using nom (a Rust parser combinator library);
  • The “tagless-final” idiom is used to split the front and back ends;
  • The interop between C++ and Rust is expressed using the CXX library.

Parser

We’ll define the syntax of additive expressions by this grammar:

    expr := term ('+' term)*
    term := lit | '-' term | '(' expr ')'
    lit  := digits

The key idea of the program is this: don’t define an abstract syntax tree type, values of which are produced by parsing. Rather, as parsing unfolds, call functions defined by the following trait.

pub trait ExprSyn: Clone {
    fn lit(n: i64) -> Self;
    fn neg(t: Self) -> Self;
    fn add(u: Self, v: Self) -> Self;
}
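
For intuition, here’s a minimal back end that never leaves Rust (illustrative only, not part of the project): interpret expressions directly as i64 values by implementing the trait.

// Hypothetical direct evaluator: each syntactic construct maps
// straight to arithmetic on i64.
#[derive(Clone)]
pub struct Eval(pub i64);

impl ExprSyn for Eval {
    fn lit(n: i64) -> Self { Eval(n) }
    fn neg(t: Self) -> Self { Eval(-t.0) }
    fn add(u: Self, v: Self) -> Self { Eval(u.0 + v.0) }
}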

With that understood, we implement the parser as a Rust library in the following way.

pub mod parse {
    use nom::{
        branch::alt,
        bytes::complete::tag,
        character::complete::char,
        character::complete::{digit1 as digit, space0 as space},
        combinator::{map, map_res},
        multi::fold_many0,
        sequence::{delimited, pair, preceded},
        IResult,
    };

    use super::ExprSyn;
    use std::str::FromStr;

    type ParseResult<'a, E> = IResult<&'a str, E>;

    fn lit<E: ExprSyn>(i: &str) -> ParseResult<E> {
        map_res(delimited(space, digit, space), |x| {
            FromStr::from_str(x).map(E::lit)
        })(i)
    }

    fn neg<E: ExprSyn>(i: &str) -> ParseResult<E> {
        map(delimited(space, preceded(char('-'), term), space), E::neg)(i)
    }

    fn par<E: ExprSyn>(i: &str) -> ParseResult<E> {
        delimited(space, delimited(tag("("), expr, tag(")")), space)(i)
    }

    fn term<E: ExprSyn>(i: &str) -> ParseResult<E> {
        alt((lit, neg, par))(i)
    }

    pub fn expr<E: ExprSyn>(i: &str) -> ParseResult<E> {
        let (i, init) = term(i)?;
        fold_many0(pair(char('+'), term), init, |acc, (_, val): (char, E)| {
            E::add(acc, val)
        })(i)
    }
}
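
With the illustrative Eval instance from above, the parser can be exercised without any C++ in the loop:

#[cfg(test)]
mod tests {
    use super::{parse, Eval};

    #[test]
    fn parses_and_evaluates() {
        // "1 + -(2 + 3)" parses as add(lit(1), neg(add(lit(2), lit(3)))).
        let (rest, e) = parse::expr::<Eval>("1 + -(2 + 3)").unwrap();
        assert!(rest.is_empty());
        assert_eq!(e.0, -4);
    }
}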

To wire that up to C++ we need to express a CXX “bridge”.

use cxx::SharedPtr;

#[cxx::bridge]
pub mod ffi {
    unsafe extern "C++" {
        include!("arith_final_tagless/include/cpp_repr.hpp");
        type Cpp_repr;

        fn lit(i: i64) -> SharedPtr<Cpp_repr>;
        fn neg(e: SharedPtr<Cpp_repr>) -> SharedPtr<Cpp_repr>;
        fn add(l: SharedPtr<Cpp_repr>, r: SharedPtr<Cpp_repr>) -> SharedPtr<Cpp_repr>;
    }

    extern "Rust" {
        fn parse(s: String) -> Result<SharedPtr<Cpp_repr>>;
    }
}

The header file cpp_repr.hpp referenced by the bridge contains these prototypes.

#pragma once

#include <memory>
#include <cstdint>

struct Cpp_repr {
  int64_t expr;
};

using cpp_repr_t = std::shared_ptr<Cpp_repr>;

cpp_repr_t lit(int64_t i);
cpp_repr_t neg(cpp_repr_t t);
cpp_repr_t add(cpp_repr_t t, cpp_repr_t u);

The existence of that header is enough to finish off the Rust library.

#[allow(non_camel_case_types)]
pub type CppRepr_t = SharedPtr<ffi::Cpp_repr>;

impl ExprSyn for CppRepr_t {
    fn lit(i: i64) -> CppRepr_t {
        ffi::lit(i)
    }
    fn neg(t: CppRepr_t) -> CppRepr_t {
        ffi::neg(t)
    }
    fn add(t1: CppRepr_t, t2: CppRepr_t) -> CppRepr_t {
        ffi::add(t1, t2)
    }
}

pub fn parse(s: String) -> Result<CppRepr_t, String> {
    match parse::expr::<CppRepr_t>(s.as_str()) {
        Ok((_s, rep)) => Ok(rep),
        Err(e) => Err(format!("{}", e)),
    }
}

Evaluator

We put the C++ part of the implementation in cpp_repr.cpp in a separate library.

#include <iostream>

#include "arith_final_tagless/include/cpp_repr.hpp"

cpp_repr_t lit(int64_t i) {
  return std::shared_ptr<Cpp_repr>{new Cpp_repr{i}};
}

cpp_repr_t neg(cpp_repr_t t) {
  return std::shared_ptr<Cpp_repr>{new Cpp_repr{-t->expr}};
}

cpp_repr_t add(cpp_repr_t t, cpp_repr_t u) {
  return std::shared_ptr<Cpp_repr>{new Cpp_repr{t->expr + u->expr}};
}

Interpreter

The REPL in main.cpp brings the Rust and C++ libraries together into an executable.

#include <iostream>

#include "rust/cxx.h" // 'rust::Error'
#include "arith_final_tagless/src/arith_final_tagless.rs.h" // 'parse'

int main() {
  char const* prompt = "\n% ";
  std::cout << "Additive expression evaluator (type CTRL+D to exit)" << prompt;

  std::string line;
  while(std::getline(std::cin, line)) {
    try {
      if (auto p = parse(rust::String{line})) {
        std::cout << line << " = " << p->expr << prompt;
      }
    } catch (rust::Error const& e) {
      std::cerr << e.what() << prompt;
    }
  }

  return 0;
}

Source, build scripts, and so on for this project are here.

Sunday, February 7, 2021

Configuring Cabal build flags

It’s always an emergency when I go looking for this information!

Suppose you are building HLint.

mkdir ~/tmp && cd ~/tmp
curl -o hlint-3.2.7.tar.gz https://hackage.haskell.org/package/hlint-3.2.7/hlint-3.2.7.tar.gz
gunzip hlint-3.2.7.tar.gz && tar xvf hlint-3.2.7.tar

Left to their own devices, HLint and ghc-lib-parser-ex default to auto mode, meaning they decide for themselves whether to depend on ghc-lib-parser or native ghc libraries.

Sometimes, though, it’s desirable to force the situation and explicitly make them do one or the other. How do you do that? The answer, of course, is Cabal package flags. There are two scenarios: building with stack or building with cabal.

  • stack.yaml

    • Force link with ghc-lib-parser

        flags:
          hlint:
            ghc-lib: true
          ghc-lib-parser-ex:
            auto: false
            no-ghc-lib: false
    • Force link with native ghc

        flags:
          hlint:
            ghc-lib: false
          ghc-lib-parser-ex:
            auto: false
            no-ghc-lib: true
  • cabal.project

    • Force link with ghc-lib-parser

        packages: hlint-3.2.7
        package hlint
          flags: +ghc-lib
        package ghc-lib-parser-ex
          flags: -auto -no-ghc-lib
    • Force link with native ghc

        packages: hlint-3.2.7
        package hlint
          flags: -ghc-lib
        package ghc-lib-parser-ex
          flags: -auto +no-ghc-lib

When working with Cabal in the HLint repository one can configure with command line constraints like this:

cabal new-build --constraint "hlint -ghc-lib" --constraint "ghc-lib-parser-ex -auto +no-ghc-lib"
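
By the same flag semantics, the ghc-lib-parser counterpart would be:

cabal new-build --constraint "hlint +ghc-lib" --constraint "ghc-lib-parser-ex -auto -no-ghc-lib"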

Monday, January 18, 2021

Two things in Rust

Two things I needed to learn before Rust made sense to me.

1. Pattern binding modes

I don’t remember reading about this in the book. Default binding modes come into play when non-reference patterns are encountered.

Example


let x = Some(3);
let y: &Option<i32> = &x;
match y {
    Some(a) => {
        // `y` is deref'd and `a` is bound as `ref a`
    }
    None => {}
}

The default binding mode starts as move. Each time a reference is matched using a non-reference pattern, it is automatically dereferenced and the default binding mode updated:

  • If the reference is &val, set the default binding mode to ref.
  • If the reference is &mut val: if the current default binding mode is ref, it remains ref; otherwise, set it to ref mut.

Full details are given in the 2005-match-ergonomics rustlang RFC.

Example

  • Example
match &Some(3) {
    Some(p) => {
        // This pattern is a "non-reference pattern".
        // Dereference the `&` and shift the default binding
        // mode to `ref`. `p` is read as `ref p` and given type `&i32`.
    }
    x => {
        // In this arm, we are still in `move` mode by default, so `x`
        // has type `&Option<i32>`.
    }
}
  • Desugared
    match &Some(3) {
      &Some(ref p) => {
         ...
      },
      x => {
         ...
      },
    }

2. Implicit deref coercions with functions and methods

This is another ergonomics feature that saves on explicit &s and *s when writing function and method calls. When we pass a reference to a function or method, it is dereferenced implicitly, as needed, to coerce it to the parameter’s target type.

  • Example:
// `MyBox` here is the `Deref`-implementing smart-pointer wrapper from
// "The Rust Programming Language" (ch. 15); its `Deref::Target` is `T`.
fn hello(name: &str) {
    println!("Hello, {}", name);
}

fn main() {
    let m = MyBox::new(String::from("Rust"));
    hello(&m); // deref coercion: &MyBox<String> -> &String -> &str
}
  • Example:
use std::ops::Deref; // brings the explicit `.deref()` calls below into scope

pub struct Point {
    x: Vec<i32>,
    y: Vec<i32>,
}

impl Point {
    pub fn x(&self) -> &Vec<i32> {
        match self {
            &Point { ref x, .. } => x,
        }
    }
}

fn use_i32(_: &i32) -> () {}

fn use_vi32(_: &Vec<i32>) -> () {}

fn use_str(_: &str) -> () {}

fn use_strr(_: &&String) -> () {}

fn main() {
    let p: Point = Point {
        x: vec![],
        y: vec![],
    };

    let rp: &Point = &p;
    let rrp: &&Point = &rp;

    println!("p.x = {:?}", rrp.x());

    let _s: &str = "foo";
    let s: String = String::from(_s);
    let bs: Box<String> = Box::new(s.clone());
    let bsr: Box<&String> = Box::new(&s);
    let bi32: Box<Vec<i32>> = Box::new(vec![]);

    use_i32(&&&&1i32);

    use_str(bs.deref());
    use_str(&s);
    use_strr(bsr.deref());
    let r: &&String = &bsr;
    let r2: &String = r.deref();

    use_str(r2);
    use_str(&bsr);
    use_vi32(&bi32);

    let p = Point {
        x: vec![1, 2, 3],
        y: vec![4, 5, 6],
    };
    match &p {
        Point { x, y } => {
            use_vi32(x);
            use_vi32(y);
            println!("{:?}, {:?}", x, y);
        }
    }
    println!("{:?}, {:?}", p.x, p.y);
}

Friday, October 30, 2020

ghc-lib-parser module count

When a user installs a program like HLint via Cabal, there's a good chance they'll pay the cost of building ghc-lib-parser. The build time for ghc-lib-parser is proportional to the number of modules it contains: the fewer there are, the faster it builds and the better the user experience of installing HLint.

Back in September last year there was a bit of a scare. At that time, ghc-lib-parser consisted of around 165 modules (and ghc-lib of roughly 300). An MR landed that unintentionally resulted in ghc-lib-parser needing 543 modules (and ghc-lib just 25). Fortunately a quick refactoring sorted that out (thanks @sgraf1337!). As @mpickering_ correctly pointed out at the time, "any fix should ensure adding a test to make sure it doesn't happen again". To that end, @cocreature and I managed to work out the details of one, which we can have a look at in a minute.

Since a year has now elapsed since the test was introduced, it's interesting to see how the module count has changed over time. TL;DR it's not too bad: the number of modules has increased from around 165 a year ago to about 230 today.

So how does the test work? To be comfortable in the knowledge that the number of modules needed in ghc-lib-parser will not significantly change without anyone noticing, it's enough to have a program that counts them. That program, when inserted into GHC's continuous integration system, alerts the committer to a limit breach. How do we count them? We use the GHC API to compute the transitive closure of the dependencies of GHC.Parser — the count is the cardinality of that closure. The code is only a few lines. The key to it is the function hscGetModuleInterface. You can read the source here.
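
The dependency discovery relies on the GHC API, but the counting idea itself is just a transitive closure. Here is an illustrative sketch of that part alone (the deps function standing in for the dependency information the GHC API provides via hscGetModuleInterface):

import Data.Set (Set)
import qualified Data.Set as Set

-- Compute everything reachable from a root via `deps`, root included;
-- the module count is the cardinality of that set.
transitiveClosure :: Ord a => (a -> [a]) -> a -> Set a
transitiveClosure deps root = go Set.empty [root]
  where
    go seen [] = seen
    go seen (m : ms)
      | m `Set.member` seen = go seen ms
      | otherwise = go (Set.insert m seen) (deps m ++ ms)

moduleCount :: Ord a => (a -> [a]) -> a -> Int
moduleCount deps root = Set.size (transitiveClosure deps root)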