Sunday, August 11, 2019

Partitions of a set

Calculating the partitions of a set

Having "solved" a bunch of these divide & conquer problems, I'm the first to admit to having being lulled into a false sense of security. At first glance, the problem of this post seemed deceptively simple and consequently I struggled with it, sort of "hand-waving", not really engaging my brain and getting more and more frustrated how the dang thing wouldn't yield to my experience! I think the moral of the story is math doesn't care about your previous successes and so don't let your past practice trick you into laziness. Be guided by your experience but fully apply yourself to the problem at hand!

Suppose a set of two elements {2, 3}. There are only two ways it can be partitioned: (23), (3)(2). For meaning, you might think of these two partitions like this : in the first partition, there is a connection between the elements 2 and 3, in the second, 2 and 3 are isolated from each other.

Suppose a set of elements {1, 2, 3}. There are five partitions of this set : (123), (23)(1), (13)(2), (3)(21), (3)(2)(1) (I've carefully written them out this way to help with the elucidation). Maybe you want to break here and see if you can write an algorithm for calculating them before reading on?

Observe that we can get the partitions of {1, 2, 3} from knowledge of the partitions of {2, 3} by looking at each partition of {2, 3} in turn and considering the partitions that would result by inclusion of the element 1. So, for example, the partition (23) gives rise to the partitions (123) and (23)(1). Similarly, the partition (3)(2) gives rise to the partitions (13)(2), (3)(21) and (3)(2)(1). We might characterize this process as computing new partitions of {1, 2, 3} from a partition p of {2, 3} as "extending" p .

Suppose then we write a function extend x p to capture the above idea. Let's start with the signature of extend. What would it be? Taking (23)(1) as an exemplar, we see that a component of a partition can be represented as [a] and so a partition itself then as [[a]]. We know that extend takes an element and a partition and returns a list of (new) partitions so it must have signature extend :: a -> [[a]] -> [[[a]]] (yes, lists of lists of lists are somehow easy to get confused about).

Now for writing the body of extend. The base case is the easiest of course - extending the empty partition:

extend x [] = [[[x]]]
  
That is, a singleton list of partitions where that one partition has one component. The inductive case is the partition obtained by "pushing" x into the first component of p together with the extensions that leave the first component of p alone.
extend x (h : tl) = ((x : h) : tl) : map (h :) (extend x tl)

We can now phrase the function partition with signature partition :: [a] -> [[[a]]] like this:

partition [] = [[]]
partition (h : tl) = concatMap (extend h) (partition tl)
The base case says, the only partition of the empty set is the the empty partition.

Wrapping it all up, the algorithm in entirety is

partition :: [a] -> [[[a]]]
partition [] = [[]]
partition (h : tl) = concatMap (extend h) (partition tl)
  where
    extend :: a -> [[a]] -> [[[a]]]
    extend x [] = [[[x]]]
    extend x (h : tl) = ((x : h) : tl) : map (h :) (extend x tl)

Saturday, June 29, 2019

Build GHC with stack and hadrian

Building GHC with stack and hadrian

By far the easiest way I know of to get a build of GHC is via the tools 'stack' and 'hadrian'*. The procedures below set out commands that I know first hand work** with machines provisioned by the CI systems Azure, Travis and Appveyor.

Setup

  • Ubuntu:
    curl -sSL https://get.haskellstack.org/ | sh
    stack setup
    
  • macOS:
    /usr/bin/ruby -e \
      "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    brew install autoconf automake gmp
    curl -sSL https://get.haskellstack.org/ | sh
    stack setup
    
  • Windows:
    curl -sSL https://get.haskellstack.org/ | sh
    stack setup
    stack exec -- pacman -S autoconf automake-wrapper make patch python tar \
          --noconfirm
    

Build

  • Ubuntu & macOS:
    git clone --recursive https://gitlab.haskell.org/ghc/ghc.git
    cd ghc
    hadrian/build.stack.sh --configure --flavour=quickest -j
    
  • Windows:
    git clone --recursive https://gitlab.haskell.org/ghc/ghc.git
    cd ghc
    hadrian/build.stack.bat --configure --flavour=quickest -j
    


[*] The simplicitly and uniformity of these commands make me an advocate of these tools and in particular, the hadrian --configure flag.

[**] Well, that is to say mostly work. The above is the ideal and has worked me for me reliably for the last year. Recently though, for one reason or another, there seem to have been a lot of breakages. Your mileage may vary.

Friday, June 28, 2019

Harvesting annotations from the GHC parser

Harvesting annotations from the GHC parser

My last post on parsing in the presence of dynamic pragmas left us with this outline for calling the GHC parser.

      flags <-
        parsePragmasIntoDynFlags
          (defaultDynFlags fakeSettings fakeLlvmConfig) file s
      whenJust flags $ \flags ->
         case parse file flags s of
            PFailed s ->
              report flags $ snd (getMessages s flags)
            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m

Now, it's a fact that you'll not find in a GHC parse tree certain things like comments and the location of keywords (e.g. let, in and so on). Certainly, if you're writing refactoring tools (think programs like Neil Mitchell's awesome hlint for example), access to these things is critical!

So, how does one go about getting these program "annotations"? You guessed it... there's an API for that.

If we assume the existence of a function analyzeModule :: DynFlags -> Located (HsModule GhcPs) -> ApiAnns -> IO () then, here's the gist of the code that exercises it:

            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m (harvestAnns s)
Here harvestAnns is defined as
    harvestAnns pst =
      ( Map.fromListWith (++) $ annotations pst
      , Map.fromList ((noSrcSpan, comment_q pst) : annotations_comments pst)
      )

The type ApiAnns is a pair of maps : the first map contains keyword and punctuation locations, the second maps locations of comments to their values.

You might think that's the end of this story but there's one twist left : the GHC lexer won't harvest comments by default - you have to tell it to do so by means of the Opt_KeepRawTokenStream (general) flag (see the GHC wiki for details)!

Taking the above into account, to parse with comments, the outline now becomes:

      flags <-
        parsePragmasIntoDynFlags
          (defaultDynFlags fakeSettings fakeLlvmConfig) file s
      whenJust flags $ \flags ->
         case parse file (flags `gopt_set` Opt_KeepRawTokenStream)s of
            PFailed s ->
              report flags $ snd (getMessages s flags)
            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m (harvestAnns s)

For a complete program demonstrating all of this see this example in the ghc-lib repo.

Sunday, June 2, 2019

Have GHC parsing respect dynamic pragmas

Have GHC parsing respect dynamic pragmas

This post about Handling GHC parse errors shows that using qualified in postpostive position is a syntax error unless the ImportQualifiedPost language extension is enabled. In that post, it is explained that the program

module M where
import Data.List qualified
is invalid whereas,
{#- LANGUAGE ImportQualifiedPost -#}
module M where
import Data.List qualified
which enables the extension via a "dynamic pragma", is legit.

Perhaps surprisingly, running the second of these programs through the parsing code presented in that post continues to generate the error

     Found `qualified' in postpositive position.
     To allow this, enable language extension 'ImportQualifiedPost'
Evidently, our parse-fu needs an upgrade to respect dynamic pragmas and that's what this post provides.

This code exercises the GHC API to parse a module.

parse :: String -> DynFlags -> String -> ParseResult (Located (HsModule GhcPs))
parse filename flags str =
  unP Parser.parseModule parseState
  where
    location = mkRealSrcLoc (mkFastString filename) 1 1
    buffer = stringToStringBuffer str
    parseState = mkPState flags buffer location

Note in the above, the second argument flags :: DynFlags. In order for parse to take into account extensions enabled by pragmas in the source argument str, then flags must be set up to do so a priori. That is, before jumping into parse, a "first pass" must be made to sniff out flags. There is a GHC API for that. It's called parseDynamicFilePragma.

Here's a function to harvest flags from pragmas that makes that call to parseDynamicFilePragma.

parsePragmasIntoDynFlags :: DynFlags -> FilePath -> String -> IO (Maybe DynFlags)
parsePragmasIntoDynFlags flags filepath str =
  catchErrors $ do
    let opts = getOptions flags (stringToStringBuffer str) filepath
    (flags, _, _) <- parseDynamicFilePragma flags opts
    return $ Just flags
  where
    catchErrors :: IO (Maybe DynFlags) -> IO (Maybe DynFlags)
    catchErrors act = handleGhcException reportErr
                        (handleSourceError reportErr act)
    reportErr e = do putStrLn $ "error : " ++ show e; return Nothing
The main contribution of this function is to account for the complication that parseDynamicFilePragma can throw two kinds of exceptions : GhcException and SourceError. The GHC API functions handleGhcException and handleSourceError are the means to achieve that.

Putting it all together then, here's an outline of how to parse in the presence of dynamic pragmas.

      s <- readFile' file
      flags <-
        parsePragmasIntoDynFlags
          (defaultDynFlags fakeSettings fakeLlvmConfig) file s
      whenJust flags $ \flags ->
         case parse file flags s of
            PFailed s ->
              report flags $ snd (getMessages s flags)
            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m
For a complete working program that utilizes this function, see this example in the ghc-lib repo.

Friday, May 10, 2019

Handling GHC parser errors right

Handling GHC parser errors right

Did you know, a POk parse result from the GHC parser doesn't necessarily mean the parse was OK? This blog explains what's up with that. The source code below is from this example in the ghc-lib repo.

Here is code that tries to make a parse tree of a Haskell module.

parse :: String -> DynFlags -> String -> ParseResult (Located (HsModule GhcPs))
parse filename flags str =
  unP Parser.parseModule parseState
  where
    location = mkRealSrcLoc (mkFastString filename) 1 1
    buffer = stringToStringBuffer str
    parseState = mkPState flags buffer location

The way to call the above code is like this.

case parse file flags s of
  PFailed s ->
    report flags $ snd (getMessages s flags)
  POk s m -> do
    report flags $ fst (getMessages s flags)
    analyzeModule flags m
In the PFailed s case (where s is the parse state), the expression snd (getMessages s flags) retrieves the errors and we report them. In the POk case, we report warnings and do whatever it is we wanted to do with the parse tree m right?

Not quite. The problem is that the parser produces two sorts of errors : "fatal" and "non-fatal". Thus far, we have only considered the "fatal" ones.

Fatal errors are such that production of a parse tree is impossible. Non-fatal parse errors are those that don't prevent construction of a parse tree. A parse that generates non-fatal errors is going to associate with a parse tree in some way non-conforming to the Haskell language specification.

The right way to write the POk case is like this.

POk s m -> do
  let (warns, errs) = getMessages s flags
  report flags warns
  report flags errs
  when (null errs) $ analyzeModule flags m
The key point is analyzeModule is called only if there are absolutely no parse errors at all.

A non-fatal error example is provided by the ImportQualifiedPost language extension (see this post for how to add a GHC language extension). Specifically, it is only legal to write import M qualified if the extension is in effect via pragma or the option -XImportQualifiedPost. In the event this syntax is used when the extension is not in effect, the user should see an error like

 test/MiniHlintTest_non_fatal_error.hs:6:18: error:
     Found `qualified' in postpositive position.
     To allow this, enable language extension 'ImportQualifiedPost'
and further analysis of the parse abandoned.

Sunday, April 7, 2019

Announcing ghc-lib 0.20190404

Announcing ghc-lib 0.20190404

On behalf of Digital Asset I am excited to share with you the latest release of ghc-lib.

As described in detail in the ghc-lib README, the ghc-lib project lets you use the GHC API for parsing and analyzing Haskell code without tying you to a specific GHC version.

What's new

The GHC source code in this release is updated to GHC HEAD as of April the 4th, 2019. Accordingly, the mini-hlint example in the ghc-lib repo was adjusted to accomodate GHC API changes to the ParseResult datatype and parser error handling.

By far the biggest change though is this : the ghc-lib project now provides two packages, ghc-lib-parser and ghc-lib. The packages are released on Hackage, and can be installed as usual e.g. cabal install ghc-lib.

Some projects don't require the ability to compile Haskell to GHC's Core language. If lexical analysis alone is sufficient for your project's needs, then the ghc-lib-parser package alone will do for that. The build time for ghc-lib-parser is roughly half of the combined build times of ghc-lib-parser and ghc-lib. That is, in this case, switching to the new release will decrease the build time for your project. Note that if your project does require compiling Haskell to Core, then your project will now depend on both the ghc-lib-parser and ghc-lib packages.

The ghc-lib package "re-exports" the modules of the ghc-lib-parser package. So, if you depend upon the ghc-lib package, you'll get the ghc-lib-parser modules "for free". Sadly though, at this time, package import syntax (and we do recommend using package import syntax for these packages) doesn't quite work like you'd expect so that if you, import "ghc-lib" DynFlags for example, this will fail because DynFlags is in fact in the ghc-lib-parser package. In this case, you'd write, import "ghc-lib-parser" DynFlags and all would be well. The mini-compile example in the ghc-lib repo demonstrates mixing modules from both packages.

Digital Asset make extensive use of the ghc-lib packages in the DAML smart contract language compiler and hope you continue to benefit from this project too!

Wednesday, March 20, 2019

Bush fixing Travis and GitLab

Bush fixing Travis and CI

Ever had one of those days?

You are not alone!

This Saturday 9th March 2019, the GHC devs are going to announce that git://git.haskell.org/ghc.git has been decommissioned. The new official upstream GHC will be https://gitlab.haskell.org/ghc/ghc.

Sadly (for us) this broke ghc-lib CI's Travis linux configuration.

What does our CI do? The ghc-lib CI script pulls down the latest GHC sources and builds and tests them as a ghc-lib. The details of the problem are that Travis gives you a broken Ubuntu where cloning the official URL fails with a TLS “handshake error”. More generally, any Travis job that tries to git clone over the https protocol from a GitLab remote will fail the same way.

This .travis.yml shows a workaround. The idea is to spin up a container before install that doesn’t have this problem and clone from there. The essential bits are:

services:
- docker

# [Why we git clone on linux here]
# At this time, `git clone https://gitlab.haskell.org/ghc/ghc.git`
# from within `CI.hs` does not work on on linux. This appears to be a
# known Travis/ubuntu SSL verification issue. We've tried many less
# drastic workarounds. This grand hack is the only way we've found so
# far that can be made to work.
before_install:
- |
    if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then
      docker pull alpine/git
      docker run -ti --rm -v ${HOME}:/root -v $(pwd):/git \
        alpine/git clone https://gitlab.haskell.org/ghc/ghc.git /git/ghc --recursive
    fi

Note, MacOS docker services aren’t supported but that’s OK! The TLS handshake problem doesn’t exhibit in that configuration.

Update : It turns out that while this issue exists in Ubuntu 14.04 which Travis uses by default, it is “fixed” in Ubuntu 16.04. So by writing dist: xenial in your .travis.yml file, the above workaround can be avoided.

Saturday, February 23, 2019

Adding a GHC Language Extension

Adding a GHC Language Extension

This note summarizes the essential mechanics of adding a new language extension to GHC. The example code will illustrate adding a Foo extension.

Implementing the extension

The first step is to add a Foo constructor to the Extension type in libraries/ghc-boot-th/GHC/LanguageExtensions/Type.hs.

data Extension
    = Cpp
    | OverlappingInstances
    ...
    | Foo

The next job is to extend xFlagsDeps in compiler/main/DynFlags.hs.

xFlagsDeps = [
  flagSpec "AllowAmbiguousTypes" LangExt.AllowAmbiguousTypes,
  ...
  flagSpec "Foo"                 LangExt.Foo
]

That's all it takes. With these two changes, it is now possible to enable Foo in Haskell source files by writing {-# LANGUAGE Foo #-} or from a compile command by passing the argument -XFoo.

Testing for the extension

Lexer

In compiler/parser/Lexer.x, locate data ExtBits and add a constructor for Foo.

data ExtBits
  = FfiBit
  | ...
  | FooBit
Next, extend the where clause of function mkParserFlags' with a case for Foo.
langExtBits =
        FfiBit `xoptBit` LangExt.ForeignFunctionInterface
    .|. InterruptibleFfiBit `xoptBit` LangExt.InterruptibleFFI

    ...

    .|. FooBit `xoptBit` LangExt.FooBit
  
The function xtest is then the basic building block for testing if Foo is enabled. For example, this specific function tests a bitmap for the on/off status of the Foo bit.
fooEnabled :: ExtsBitMap -> Bool
fooEnabled = xtest FooBit
In practice, testing for a language extension in the lexer is called from a function computing a lexer action. Suppose foo to be such a function and the action it computes depends somehow on whether the Foo language extension is in effect. Putting it all together, schematically it will have the following form.
foo :: (FastString -> Token) -> Action
foo con span buf len = do
    exts <- getExts
    if FooBit `xtest` exts then
       ...
    else
       ...

Parser

This utility computes a monadic expression testing for the on/off state of a bit in a parser state monad.

extension :: (ExtsBitmap -> Bool) -> P Bool
extension p = P $ \s -> POk s (p $! (pExtsBitmap . options) s)
An expression of this kind can be evaluated in the semantic action of a parse rule in compiler/parser/Parser.y. Here's an example of how one might be used.
foo :: { () }
  : 'foo'  {}
  | {- empty -}    {% do
                         foo_required <- extension fooEnabled
                         when foo_required $ do
                           loc <- fileSrcSpan
                           parseErrorSDoc loc $ text "Missing foo"
                    }

Renaming, type-checking and de-sugaring

All of renaming, typechecking and desurgaring occur in the contexts of TcRnIf _ _ monads. Function xoptM :: Extension -> TcRnIf gbl lcl Bool is provided for extension testing in such contexts. Here's a schematic of how such a test might be used in a renaming function.

import GHC.LanguageExtensions

updateFoos :: [AvailInfo] -> RnM (TcGlbEnv, TcLclEnv)
updateFoos info = do
  (globals, locals) <- getEnvs
  opt_Foo <- xoptM Foo
  if not opt_Foo then
    return (globals, locals)
  else
    let globals' = ...
        locals' = ...
    return (globals', locals')