Saturday, June 29, 2019

Build GHC with stack and hadrian


By far the easiest way I know of to get a build of GHC is via the tools 'stack' and 'hadrian'*. The procedures below set out commands that I know first-hand work** on machines provisioned by the CI systems Azure, Travis and Appveyor.

Setup

  • Ubuntu:
    curl -sSL https://get.haskellstack.org/ | sh
    stack setup
    
  • macOS:
    /usr/bin/ruby -e \
      "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    brew install autoconf automake gmp
    curl -sSL https://get.haskellstack.org/ | sh
    stack setup
    
  • Windows:
    curl -sSL https://get.haskellstack.org/ | sh
    stack setup
    stack exec -- pacman -S autoconf automake-wrapper make patch python tar \
          --noconfirm
    

Build

  • Ubuntu & macOS:
    git clone --recursive https://gitlab.haskell.org/ghc/ghc.git
    cd ghc
    hadrian/build.stack.sh --configure --flavour=quickest -j
    
  • Windows:
    git clone --recursive https://gitlab.haskell.org/ghc/ghc.git
    cd ghc
    hadrian/build.stack.bat --configure --flavour=quickest -j
    


[*] The simplicity and uniformity of these commands make me an advocate of these tools and, in particular, of the hadrian --configure flag.

[**] Well, that is to say they mostly work. The above is the ideal and has worked for me reliably for the last year. Recently though, for one reason or another, there seem to have been a lot of breakages. Your mileage may vary.

Friday, June 28, 2019

Harvesting annotations from the GHC parser


My last post on parsing in the presence of dynamic pragmas left us with this outline for calling the GHC parser.

      s <- readFile' file
      flags <-
        parsePragmasIntoDynFlags
          (defaultDynFlags fakeSettings fakeLlvmConfig) file s
      whenJust flags $ \flags ->
         case parse file flags s of
            PFailed s ->
              report flags $ snd (getMessages s flags)
            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m

Now, certain things, such as comments and the locations of keywords (e.g. let, in and so on), are not to be found in a GHC parse tree. If you're writing refactoring tools (think programs like Neil Mitchell's awesome hlint), access to these things is critical!

So, how does one go about getting these program "annotations"? You guessed it... there's an API for that.

Assuming the existence of a function analyzeModule :: DynFlags -> Located (HsModule GhcPs) -> ApiAnns -> IO (), here's the gist of the code that exercises it:

            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m (harvestAnns s)

Here harvestAnns is defined as:

    import qualified Data.Map as Map

    harvestAnns :: PState -> ApiAnns
    harvestAnns pst =
      ( Map.fromListWith (++) $ annotations pst
      , Map.fromList ((noSrcSpan, comment_q pst) : annotations_comments pst)
      )

The type ApiAnns is a pair of maps: the first map contains keyword and punctuation locations; the second maps locations of comments to their values.
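To make the shape of those two maps concrete without pulling in the GHC API, here is a toy model of harvestAnns. The types Span, Keyword, ToyPState and ToyApiAnns are hypothetical stand-ins invented for illustration (they are not GHC's SrcSpan, AnnKeywordId, PState or ApiAnns); only the way the two maps are built mirrors the code above.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-in for a source span: Just (line, column), with
-- Nothing playing the role of GHC's noSrcSpan.
type Span = Maybe (Int, Int)

-- Hypothetical stand-in for GHC's keyword identifiers.
data Keyword = AnnLet | AnnIn
  deriving (Eq, Ord, Show)

-- Hypothetical parser state carrying just the three fields harvestAnns reads.
data ToyPState = ToyPState
  { annotations          :: [((Span, Keyword), [Span])]
  , comment_q            :: [String]
  , annotations_comments :: [(Span, [String])]
  }

-- First map: (span, keyword) -> keyword locations.
-- Second map: span -> comments.
type ToyApiAnns = (Map.Map (Span, Keyword) [Span], Map.Map Span [String])

-- Same construction as harvestAnns above: fromListWith (++) concatenates
-- the location lists of any key that appears more than once, and the
-- comments left in the queue are filed under the "no span" key.
harvestAnns :: ToyPState -> ToyApiAnns
harvestAnns pst =
  ( Map.fromListWith (++) $ annotations pst
  , Map.fromList ((Nothing, comment_q pst) : annotations_comments pst)
  )
```

The choice of Map.fromListWith (++) for the first map matters: a key that occurs more than once keeps all of its locations rather than only the last one.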

You might think that's the end of this story, but there's one twist left: the GHC lexer won't harvest comments by default. You have to tell it to do so by means of the Opt_KeepRawTokenStream (general) flag (see the GHC wiki for details)!

Taking the above into account, to parse with comments, the outline now becomes:

      s <- readFile' file
      flags <-
        parsePragmasIntoDynFlags
          (defaultDynFlags fakeSettings fakeLlvmConfig) file s
      whenJust flags $ \flags ->
         case parse file (flags `gopt_set` Opt_KeepRawTokenStream) s of
            PFailed s ->
              report flags $ snd (getMessages s flags)
            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m (harvestAnns s)

For a complete program demonstrating all of this, see this example in the ghc-lib repo.

Sunday, June 2, 2019

Have GHC parsing respect dynamic pragmas


This post about Handling GHC parse errors shows that using qualified in postpositive position is a syntax error unless the ImportQualifiedPost language extension is enabled. In that post, it is explained that the program

    module M where
    import Data.List qualified

is invalid, whereas

    {-# LANGUAGE ImportQualifiedPost #-}
    module M where
    import Data.List qualified

which enables the extension via a "dynamic pragma", is legit.

Perhaps surprisingly, running the second of these programs through the parsing code presented in that post continues to generate the error:

     Found `qualified' in postpositive position.
     To allow this, enable language extension 'ImportQualifiedPost'

Evidently, our parse-fu needs an upgrade to respect dynamic pragmas, and that's what this post provides.

This code exercises the GHC API to parse a module.

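    parse :: String -> DynFlags -> String -> ParseResult (Located (HsModule GhcPs))
    parse filename flags str =
      unP Parser.parseModule parseState
      where
        -- Source locations start at line 1, column 1 of 'filename'.
        location = mkRealSrcLoc (mkFastString filename) 1 1
        buffer = stringToStringBuffer str
        -- The parser state is built from the given flags, so only the
        -- extensions they enable are honoured during the parse.
        parseState = mkPState flags buffer location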

Note that in the above, the second argument is flags :: DynFlags. For parse to take into account extensions enabled by pragmas in the source argument str, flags must be set up to do so a priori. That is, before jumping into parse, a "first pass" must be made to sniff out flags. There is a GHC API for that: it's called parseDynamicFilePragma.
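The two-pass idea can be illustrated with a deliberately simplified, self-contained model. In this sketch sniffExtensions, checkImports and parseWithPragmas are hypothetical helpers, not GHC API: the first pass pulls extension names out of single-line {-# LANGUAGE ... #-} pragmas preceding the module line, and the second pass, a stand-in for the real parser, rejects a postpositive qualified only when ImportQualifiedPost is absent. GHC's own first pass is far more thorough than this line-based scan.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf, isSuffixOf)

-- First pass (toy): collect extension names from {-# LANGUAGE A, B #-}
-- lines that appear before the 'module' line. Only single-line pragmas
-- are recognised.
sniffExtensions :: String -> [String]
sniffExtensions src = concatMap pragmaExts header
  where
    header = takeWhile (not . ("module" `isPrefixOf`)) (map trim (lines src))
    pragmaExts l
      | "{-#" `isPrefixOf` l && "#-}" `isSuffixOf` l =
          case words (map commaToSpace (dropEnd 3 (drop 3 l))) of
            ("LANGUAGE" : exts) -> exts
            _ -> []
      | otherwise = []
    commaToSpace c = if c == ',' then ' ' else c
    dropEnd n xs = take (length xs - n) xs
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- Second pass (toy): a stand-in "parser" that only inspects imports and
-- rejects a postpositive 'qualified' unless ImportQualifiedPost is on.
checkImports :: [String] -> String -> Either String ()
checkImports exts src = mapM_ check (lines src)
  where
    check l = case words l of
      ("import" : _ : "qualified" : _)
        | "ImportQualifiedPost" `notElem` exts ->
            Left "Found `qualified' in postpositive position."
      _ -> Right ()

-- The two passes together: the extensions found by the first pass are
-- what the second pass runs with.
parseWithPragmas :: String -> Either String ()
parseWithPragmas src = checkImports (sniffExtensions src) src
```

Run against the two programs above, parseWithPragmas rejects the first and accepts the second; the point is only that the second pass sees what the first pass discovered.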

Here's a function to harvest flags from pragmas that makes that call to parseDynamicFilePragma.

    parsePragmasIntoDynFlags :: DynFlags -> FilePath -> String -> IO (Maybe DynFlags)
    parsePragmasIntoDynFlags flags filepath str =
      catchErrors $ do
        -- Collect the options given by pragmas in the file header...
        let opts = getOptions flags (stringToStringBuffer str) filepath
        -- ...and fold them into the flags we started with.
        (flags', _, _) <- parseDynamicFilePragma flags opts
        return $ Just flags'
      where
        catchErrors :: IO (Maybe DynFlags) -> IO (Maybe DynFlags)
        catchErrors act = handleGhcException reportErr
                            (handleSourceError reportErr act)
        reportErr e = do putStrLn $ "error : " ++ show e; return Nothing

The main contribution of this function is to account for the complication that parseDynamicFilePragma can throw two kinds of exceptions: GhcException and SourceError. The GHC API functions handleGhcException and handleSourceError are the means to handle both.
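The nested-handler pattern in catchErrors can be tried out in isolation with nothing but Control.Exception. In this sketch ToyGhcException and ToySourceError are hypothetical stand-ins for GHC's GhcException and SourceError, failingSniff is a hypothetical action, and the generic handle plays the role of handleGhcException and handleSourceError. Because handle is polymorphic in the exception type, each handler gets a signature that pins down which exception it catches.

```haskell
import Control.Exception (Exception, handle, throwIO)

-- Hypothetical stand-ins for GHC's GhcException and SourceError.
newtype ToyGhcException = ToyGhcException String deriving Show
instance Exception ToyGhcException

newtype ToySourceError = ToySourceError String deriving Show
instance Exception ToySourceError

-- Hypothetical action that fails the way parseDynamicFilePragma might.
failingSniff :: IO (Maybe String)
failingSniff = throwIO (ToySourceError "unknown extension")

-- Run an action that may throw either kind of exception. Whichever is
-- thrown is reported and turned into Nothing; otherwise the action's
-- own result is returned unchanged.
catchErrors :: IO (Maybe a) -> IO (Maybe a)
catchErrors act = handle onGhc (handle onSource act)
  where
    -- Each handler's signature fixes the exception type 'handle' catches.
    onGhc :: ToyGhcException -> IO (Maybe b)
    onGhc = reportErr
    onSource :: ToySourceError -> IO (Maybe b)
    onSource = reportErr
    reportErr :: Show e => e -> IO (Maybe b)
    reportErr e = do
      putStrLn $ "error : " ++ show e
      return Nothing
```

For example, catchErrors failingSniff prints error : ToySourceError "unknown extension" and returns Nothing, while catchErrors (return (Just 42)) simply returns Just 42.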

Putting it all together then, here's an outline of how to parse in the presence of dynamic pragmas.

      s <- readFile' file
      flags <-
        parsePragmasIntoDynFlags
          (defaultDynFlags fakeSettings fakeLlvmConfig) file s
      whenJust flags $ \flags ->
         case parse file flags s of
            PFailed s ->
              report flags $ snd (getMessages s flags)
            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m

For a complete working program that utilizes this function, see this example in the ghc-lib repo.