Friday, June 28, 2019

Harvesting annotations from the GHC parser

Harvesting annotations from the GHC parser

My last post on parsing in the presence of dynamic pragmas left us with this outline for calling the GHC parser.

      flags <-
        parsePragmasIntoDynFlags
          (defaultDynFlags fakeSettings fakeLlvmConfig) file s
      whenJust flags $ \flags ->
         case parse file flags s of
            PFailed s ->
              report flags $ snd (getMessages s flags)
            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m

Now, it's a fact that you'll not find in a GHC parse tree certain things like comments and the location of keywords (e.g. let, in and so on). Certainly, if you're writing refactoring tools (think programs like Neil Mitchell's awesome hlint for example), access to these things is critical!

So, how does one go about getting these program "annotations"? You guessed it... there's an API for that.

If we assume the existence of a function analyzeModule :: DynFlags -> Located (HsModule GhcPs) -> ApiAnns -> IO () then, here's the gist of the code that exercises it:

            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m (harvestAnns s)
Here harvestAnns is defined as
    harvestAnns pst =
      ( Map.fromListWith (++) $ annotations pst
      , Map.fromList ((noSrcSpan, comment_q pst) : annotations_comments pst)
      )

The type ApiAnns is a pair of maps : the first map contains keyword and punctuation locations, the second maps locations of comments to their values.

You might think that's the end of this story but there's one twist left : the GHC lexer won't harvest comments by default - you have to tell it to do so by means of the Opt_KeepRawTokenStream (general) flag (see the GHC wiki for details)!

Taking the above into account, to parse with comments, the outline now becomes:

      flags <-
        parsePragmasIntoDynFlags
          (defaultDynFlags fakeSettings fakeLlvmConfig) file s
      whenJust flags $ \flags ->
         case parse file (flags `gopt_set` Opt_KeepRawTokenStream)s of
            PFailed s ->
              report flags $ snd (getMessages s flags)
            POk s m -> do
              let (wrns, errs) = getMessages s flags
              report flags wrns
              report flags errs
              when (null errs) $ analyzeModule flags m (harvestAnns s)

For a complete program demonstrating all of this see this example in the ghc-lib repo.