Friday, October 30, 2020

ghc-lib-parser module count

ghc-lib-parser module count

When a user installs a program like HLint via Cabal, there's a good chance they'll pay a cost of building ghc-lib-parser. The build time for ghc-lib-parser is proportional to the number of modules that it contains, the less there are the faster it builds. The fewer there are, the better the user experience installing HLint.

Back in September last year there was a bit of a scare. At that time, ghc-lib-parser consisted of around 165 modules (and ghc-lib had roughly 300). An MR landed that unintentionally resulted in ghc-lib-parser needing 543 modules (and ghc-lib getting just 25). Fortunately a quick refactoring sorted that out (thanks @sgraf1337!) As @mpickering_ correctly pointed out at that time "any fix should ensure adding a test to make sure it doesn't happen again". To that end, @cocreature and I managed to work out the details of one which we can have a look at in a minute.

Since a year has now elapsed since the test was introduced, it's interesting to see how the module count has changed over time. TL;DR it's not too bad — the number of modules has increased from around 165 a year ago to about 230 today:

So how does the test work? To be comfortable in the knowledge that the number of modules needed in ghc-lib-parser will not signficantly change without anyone noticing, it's enough to have a program that counts them. That program when inserted into GHC's continuous integration system alerts the committer to a limit breach. How do we count them? We use the GHC API to compute the transitive closure of the dependencies of GHC.Parser — the count is the cardinality of that. The code is only a few lines. The key to it is the function hscGetModuleInterface. You can read the source here.