Sunday, March 1, 2020

GHC Haskell Pats and LPats

The Trees that Grow paper explains that GHC has a single data type, HsSyn, that crosses several compiler phases, and a second data type, TH.Syntax, for Template Haskell, while other Haskell libraries such as haskell-src-exts define yet others. Ideally, HsSyn would be reused in Template Haskell and in these third-party libraries; this goal motivates the flexibility offered by the TTG (Trees That Grow) technique.

Before GHC 8.8, patterns and located patterns were related in the following way:

type LPat p = Located (Pat p)
data Pat p
  = ...
  | LazyPat (XLazyPat p) (LPat p)
  | ...

That is, patterns with locations are represented by values of type LPat and patterns themselves by values of type Pat. Note that LPat values contain Pat values, which in turn can contain LPat values, hence the name "ping-pong style" given to this idiom.
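
As a rough illustration of the ping-pong idiom, here is a minimal self-contained sketch. It assumes simplified stand-ins rather than GHC's real definitions: SrcSpan is a hypothetical (line, column) pair, the pass parameter p and the extension fields are dropped, and allSpans and example are names invented for this example.

```haskell
-- Simplified stand-ins for illustration; these are NOT GHC's definitions.
data SrcSpan = SrcSpan Int Int   -- hypothetical (line, column) pair
  deriving (Eq, Show)

data Located a = L SrcSpan a
  deriving Show

-- A located pattern is a pattern paired with its source span.
type LPat = Located Pat

-- Children are LPat values, so Pat and LPat alternate ("ping-pong").
data Pat
  = WildPat
  | VarPat String
  | LazyPat LPat
  | TuplePat [LPat]
  deriving Show

-- Collect every source span in a pattern, outermost first,
-- bouncing between the located layer and the pattern layer.
allSpans :: LPat -> [SrcSpan]
allSpans (L loc pat) = loc : go pat
  where
    go WildPat       = []
    go (VarPat _)    = []
    go (LazyPat p)   = allSpans p
    go (TuplePat ps) = concatMap allSpans ps

-- The lazy pattern ~x, with made-up positions.
example :: LPat
example = L (SrcSpan 1 1) (LazyPat (L (SrcSpan 1 2) (VarPat "x")))
```

Every traversal has to peel off an L before it can inspect a constructor, which is exactly the alternation the name refers to.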

Since location annotations may (e.g. GHC native) or may not (e.g. Template Haskell) be present for a given application, it was realized that "baking" locations into HsSyn is undesirable. For this reason, in 8.8 an attempt was made to make their presence a strictly GHC "thing" in the following way:

type LPat p = Pat p
data Pat p
  = ...
  | LazyPat (XLazyPat p) (LPat p)
  | ...
  | XPat (XXPat p)
type instance XXPat (GhcPass p) = Located (Pat (GhcPass p))

That is, in GHC under this approach, locations are stored in the extension constructor: patterns with locations are wrapped in XPat, e.g. XPat (L _ (VarPat noExt _)). Of course, to get at the location you now have to go through an indirection via XPat. For this, the functions cL and dL (and the bidirectional pattern synonym LL) were provided. Applications that don't want locations in the parse tree simply don't use the XPat constructor.
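
To make the indirection concrete, here is a small sketch of the 8.8-style encoding under simplifying assumptions. The passes Ghc and Plain, the SrcSpan type and the function varName are hypothetical, and cL and dL are written as monomorphic helpers for this example only; GHC's own versions are more general.

```haskell
{-# LANGUAGE TypeFamilies, ViewPatterns #-}
import Data.Void (Void)

-- Simplified stand-ins for illustration; these are NOT GHC's definitions.
data SrcSpan = SrcSpan Int Int deriving (Eq, Show)
data Located a = L SrcSpan a

noSrcSpan :: SrcSpan
noSrcSpan = SrcSpan 0 0

-- Extension point for the extra constructor, chosen per pass.
type family XXPat p

type LPat p = Pat p            -- no location baked in

data Pat p
  = VarPat String
  | LazyPat (LPat p)
  | XPat (XXPat p)

-- Hypothetical "GHC-like" pass: XPat carries a located pattern.
data Ghc
type instance XXPat Ghc = Located (Pat Ghc)

-- Hypothetical location-free pass: XPat cannot be constructed.
data Plain
type instance XXPat Plain = Void

-- Attach a location by wrapping the pattern in XPat.
cL :: SrcSpan -> Pat Ghc -> LPat Ghc
cL loc p = XPat (L loc p)

-- Recover the location, defaulting when the pattern is unwrapped.
dL :: LPat Ghc -> Located (Pat Ghc)
dL (XPat l) = l
dL p        = L noSrcSpan p

-- Consumers reach the location through a view pattern.
varName :: LPat Ghc -> Maybe (SrcSpan, String)
varName (dL -> L loc (VarPat x)) = Just (loc, x)
varName _                        = Nothing

-- The same tree type without any locations.
plainLazy :: LPat Plain
plainLazy = LazyPat (VarPat "x")
```

Note how every function that cares about locations must go through dL, while the Plain pass never mentions Located at all.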

It turned out that the 8.8 approach wasn't as good an idea as it seemed; it was a bit more complicated than it needed to be and had some unexpected implications for the existing GHC source code base. It was realized that the following alternative approach yields the same benefits, and it is what we find in 8.10 and beyond:

type family XRec p (f :: * -> *) = r | r -> p f
type instance XRec (GhcPass p) f = Located (f (GhcPass p))

type LPat p = XRec p Pat
data Pat p
  = ...
  | LazyPat (XLazyPat p) (LPat p)
  | ...
  | XPat (XXPat p)
type instance XXPat (GhcPass _) = NoExtCon

Thus for GHC, ping-pong style is restored, and applications other than GHC can define the XRec instance as simply f p so that locations are absent.
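
The following sketch shows the XRec idea with two instances side by side. It is an approximation under stated assumptions: the passes Ghc and Plain, the SrcSpan type and the two accessor functions are hypothetical, the injectivity annotation on XRec is omitted for brevity, and extension fields are left out.

```haskell
{-# LANGUAGE TypeFamilies, KindSignatures #-}
import Data.Kind (Type)

-- Simplified stand-ins for illustration; these are NOT GHC's definitions.
data SrcSpan = SrcSpan Int Int deriving (Eq, Show)
data Located a = L SrcSpan a

-- How recursive occurrences are wrapped, chosen per pass.
-- (The injectivity annotation from the text is omitted here.)
type family XRec p (f :: Type -> Type)

type LPat p = XRec p Pat

data Pat p
  = VarPat String
  | LazyPat (LPat p)

-- Hypothetical "GHC-like" pass: children are wrapped in Located.
data Ghc
type instance XRec Ghc f = Located (f Ghc)

-- Hypothetical location-free pass: children are stored bare.
data Plain
type instance XRec Plain f = f Plain

-- With locations, ordinary pattern matching on L works again.
ghcVarName :: LPat Ghc -> Maybe (SrcSpan, String)
ghcVarName (L loc (VarPat x)) = Just (loc, x)
ghcVarName _                  = Nothing

-- Without locations, LPat Plain is just Pat Plain.
plainVarName :: LPat Plain -> Maybe String
plainVarName (VarPat x) = Just x
plainVarName _          = Nothing

-- The lazy pattern ~x in both representations.
ghcLazy :: LPat Ghc
ghcLazy = L (SrcSpan 1 1) (LazyPat (L (SrcSpan 1 2) (VarPat "x")))

plainLazy :: LPat Plain
plainLazy = LazyPat (VarPat "x")
```

Compared with the XPat wrapper, no view pattern or helper is needed: the choice between located and bare children is made once, in the XRec instance.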

In practical terms, when going from 8.8 to 8.10, the pattern synonym LL becomes L, the view pattern dL -> is removed, and cL is just L.