In GHC, Haskell operator occurrences are classified into one of four categories. For example, the occurrence of `⊕` in `a ⊕ b` is "loose infix", in `a⊕b` is "tight infix", in `a ⊕b` is "prefix" and in `a⊕ b` is "suffix".
The point of this is that certain operators can be ascribed different meanings depending on the classification of their occurrence and the language extensions that may be in effect. For example, `!` lexes as a strictness annotation (token type `ITbang`) if its occurrence is prefix (e.g. `f !x = rhs`) or as an ordinary operator (token type `ITvarsym`) if not (e.g. `xs ! 3`). Another ready example is provided by the operator `@` which, according to whitespace considerations, may be a type application (prefix), an as-pattern (tight infix), an ordinary operator (loose infix) or a parse error (suffix).
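To make the whitespace sensitivity concrete, here is a small sketch of source code that exercises three of these occurrences. The function names and the list-indexing operator `(!)` defined here are hypothetical, chosen only for illustration; the extensions enabled are `BangPatterns` and `TypeApplications`.

```haskell
{-# LANGUAGE BangPatterns, TypeApplications #-}

-- Hypothetical indexing operator, defined in prefix form so that the
-- definition itself contains no infix occurrence of `!`.
(!) :: [a] -> Int -> a
(!) = (!!)

-- Prefix occurrence: whitespace before `!`, identifier right after it,
-- so `!acc` is a bang pattern (strictness annotation).
sumStrict :: Int -> [Int] -> Int
sumStrict !acc []     = acc
sumStrict !acc (x:xs) = sumStrict (acc + x) xs

-- Loose infix occurrence: whitespace on both sides, so `!` is an
-- ordinary operator (here, the hypothetical indexing operator above).
third :: [a] -> a
third xs = xs ! 3

-- Prefix occurrence of `@`: a type application.
readInt :: String -> Int
readInt = read @Int

-- Tight infix occurrence of `@`: an as-pattern.
dupHead :: [a] -> [a]
dupHead whole@(x:_) = x : whole
dupHead []          = []
```

Only the spacing around `!` and `@` distinguishes the bang pattern from the operator application, and the type application from the as-pattern.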
The implementation of this categorization relies upon two functions: `followedByOpeningToken` and `precededByClosingToken`. To explain further:
- Identifiers, literals and opening brackets `(`, `(#`, `[|`, `[||`, `[p|`, `[t|`, `{` are considered "opening tokens";
- Identifiers, literals and closing brackets `)`, `#)`, `]`, `|]`, `}` are considered "closing tokens";
- Other tokens and whitespace are considered neither opening nor closing.
The classification algorithm is defined by the following rules:
precededByClosingToken | followedByOpeningToken | occurrence |
---|---|---|
False | True | prefix |
True | False | suffix |
True | True | tight infix |
False | False | loose infix |
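
The table can be read as a total function of the two predicates. The following is a minimal sketch of that reading; the `Occurrence` type and the `classify` function are hypothetical names introduced here, not part of GHC's lexer.

```haskell
-- Hypothetical type naming the four categories from the table.
data Occurrence = Prefix | Suffix | TightInfix | LooseInfix
  deriving (Eq, Show)

-- First argument: is the operator preceded by a closing token?
-- Second argument: is the operator followed by an opening token?
classify :: Bool -> Bool -> Occurrence
classify precededByClosing followedByOpening =
  case (precededByClosing, followedByOpening) of
    (False, True ) -> Prefix
    (True , False) -> Suffix
    (True , True ) -> TightInfix
    (False, False) -> LooseInfix
```

For instance, in `a ⊕b` the operator is preceded by whitespace and followed by an identifier, so `classify False True` yields `Prefix`.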
`precededByClosingToken` is straightforward: it looks backwards one character in the lexing buffer.

    precededByClosingToken :: AlexAccPred ExtsBitmap
    precededByClosingToken _ (AI _ buf) _ _ =
      case prevChar buf '\n' of
        -- '}' counts as a closing bracket unless it ends a block comment "-}"
        '}'  -> decodePrevNChars 1 buf /= "-"
        ')'  -> True
        ']'  -> True
        '\"' -> True
        '\'' -> True
        '_'  -> True
        c    -> isAlphaNum c

Similarly, `followedByOpeningToken` looks forwards one character in the lexing buffer.

    followedByOpeningToken :: AlexAccPred ExtsBitmap
    followedByOpeningToken _ _ _ (AI _ buf)
      | atEnd buf = False
      | otherwise =
          case nextChar buf of
            -- '{' counts as an opening bracket unless it starts a block comment "{-"
            ('{', buf') -> nextCharIsNot buf' (== '-')
            ('(', _)    -> True
            ('[', _)    -> True
            ('\"', _)   -> True
            ('\'', _)   -> True
            ('_', _)    -> True
            (c, _)      -> isAlphaNum c

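The two predicates can be modelled outside the lexer as pure functions on strings. The sketch below is a simplification under stated assumptions: `precededByClosing` and `followedByOpening` are hypothetical names, the text before and after the operator is passed as plain `String`s rather than read from GHC's lexing buffer, and the start of input is treated like the `'\n'` default above.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical model: `before` is the source text up to the operator.
precededByClosing :: String -> Bool
precededByClosing before =
  case reverse before of
    []          -> False  -- start of input: nothing precedes the operator
    ('}':'-':_) -> False  -- "-}" ends a block comment, it is not a bracket
    (c:_)       -> c `elem` ")]}\"'_" || isAlphaNum c

-- Hypothetical model: `after` is the source text following the operator.
followedByOpening :: String -> Bool
followedByOpening after =
  case after of
    []          -> False  -- end of input
    ('{':'-':_) -> False  -- "{-" starts a block comment, it is not a bracket
    (c:_)       -> c `elem` "([{\"'_" || isAlphaNum c
```

With this model, `precededByClosing "xs"` and `followedByOpening "(y)"` both hold, while a space or a comment delimiter adjacent to the operator makes the corresponding predicate fail.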
Armed with these rules, the lexing of operators looks like this:

    <0> {
      @varsym / { precededByClosingToken `alexAndPred` followedByOpeningToken } { varsym_tight_infix }
      @varsym / { followedByOpeningToken }                                     { varsym_prefix }
      @varsym / { precededByClosingToken }                                     { varsym_suffix }
      @varsym                                                                  { varsym_loose_infix }
    }

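The order of these rules matters: all four match the same text, so the first rule whose predicate holds is the one that fires, and the tight infix rule has to come before the prefix and suffix rules that it overlaps with. A minimal sketch of that first-match behaviour follows; the `Context` type, the `rules` list and `selectAction` are hypothetical names, and the actions are represented only by their names as strings.

```haskell
-- (preceded by a closing token, followed by an opening token)
type Context = (Bool, Bool)

-- Hypothetical stand-ins for the lexer rules, in the same order as above.
rules :: [(Context -> Bool, String)]
rules =
  [ (\(p, f) -> p && f, "varsym_tight_infix")
  , (\(_, f) -> f,      "varsym_prefix")
  , (\(p, _) -> p,      "varsym_suffix")
  , (const True,        "varsym_loose_infix")
  ]

-- Pick the name of the first rule whose predicate accepts the context.
selectAction :: Context -> String
selectAction ctx =
  case [name | (ok, name) <- rules, ok ctx] of
    (name:_) -> name
    []       -> error "unreachable: the last rule always matches"
```

If the prefix rule were listed first in this model, the context `(True, True)` would select `varsym_prefix` and tight infix occurrences would never be recognised.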
The actions `varsym_tight_infix`, `varsym_prefix`, `varsym_suffix` and `varsym_loose_infix` are "fed" the operator and allow for language-extension-specific issuance of tokens (as opposed to issuance of general `ITvarsym` tokens). For example, `varsym_prefix`:

    varsym_prefix :: Action
    varsym_prefix = sym $ \exts s ->
      if | TypeApplicationsBit `xtest` exts, s == fsLit "@"
         -> return ITtypeApp
         | ...
         | otherwise -> return (ITvarsym s)

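The shape of such an action can be imitated in a standalone program. The sketch below is a toy model, not GHC's code: the `Token` type and `prefixToken` are hypothetical, the extension state is reduced to a single `Bool`, and only the two prefix cases mentioned in the text (`@` and `!`) are modelled. It makes no claim about the branches elided above.

```haskell
{-# LANGUAGE MultiWayIf #-}

-- Hypothetical token type standing in for ITtypeApp, ITbang and ITvarsym.
data Token = TokTypeApp | TokBang | TokVarSym String
  deriving (Eq, Show)

-- First argument: whether type applications are enabled (an assumption
-- standing in for the extension bitmap). Second argument: the operator.
prefixToken :: Bool -> String -> Token
prefixToken typeApplicationsOn s =
  if | typeApplicationsOn, s == "@" -> TokTypeApp
     | s == "!"                     -> TokBang
     | otherwise                    -> TokVarSym s
```

In this model a prefix `@` becomes a dedicated token only when the flag is set; otherwise it falls through to the general operator token, mirroring the `otherwise` branch.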