In GHC, Haskell operator occurrences are classified into one of four categories. For example, the occurrence of ⊕ in a ⊕ b is "loose infix", in a⊕b is "tight infix", in a ⊕b is "prefix" and in a⊕ b is "suffix".
The point of this is that certain operators can be ascribed different meanings depending on the classification of their occurrence and on the language extensions in effect. For example, ! lexes as a strictness annotation (token type ITbang) if its occurrence is prefix (e.g. f !x = rhs) and as an ordinary operator (token type ITvarsym) otherwise (e.g. xs ! 3). Another ready example is the operator @ which, depending on the surrounding whitespace, may be a type application (prefix), an as-pattern (tight infix), an ordinary operator (loose infix) or a parse error (suffix).
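To make the two examples concrete, here is a small sketch of the same operators used in different positions. The names sumStrict, third, dupHead and parsed are invented for illustration, and the list-indexing operator ! is defined locally rather than taken from any library; the sketch assumes a GHC recent enough to apply the whitespace rules described here.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE TypeApplications #-}

-- A locally defined list-indexing operator (hypothetical, for illustration).
infixl 9 !
(!) :: [a] -> Int -> a
(!) xs n = xs !! n

-- Prefix occurrence: '!' is a strictness annotation on the accumulator.
sumStrict :: Int -> [Int] -> Int
sumStrict !acc []       = acc
sumStrict !acc (y : ys) = sumStrict (acc + y) ys

-- Loose infix occurrence: '!' is the ordinary operator defined above.
third :: [a] -> a
third xs = xs ! 3

-- Tight infix occurrence: '@' forms an as-pattern.
dupHead :: [a] -> [a]
dupHead whole@(x : _) = x : whole
dupHead []            = []

-- Prefix occurrence: '@' is a visible type application.
parsed :: Int
parsed = read @Int "42"
```

Changing only the whitespace around ! or @ in these definitions changes how the lexer classifies the occurrence, and therefore what the program means or whether it parses at all.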
The implementation of this categorization relies upon two functions: followedByOpeningToken and precededByClosingToken. To explain further:
- Identifiers, literals and opening brackets (, (#, [|, [||, [p|, [t|, { are considered "opening tokens";
- Identifiers, literals and closing brackets ), #), ], |], } are considered "closing tokens";
- Other tokens and whitespace are considered neither opening nor closing.
The classification algorithm is defined by the following rules:
| precededByClosingToken | followedByOpeningToken | occurrence |
|---|---|---|
| False | True | prefix |
| True | False | suffix |
| True | True | tight infix |
| False | False | loose infix |
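The table can be read as a total function of two booleans. A minimal sketch follows; the Occurrence type and the classify function are names made up for this illustration and are not claimed to exist in GHC.

```haskell
-- Hypothetical type naming the four categories from the table.
data Occurrence = Prefix | Suffix | TightInfix | LooseInfix
  deriving (Eq, Show)

-- First argument: is the operator preceded by a closing token?
-- Second argument: is the operator followed by an opening token?
classify :: Bool -> Bool -> Occurrence
classify precededByClosing followedByOpening =
  case (precededByClosing, followedByOpening) of
    (False, True)  -> Prefix
    (True,  False) -> Suffix
    (True,  True)  -> TightInfix
    (False, False) -> LooseInfix
```

Because all four combinations are covered, every operator occurrence falls into exactly one category.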
precededByClosingToken is very straightforward: look backwards one character in the lexing buffer.
    precededByClosingToken :: AlexAccPred ExtsBitmap
    precededByClosingToken _ (AI _ buf) _ _ =
      case prevChar buf '\n' of
        '}' -> decodePrevNChars 1 buf /= "-"
        ')' -> True
        ']' -> True
        '\"' -> True
        '\'' -> True
        '_' -> True
        c -> isAlphaNum c
followedByOpeningToken is similar: look forwards one character in the lexing buffer.
    followedByOpeningToken :: AlexAccPred ExtsBitmap
    followedByOpeningToken _ _ _ (AI _ buf)
      | atEnd buf = False
      | otherwise =
          case nextChar buf of
            ('{', buf') -> nextCharIsNot buf' (== '-')
            ('(', _) -> True
            ('[', _) -> True
            ('\"', _) -> True
            ('\'', _) -> True
            ('_', _) -> True
            (c, _) -> isAlphaNum c
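The two predicates can be approximated on plain strings, which makes them easy to try out in GHCi. This is only a simplified model, not the GHC code: closesBefore, opensAfter, occurrence and the Occurrence type are hypothetical names, and the text before and after the operator is passed in as two String values instead of a lexing buffer.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical type naming the four categories.
data Occurrence = Prefix | Suffix | TightInfix | LooseInfix
  deriving (Eq, Show)

-- Model of the backwards check: inspect the last character before the
-- operator. A '}' counts as closing unless it ends a "-}" comment.
closesBefore :: String -> Bool
closesBefore before =
  case reverse before of
    []           -> False
    ('}' : rest) -> take 1 rest /= "-"
    (c : _)      -> c `elem` ")]\"'_" || isAlphaNum c

-- Model of the forwards check: inspect the first character after the
-- operator. A '{' counts as opening unless it starts a "{-" comment.
opensAfter :: String -> Bool
opensAfter after =
  case after of
    []           -> False
    ('{' : rest) -> take 1 rest /= "-"
    (c : _)      -> c `elem` "([\"'_" || isAlphaNum c

-- Combine both checks according to the classification table.
occurrence :: String -> String -> Occurrence
occurrence before after =
  case (closesBefore before, opensAfter after) of
    (False, True)  -> Prefix
    (True,  False) -> Suffix
    (True,  True)  -> TightInfix
    (False, False) -> LooseInfix
```

For instance, occurrence "a " "b" models a ⊕b and evaluates to Prefix, while occurrence "a" " b" models a⊕ b and evaluates to Suffix.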
Armed with these rules, the lexing of operators looks like this:
    <0> {
      @varsym / { precededByClosingToken `alexAndPred` followedByOpeningToken } { varsym_tight_infix }
      @varsym / { followedByOpeningToken } { varsym_prefix }
      @varsym / { precededByClosingToken } { varsym_suffix }
      @varsym { varsym_loose_infix }
    }
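The order of these rules matters. Assuming Alex's usual tie-breaking, where the earliest rule wins among matches of equal length, the rule list behaves like a chain of guards tried from top to bottom. The sketch below models only that selection; Action here is a hypothetical enumeration, unrelated to the lexer's real Action type.

```haskell
-- Hypothetical labels for the four lexer actions.
data Action = VarsymTightInfix | VarsymPrefix | VarsymSuffix | VarsymLooseInfix
  deriving (Eq, Show)

-- Guards are tried in order, mirroring the order of the rules.
selectAction :: Bool -> Bool -> Action
selectAction precededByClosing followedByOpening
  | precededByClosing && followedByOpening = VarsymTightInfix
  | followedByOpening                      = VarsymPrefix
  | precededByClosing                      = VarsymSuffix
  | otherwise                              = VarsymLooseInfix
```

If the tight infix rule came after the prefix rule, an occurrence such as a⊕b would satisfy followedByOpeningToken first and be misclassified as prefix.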
The actions varsym_tight_infix, varsym_prefix, varsym_suffix and varsym_loose_infix are "fed" the operator and can issue tokens specific to a language extension (as opposed to general ITvarsym tokens). For example, varsym_prefix:
    varsym_prefix :: Action
    varsym_prefix = sym $ \exts s ->
      if | TypeApplicationsBit `xtest` exts, s == fsLit "@"
         -> return ITtypeApp
         | ...
         | otherwise -> return (ITvarsym s)
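The shape of such an action can be mimicked in a toy model that runs outside GHC. Everything below is hypothetical: Ext, Token and prefixToken are invented for illustration, and only the two cases mentioned in the text (@ under TypeApplications and prefix !) are modelled; the cases elided above are not reproduced.

```haskell
-- Hypothetical stand-in for the set of enabled extensions.
data Ext = TypeApplications
  deriving (Eq, Show)

-- Hypothetical tokens, loosely mirroring ITtypeApp, ITbang and ITvarsym.
data Token = TypeApp | Bang | VarSym String
  deriving (Eq, Show)

-- Decide which token a prefix occurrence of an operator produces.
prefixToken :: [Ext] -> String -> Token
prefixToken exts s
  | s == "@", TypeApplications `elem` exts = TypeApp
  | s == "!"                               = Bang
  | otherwise                              = VarSym s
```

With these definitions, prefixToken [TypeApplications] "@" yields TypeApp, whereas prefixToken [] "@" falls through to VarSym "@".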