How is error information combined when parsers are combined? For example, using <|> to combine parsers, I would expect the set of expected characters for an error to be the union of the sets of expected characters from the individual parsers. (I’m finding it hard to pin down the behaviour of <|> or even to find the relevant source code.)
I tried a simple example and it seems to work the way I expected.
The source code appears to be in Text/Megaparsec/Internal.hs, although I haven’t gotten my head around it yet.
To understand the behavior of megaparsec’s <|> operator, it is useful to know about “consuming” and “non-consuming” (or “empty”) parses. To illustrate the concept, I’ll compare a literal string parser to a parser that parses each character separately:

    > let p = string "abc"
    > let q = sequence [char 'a', char 'b', char 'c']
    > parseMaybe (p <|> string "abd") "abd"
    Just "abd"
    > parseMaybe (q <|> string "abd") "abd"
    Nothing
So, what happened? Well, when string "abc" tries to parse the string "abd", it fails without consuming any input. Or you can think of it as backtracking to the beginning of the string. In contrast, the parser sequence [char 'a', char 'b', char 'c'] does consume the 'a' and 'b' characters even if it fails. In this case, <|> will not even try to use the string "abd" parser.

You can manually force the parser to backtrack by using the try function as follows:

    > parseMaybe (try q <|> string "abd") "abd"
    Just "abd"
But note that try can cause exponential running time in the worst case, because the parser may end up re-reading the same input many times, so use it sparingly.
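For anyone who wants to reproduce the GHCi session outside the REPL, here is a minimal standalone sketch of the same three cases. It assumes a reasonably recent megaparsec with String as the input type; the Parser alias is just a name chosen for this example, and the expected results in the comments are the ones from the session above.

```haskell
import Control.Applicative ((<|>))
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe, try)
import Text.Megaparsec.Char (char, string)

-- Assumed alias for this sketch: String input, no custom error component.
type Parser = Parsec Void String

-- Matches "abc" as one chunk; on a mismatch it fails without consuming input.
p :: Parser String
p = string "abc"

-- Matches 'a', 'b', 'c' one character at a time, consuming input as it goes.
q :: Parser String
q = sequence [char 'a', char 'b', char 'c']

main :: IO ()
main = do
  -- p fails "empty", so <|> falls through to the second branch: Just "abd"
  print (parseMaybe (p <|> string "abd") "abd")
  -- q has consumed 'a' and 'b' before failing, so the second branch is skipped: Nothing
  print (parseMaybe (q <|> string "abd") "abd")
  -- try rewinds q on failure, so the second branch runs again: Just "abd"
  print (parseMaybe (try q <|> string "abd") "abd")
```

Only the second case differs from the first, and the only difference between p and q is whether the failure happens before or after input was consumed.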
To answer your question given this information: the error information will be combined, but only if both arguments of <|> fail without consuming any input. If either consumes input, then only the error information from that branch is used.
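A rough way to see both cases for yourself is to print the error messages. The sketch below assumes a recent megaparsec; Parser and renderError are names made up for this example, not part of the library. I have left the exact output out, since the formatting can differ between versions.

```haskell
import Control.Applicative ((<|>))
import Data.Void (Void)
import Text.Megaparsec (Parsec, errorBundlePretty, parse)
import Text.Megaparsec.Char (char, string)

-- Assumed alias for this sketch: String input, no custom error component.
type Parser = Parsec Void String

-- Hypothetical helper: run a parser and return the rendered error message,
-- or the empty string if the parse succeeds.
renderError :: Parser a -> String -> String
renderError parser input =
  case parse parser "" input of
    Left bundle -> errorBundlePretty bundle
    Right _     -> ""

-- Both branches fail without consuming input, so the message should
-- list both 'a' and 'b' as expected.
bothEmpty :: String
bothEmpty = renderError (char 'a' <|> char 'b') "c"

-- The first branch consumes 'a' and 'b' before failing on 'd', so the
-- message should come from that branch only (expecting 'c') and should
-- not mention the "abd" alternative.
firstConsumed :: String
firstConsumed =
  renderError (sequence [char 'a', char 'b', char 'c'] <|> string "abd") "abd"

main :: IO ()
main = do
  putStr bothEmpty
  putStr firstConsumed
```

If the first message names both characters and the second names only 'c', that matches the rule described above.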