Onboarding Parsec Library from Self-Defined Parser¶
Previously, I used my own parser to parse the Solidity language syntax and finished the expression part. It is a combination of ExceptT and State that maintains the parser state and reports errors.
The original version had no error reporting at all, which is definitely not the right approach. Moreover, I believe an open-source parser library works better than mine, as I'm still a newbie to Haskell. Hence, I chose to replace my custom parser with the parsec parser.
This blog post covers my investigation of parsec and the problems I encountered during the migration. Besides that, I will compare some pattern differences between parsec and my own parser.
Investigation¶
Define Our Parser Using Parsec¶
Parsec provides a transformer, ParsecT, that lets users build their own parsers.
ParsecT s u m a is a parser with stream type s, user state type u, underlying monad m and return type a. Parsec is strict in the user state.
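This is a bit complex, especially the underlying monad and the user state type. Don't worry, let's start with Parsec. It fixes the underlying monad to Identity, so the monad parameter does not matter for simple usage.
Hence, we can define our own parser type as an alias of Parsec with Text as the stream type and () as the user state. This is the MyParser type used in the examples that follow.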
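To make the underlying monad m a bit more concrete, here is a minimal sketch of a parser that runs over IO instead of Identity. The names IOParser, pLoggedWord and runLogged are made up for illustration and are not part of my Solidity parser; ParsecT, runParserT and liftIO come from the libraries.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.IO.Class (liftIO)
import Data.Text (Text)
import Text.Parsec

-- For reference, parsec defines: type Parsec s u = ParsecT s u Identity

-- Hypothetical alias: Text stream, no user state, IO as the underlying monad.
type IOParser a = ParsecT Text () IO a

-- Parse one word and log it through the underlying IO monad.
pLoggedWord :: IOParser String
pLoggedWord = do
  w <- many1 letter
  liftIO (putStrLn ("parsed: " ++ w))
  pure w

-- runParserT returns its result inside the underlying monad (IO here).
runLogged :: IO (Either ParseError String)
runLogged = runParserT pLoggedWord () "" "hello world"
```

With Identity there is nothing to lift, which is why the plain Parsec alias is enough for the rest of this post.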
Define and Run Parser¶
Then, we can define some parsers to parse a word and a line.
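import Data.Text (Text, pack)
import Text.Parsec

type MyParser a = Parsec Text () a
-- Read everything up to the next newline (the newline itself is consumed).
pReadline :: MyParser Text
pReadline = pack <$> manyTill anyChar newline
-- Read one word: a non-empty run of characters other than space or newline.
pWord :: MyParser Text
pWord = pack <$> many1 (noneOf " \n")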
Then, we can run a parser with parse, which is similar to evalState:
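-- Requires the OverloadedStrings extension so the string literals are read as Text.
main :: IO ()
main = do
  print $ parse pReadline "" "hello world\n" -- Right "hello world"
  print $ parse pWord "" "hello world" -- Right "hello"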
Moreover, if you would like to inspect the remaining input, you can use runParser and return the leftover stream alongside the result. The function getInput is required to retrieve that state.
main :: IO ()
main = do
  print $
    runParser
      ( pReadline
          >>= \result ->
            getInput
              >>= \rest -> return (result, rest)
      )
      ()
      ""
      "hello world\n left"
-- prints: Right ("hello world"," left")
getInput works much like the get method of State, which is why the code above works. It is almost the same as working in the State monad. For example, pWord can trace the input before and after parsing:
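-- Needs: import Debug.Trace (trace) and import Data.Text (unpack)
pWord :: MyParser Text
pWord = do
  s <- getInput
  trace (unpack s) $ pure () -- input before parsing the word
  r <- pack <$> many1 (noneOf " \n")
  s' <- getInput
  trace (unpack s') $ pure () -- input left after the word
  return r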
Error Report and Alternative¶
Error reporting is also important in a parser. The previous parser used throwError from ExceptT to report errors; with Parsec we use fail instead.
The output contains the error message along with the position where the error was raised.
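As a rough illustration of fail, here is a minimal sketch. The parser pNumber and its leading-zero rule are invented for this example and do not come from the Solidity grammar; the point is only to show where the message ends up.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.Parsec
import Text.Parsec.Error (errorMessages, messageString)

type MyParser a = Parsec Text () a

-- Hypothetical parser: read a decimal number but reject a leading zero.
pNumber :: MyParser Int
pNumber = do
  ds <- many1 digit
  if length ds > 1 && take 1 ds == "0"
    then fail "leading zero is not allowed"
    else pure (read ds)

okResult, badResult :: Either ParseError Int
okResult = parse pNumber "" "42"
badResult = parse pNumber "" "007"

-- The messages carried by the error; showing the ParseError itself
-- additionally renders the source position.
badMessages :: [String]
badMessages = either (map messageString . errorMessages) (const []) badResult
```

Printing badResult shows a Left whose ParseError carries the position and the custom message, possibly next to messages that parsec collected at the same position.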
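To use alternatives, the try function is needed because the original state must be kept when a parser fails, whereas by default a parser that fails after consuming input stops parsing and reports an error.
For example, if we want to parse either a whole line or a word, we can wrap the line parser in try and combine it with the word parser via <|>; for hello world without a trailing newline this outputs Right "hello".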
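A minimal sketch of that line-or-word alternative, reusing pReadline and pWord from earlier. The names withoutTry and withTry are only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text, pack)
import Text.Parsec

type MyParser a = Parsec Text () a

pReadline :: MyParser Text
pReadline = pack <$> manyTill anyChar newline

pWord :: MyParser Text
pWord = pack <$> many1 (noneOf " \n")

-- Without try: pReadline consumes input before failing (no newline is found),
-- so <|> never runs pWord and the whole parse fails.
withoutTry :: Either ParseError Text
withoutTry = parse (pReadline <|> pWord) "" "hello world"

-- With try: the input is rewound when pReadline fails, so pWord gets its turn.
withTry :: Either ParseError Text
withTry = parse (try pReadline <|> pWord) "" "hello world"
```

Here withTry evaluates to Right "hello", while withoutTry is a Left.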
Try¶
The function try restores the stream state if the parser fails. For example, when you try to consume string via pOneKeyword from the stream str1, the parser fails after it has already consumed str, leaving only 1 as the remaining input.
This behavior causes a problem because the alternative parser needs to consume the same input again from the beginning. To prevent this issue, use try $ pOneKeyword "string" to parse the stream.
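The sketch below reproduces the situation. Note that pOneKeyword here is a hypothetical stand-in defined directly with parsec's string, which may differ from the real implementation in my parser; noBacktrack and backtrack are illustrative names.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.Parsec

type MyParser a = Parsec Text () a

-- Hypothetical stand-in for pOneKeyword: match the keyword literally.
pOneKeyword :: String -> MyParser String
pOneKeyword = string

-- "str" is consumed before the mismatch at '1', so the second branch is skipped.
noBacktrack :: Either ParseError String
noBacktrack = parse (pOneKeyword "string" <|> pOneKeyword "str1") "" "str1"

-- try rewinds to the start of the input, so the second branch can match.
backtrack :: Either ParseError String
backtrack = parse (try (pOneKeyword "string") <|> pOneKeyword "str1") "" "str1"
```

noBacktrack is a Left, whereas backtrack is Right "str1".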
Pitfalls¶
Separator Parser Should Be Concise¶
Using sepBy to parse a pattern like a, b ,c ,d is a good idea, but note that the separator should be as concise as possible. In our case, the separator should be a single comma, not a comma surrounded by optional spaces.
- do: char ','
- don't: pManySpaces *> char ',' <* pManySpaces
The reason to avoid the latter is that once the separator parser consumes any input, sepBy believes it has encountered a separator and commits to parsing the rest of it. However, that is not always the case, because the leading characters can be misleading. For example, when parsing a, new b(), c, sepBy consumes the space after a, new and then expects the following characters to contain a separator, so the parse fails.
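A simplified sketch of this pitfall is shown here. The element parser pItem is a made-up word parser, not the real expression parser, and many (char ' ') stands in for pManySpaces.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.Parsec

type MyParser a = Parsec Text () a

-- Hypothetical element parser: a word, optionally preceded by spaces.
pItem :: MyParser String
pItem = many (char ' ') *> many1 letter

-- Don't: the separator may consume spaces and then fail to find a comma.
greedySep :: MyParser [String]
greedySep = pItem `sepBy` (many (char ' ') *> char ',' <* many (char ' '))

-- Do: the separator is just the comma; surrounding spaces belong to the items.
conciseSep :: MyParser [String]
conciseSep = pItem `sepBy` char ','

input :: Text
input = "a, new b"

greedyResult, conciseResult :: Either ParseError [String]
greedyResult = parse greedySep "" input
conciseResult = parse conciseSep "" input
```

With the greedy separator, the space after new is consumed before the missing comma is noticed, so greedyResult is a Left. The concise version stops cleanly and yields Right ["a","new"], leaving the rest of the input for whatever parser comes next.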
ManyTill + LookAhead¶
When we parse the decorators of a function, parsing should stop at the returns keyword or { without consuming it. To do so, we use manyTill together with lookAhead, which returns the result of a parser without consuming the stream when it succeeds:
pFunctionDecorators :: Parser [FnDecorator]
pFunctionDecorators = do
  pManySpaces
    *> manyTill
      ( ( FnDecV <$> try pFnDeclVisibility
            <|> pFnDeclVirtual
            <|> (FnDecS <$> try pFnDeclStateMutability)
            <|> (FnDecOs <$> try pOverrideSpecifier)
            -- modifier invocation should be put at last,
            -- otherwise it will process the 'override' and 'virtual' as a modifier invocation,
            -- which is definitely wrong
            <|> (FnDecMI <$> try pFnDeclModifierInvocation)
        )
          <* pMany1Spaces
      )
      ( lookAhead $
          try (pOneKeyword "returns")
            <|> try (pOneKeyword "{")
            <|> eof $> ""
      )
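Since the parser above depends on many helpers from my project, here is a smaller self-contained sketch of the same manyTill plus lookAhead pattern. The names pStop, pUntilStop and example are invented for illustration, and plain words stand in for the decorator parsers.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.Parsec

type MyParser a = Parsec Text () a

-- Stop condition: "returns", "{" or end of input; try rewinds a partial match.
pStop :: MyParser String
pStop = try (string "returns") <|> string "{" <|> ("" <$ eof)

-- Collect words until the stop condition is ahead, then report the rest of the input.
pUntilStop :: MyParser ([String], Text)
pUntilStop = do
  ws <- manyTill (many1 letter <* spaces) (lookAhead pStop)
  rest <- getInput
  pure (ws, rest)

example :: Either ParseError ([String], Text)
example = parse pUntilStop "" "public view returns (uint)"
```

The result is Right (["public","view"],"returns (uint)"): the returns keyword is detected but still present in the remaining input, so the next parser can consume it.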