Refining my first steps with Parsec
June 22, 2007
To start gaining experience with Parsec and Haskell I completed the first working version of a parser for SmallTalk’s version of ‘Hello World’. To follow up I decided it would be worthwhile getting advice on those parts that just didn’t feel right. Off to Haskell Cafe I went.
I had only heard good things about the Haskell community, and after my experience I have nothing to say to the contrary. My post and the follow up responses can be found here. Thanks to all those that responded.
The Questionable Code
I was mainly concerned with the double do block present in primary. For reference, here is the applicable code:
primary = do {
i <- identifier;
return $ PrimaryIdentifier i; }
<|> do {
l <- stringLiteral;
return $ PrimaryLiteral l; }
identifier = do
i <- many1 letter
return $ Identifier i
stringLiteral = do
(char '\\'')
s <- manyTill anyChar (char '\\'')
return $ StringLiteral s
I had a feeling that Haskell was capable of producing more readable code than this.
Avoiding do
One disadvantage of do-notation, is that it can often obscure what is actually a simple monadic operation. It is said better over at Data.SyntaxFree:
Do-notation — and the fact that it’s introduced in the context of IO — obscures the fact that monads are implemented quite simply in Haskell itself as a type-class and correspond to a simple algebraic structure that models many useful computational phenomena.
Modifying the above code so it doesn’t require do notation results in the following (Thanks to Dave Tapley):
primary = (identifier >>= (return . PrimaryIdentifier))
<|> (stringLiteral >>= (return . PrimaryLiteral))
identifier = (many1 letter) >>= (return . Identifier)
stringLiteral = (char '\\'') >> (manyTill anyChar (char '\\''))
>>= (return . StringLiteral)
Although I wouldn’t consider this more readable than when using do notation, I believe this gives a better understanding of what’s going on, and ideas on simplifying it. In particular, some common constructs start to appear. The most notable of these is of the form:
parser >>= return . Constructor
Extracting the Common Construct
Thanks to the suggestions of many, there are Haskell functions that can simplify this common construct: liftM and fmap. Which one to choose is a matter of style. As is using them as infix functions. In this case I’ve decided to use an infix liftM.
primary = (PrimaryIdentifier `liftM` identifier)
<|> (PrimaryLiteral `liftM` stringLiteral)
identifier = Identifier `liftM` (many1 letter)
stringLiteral = StringLiteral `liftM`
((char '\\'') >> (manyTill anyChar (char '\\'')))
One last improvment
The final section of code this code that could do with improvement is in stringLiteral. I have chosen to use a suggestion from Tillmann Rendel.
That suggestion is to define a new monadic combinator:
a >>~ b = a >>= \\x -> b >> return x
This combinator worse as for >>, but returns the result of the calculation on the left, rather than the right. This is best demonstrated in use:
stringLiteral = StringLiteral `liftM` (quote >> nonQuote >>~ quote)
quote = char '\\''
nonQuote = many $ noneOf ['\\'']
Summary
I had thought my first attempt at this parser resulted in a simple, readable parser. I still think it did. But now the expressive power of Parsec and Haskell have surpassed my expectations.
July 16, 2007 at 8:37 pm
The line:
a >>~ b = a >>= x -> b >> return x
needs a backslash before the x (for the lamdba).
July 17, 2007 at 8:39 am
Yes it does. Looks like wordpress made it disappear
July 29, 2007 at 12:35 pm
[...] A Smalltalk interpreter developed in Haskell (the code developed so far for the parser can be found here and here). To do this I’m going to make use of QuickCheck. I couldn’t find much [...]
September 14, 2007 at 1:31 pm
This post is a couple months old, but whatever. Text.ParserCombinators.Parsec has a ‘between’ function.
quote >> nonQuote >>~ quote = between quote quote nonQuote
And if you’re going to parenthesize the second argument to liftM, you had might as well use it prefix. The usual reason for using it infix is to omit them.
September 18, 2007 at 2:30 pm
Thanks Nick, comments on any post, no matter how old are welcome.
I seem to remember that between wasn’t working for me, but I have no idea why. I must have missed something at the time.
Thanks for the style tip on liftM. I’m still refining my Haskell style.