Refining my first steps with Parsec

June 22, 2007

To start gaining experience with Parsec and Haskell I completed the first working version of a parser for SmallTalk’s version of ‘Hello World’. To follow up I decided it would be worthwhile getting advice on those parts that just didn’t feel right. Off to Haskell Cafe I went.

I had only heard good things about the Haskell community, and after my experience I have nothing to say to the contrary. My post and the follow up responses can be found here. Thanks to all those that responded.

The Questionable Code

I was mainly concerned with the double do block present in primary. For reference, here is the applicable code:

primary = do {
        i <- identifier;
        return $ PrimaryIdentifier i; }
    <|>   do {
        l <- stringLiteral;
        return $ PrimaryLiteral l; }

identifier = do
    i <- many1 letter
    return $ Identifier i
stringLiteral = do
    (char '\\'')
    s <- manyTill anyChar (char '\\'')
    return $ StringLiteral s

I had a feeling that Haskell was capable of producing more readable code than this.

Avoiding do

One disadvantage of do-notation, is that it can often obscure what is actually a simple monadic operation. It is said better over at Data.SyntaxFree:

Do-notation — and the fact that it’s introduced in the context of IO — obscures the fact that monads are implemented quite simply in Haskell itself as a type-class and correspond to a simple algebraic structure that models many useful computational phenomena.

Modifying the above code so it doesn’t require do notation results in the following (Thanks to Dave Tapley):


primary = (identifier >>= (return . PrimaryIdentifier))
      <|> (stringLiteral >>= (return . PrimaryLiteral))
identifier = (many1 letter) >>= (return . Identifier)
stringLiteral = (char '\\'') >> (manyTill anyChar (char '\\''))
            >>= (return . StringLiteral)

Although I wouldn’t consider this more readable than when using do notation, I believe this gives a better understanding of what’s going on, and ideas on simplifying it. In particular, some common constructs start to appear. The most notable of these is of the form:

parser >>= return . Constructor

Extracting the Common Construct

Thanks to the suggestions of many, there are Haskell functions that can simplify this common construct: liftM and fmap. Which one to choose is a matter of style. As is using them as infix functions. In this case I’ve decided to use an infix liftM.


primary = (PrimaryIdentifier `liftM` identifier)
      <|> (PrimaryLiteral `liftM` stringLiteral)
identifier = Identifier `liftM` (many1 letter)
stringLiteral = StringLiteral `liftM`
    ((char '\\'') >> (manyTill anyChar (char '\\'')))

One last improvment

The final section of code this code that could do with improvement is in stringLiteral. I have chosen to use a suggestion from Tillmann Rendel.
That suggestion is to define a new monadic combinator:

a >>~ b = a >>= \\x -> b >> return x

This combinator worse as for >>, but returns the result of the calculation on the left, rather than the right. This is best demonstrated in use:


stringLiteral = StringLiteral `liftM` (quote >> nonQuote >>~ quote)
quote = char '\\''
nonQuote = many $ noneOf ['\\'']

Summary

I had thought my first attempt at this parser resulted in a simple, readable parser. I still think it did. But now the expressive power of Parsec and Haskell have surpassed my expectations.

About these ads

6 Responses to “Refining my first steps with Parsec”


  1. The line:

    a >>~ b = a >>= x -> b >> return x

    needs a backslash before the x (for the lamdba).

  2. lstephen Says:

    Yes it does. Looks like wordpress made it disappear :)


  3. [...] A Smalltalk interpreter developed in Haskell (the code developed so far for the parser can be found here and here). To do this I’m going to make use of QuickCheck. I couldn’t find much [...]

  4. Nick Says:

    This post is a couple months old, but whatever. Text.ParserCombinators.Parsec has a ‘between’ function.

    quote >> nonQuote >>~ quote = between quote quote nonQuote

    And if you’re going to parenthesize the second argument to liftM, you had might as well use it prefix. The usual reason for using it infix is to omit them.

  5. lstephen Says:

    Thanks Nick, comments on any post, no matter how old are welcome.

    I seem to remember that between wasn’t working for me, but I have no idea why. I must have missed something at the time.

    Thanks for the style tip on liftM. I’m still refining my Haskell style.

  6. Michael Heinrich Says:

    Hello!

    I wanted to thank you for creating this project and would like to know if you are going to continue working on it.

    I am intereseted in all things Smalltalk.

    Thank you!


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: