First go with Parsec

June 19, 2007

Parsec is a library I’ve been interested in trying out seriously since I discovered it. I have written basic parsers for Java projects before and it has not been fun. The Parsec examples I’ve seen have captured my interest. So, now that I’ve decided I’d like to start a decent sized Haskell project I looked for something where Parsec would be involved.

This post describes the result of my first attempts at using the Parsec library for something semi serious. It’s not intended as a tutorial or introduction to Parsec, but to show how simple and straight forward it can be to use.

The Project

I decided to have a go at developing a SmallTalk interpreter. SmallTalk is a language that I have heard a lot about, but never really had a chance to invest the time in ‘getting’ the built in GUI approach, so I thought developing a program that can read SmallTalk from a text file and run it might be fun (similar to GST, I think). I will refer to this project as HST - Haskell SmallTalk.

Of course, I attempt this with no knowledge of compilers/interpreters since my university days (and a fair bit of that knowledge has disappeared).

First Steps

I decided to start with, what I guess is, best described as a Spike Solution. The famous ‘Hello World’ is a good candidate for this. The code is as follows


Transcript show: 'hello world'

Looking through the draft ANSI standard for SmallTalk provides me with the following EBNF for parsing the above:


<smalltalk program> ::= <program element>
<program element> ::= <program initializer definition>
<program initializer definition> ::= <initializer definition>
<initializer definition> ::= <statements>
<statements> ::= <expression>
<expression> ::= <basic expression>
<basic expression> ::= <primary> <messages>
<messages> ::= <keyword message>
<keyword message> ::= keyword <keyword argument>
<keyword argument> ::= <primary>
<primary> ::= identifier | <literal>
<literal> ::= <string literal>
<string literal> ::= quotedString

To get started, I’ll just concern myself with those parts from <basic expression> down.

Next, I’d like to look at what Haskell data structures I’d like to parse this into.


newtype Identifier = Identifier String
newtype Literal = StringLiteral String
data Primary = PrimaryLiteral Literal | PrimaryIdentifier Identifier
newtype Keyword = Keyword String
data Message = KeywordMessage Keyword Primary
data BasicExpression = BasicExpression Primary Message

I think this will work. So, the resulting data structure after parsing ‘Hello World’ should be:


BasicExpression
  (PrimaryIdentifier (Identifier "Transcript"))
  (KeywordMessage
    (Keyword "show")
    (PrimaryLiteral (StringLiteral "Hello World")))

Onto Parsec

Lets start at the bottom. First look at <string literal>. String literals in SmallTalk are surrounded by single quotes. The parsec function names are quite descriptive and allow me to pretty much translate the syntax descriptions.


stringLiteral = do
    (char '\\'')
    s <- manyTill anyChar (char '\\'')
    return $ StringLiteral s

Tesing in ghci


*HST.Parse> parseTest stringLiteral "'HelloWorld'"
StringLiteral "Hello World"

Seems to work.

Keywords and Identifiers are similar.


keyword = do
    k <- manyTill letter (char ':')
    return $ Keyword k

identifier = do
    i <- many1 letter
    return $ Identifier i

Next we move on to some more interesting stuff. <primary> gets a little tricky, as it requires us to attempt parsing either a string literal or an identifer. As both these parsers return different types it’s not possible to combine them directly using the <|> operator. The below works for me, but I’m not sure this is the best way to approach it.


primary = do {
        i <- identifier;
        return $ PrimaryIdentifier i; }
    <|>   do {
        l <- stringLiteral;
        return $ PrimaryLiteral l; }

<keyword message> is more straight forward, but it’s here that we start having to be careful with whitespace. The code that works for me is:


keywordMessage = do
    k <- keyword
    spaces
    p <- primary
    return $ KeywordMessage k p

and finally, we get to <basic expression>. Again, this is straight forward, just as long as we are careful with whitespace.


basicExpression = do
    p <- primary
    spaces
    m <- keywordMessage
    return $ BasicExpression p m

Finally, testing with ghci


*HST.Parse> parseTest basicExpression "Transcript show: 'Hello World'"
BasicExpression
  (PrimaryIdentifier (Identifier "Transcript"))
  (KeywordMessage (Keyword "show") (PrimaryLiteral (StringLiteral "Hello World")))

Bingo.

Closing

The result is not perfect or robust, and probably not even efficient, but this will come later. The parsing code for ‘Hello World’ has come together quickly and allows for further development of the spike.

4 Responses to “First go with Parsec”

  1. Refining my first steps with Parsec « lstephen Says:

    [...] 22nd, 2007 To start gaining experience with Parsec and Haskell I completed the first working version of a parser for SmallTalk’s version of ‘Hello World’. To follow up I decided it [...]

  2. Serge Stinckwich Says:

    This is Smalltalk, not SmallTalk ! Have fun.

  3. lstephen Says:

    Ah yes, so it is. Not sure where I got that capitalization from. I’ll make sure to get it right in the future :)

  4. Completing the Spike « lstephen Says:

    [...] on the Spike Solution for HST - a simple Smalltalk interpreter developed in Haskell. In my previous two posts, I developed a parser that was able to parse a Smalltalk version of ‘Hello [...]

Leave a Reply