First go with Parsec
June 19, 2007
Parsec is a library I’ve been interested in trying out seriously since I discovered it. I have written basic parsers for Java projects before and it has not been fun. The Parsec examples I’ve seen have captured my interest. So, now that I’ve decided I’d like to start a decent sized Haskell project I looked for something where Parsec would be involved.
This post describes the result of my first attempts at using the Parsec library for something semi serious. It’s not intended as a tutorial or introduction to Parsec, but to show how simple and straight forward it can be to use.
The Project
I decided to have a go at developing a SmallTalk interpreter. SmallTalk is a language that I have heard a lot about, but never really had a chance to invest the time in ‘getting’ the built in GUI approach, so I thought developing a program that can read SmallTalk from a text file and run it might be fun (similar to GST, I think). I will refer to this project as HST - Haskell SmallTalk.
Of course, I attempt this with no knowledge of compilers/interpreters since my university days (and a fair bit of that knowledge has disappeared).
First Steps
I decided to start with, what I guess is, best described as a Spike Solution. The famous ‘Hello World’ is a good candidate for this. The code is as follows
Transcript show: 'hello world'
Looking through the draft ANSI standard for SmallTalk provides me with the following EBNF for parsing the above:
<smalltalk program> ::= <program element>
<program element> ::= <program initializer definition>
<program initializer definition> ::= <initializer definition>
<initializer definition> ::= <statements>
<statements> ::= <expression>
<expression> ::= <basic expression>
<basic expression> ::= <primary> <messages>
<messages> ::= <keyword message>
<keyword message> ::= keyword <keyword argument>
<keyword argument> ::= <primary>
<primary> ::= identifier | <literal>
<literal> ::= <string literal>
<string literal> ::= quotedString
To get started, I’ll just concern myself with those parts from <basic expression> down.
Next, I’d like to look at what Haskell data structures I’d like to parse this into.
newtype Identifier = Identifier String
newtype Literal = StringLiteral String
data Primary = PrimaryLiteral Literal | PrimaryIdentifier Identifier
newtype Keyword = Keyword String
data Message = KeywordMessage Keyword Primary
data BasicExpression = BasicExpression Primary Message
I think this will work. So, the resulting data structure after parsing ‘Hello World’ should be:
BasicExpression
(PrimaryIdentifier (Identifier "Transcript"))
(KeywordMessage
(Keyword "show")
(PrimaryLiteral (StringLiteral "Hello World")))
Onto Parsec
Lets start at the bottom. First look at <string literal>. String literals in SmallTalk are surrounded by single quotes. The parsec function names are quite descriptive and allow me to pretty much translate the syntax descriptions.
stringLiteral = do
(char '\\'')
s <- manyTill anyChar (char '\\'')
return $ StringLiteral s
Tesing in ghci
*HST.Parse> parseTest stringLiteral "'HelloWorld'"
StringLiteral "Hello World"
Seems to work.
Keywords and Identifiers are similar.
keyword = do
k <- manyTill letter (char ':')
return $ Keyword k
identifier = do
i <- many1 letter
return $ Identifier i
Next we move on to some more interesting stuff. <primary> gets a little tricky, as it requires us to attempt parsing either a string literal or an identifer. As both these parsers return different types it’s not possible to combine them directly using the <|> operator. The below works for me, but I’m not sure this is the best way to approach it.
primary = do {
i <- identifier;
return $ PrimaryIdentifier i; }
<|> do {
l <- stringLiteral;
return $ PrimaryLiteral l; }
<keyword message> is more straight forward, but it’s here that we start having to be careful with whitespace. The code that works for me is:
keywordMessage = do
k <- keyword
spaces
p <- primary
return $ KeywordMessage k p
and finally, we get to <basic expression>. Again, this is straight forward, just as long as we are careful with whitespace.
basicExpression = do
p <- primary
spaces
m <- keywordMessage
return $ BasicExpression p m
Finally, testing with ghci
*HST.Parse> parseTest basicExpression "Transcript show: 'Hello World'"
BasicExpression
(PrimaryIdentifier (Identifier "Transcript"))
(KeywordMessage (Keyword "show") (PrimaryLiteral (StringLiteral "Hello World")))
Bingo.
Closing
The result is not perfect or robust, and probably not even efficient, but this will come later. The parsing code for ‘Hello World’ has come together quickly and allows for further development of the spike.
June 22, 2007 at 10:51 pm
[...] 22nd, 2007 To start gaining experience with Parsec and Haskell I completed the first working version of a parser for SmallTalk’s version of ‘Hello World’. To follow up I decided it [...]
June 23, 2007 at 3:22 pm
This is Smalltalk, not SmallTalk ! Have fun.
June 23, 2007 at 8:15 pm
Ah yes, so it is. Not sure where I got that capitalization from. I’ll make sure to get it right in the future
July 23, 2007 at 11:39 am
[...] on the Spike Solution for HST - a simple Smalltalk interpreter developed in Haskell. In my previous two posts, I developed a parser that was able to parse a Smalltalk version of ‘Hello [...]