HST: Handling more message syntax

August 17, 2007

Unfortunately I’ve been busy since deciding to place HST on Google Code, so have not had time for major enhancements. Those ‘enhancements’ I have attempted, proceeded in a rather haphazard fashion and lack direction. This post will help refocus my attention on smaller goals and more disciplined testing.

After adding testing to the basic parser, it is now time to expand the parser’s functionality, adding the ability to handle more features of Smalltalk. I have chosen to add support for more complex message sends. I wish to support multiple keyword/primary pairs so that expressions such as prim a: 'arg1' b: 'arg2' are able to be parsed.

In adding this functionality I take small steps and ensure that testing is kept up to date with the work I carry out. Although at the end of this development the execution engine can not compile, I believe the parser is futher forward than with my previous, rather random, additions.

Starting Point

Currently the parser is set up to handle just a single keyword message. The parser is defined as:

keywordMessage = do
  k <- keyword
  p <- primary
  return $ KeywordMessage k p

I’m going to adapt this to handle an arbitrary number of keywords with arguments. But, before I can begin this I need to look at the data structure it returns.

data Message = KeywordMessage Keyword Primary

This is no longer approrpiate, as there may be many keywords, each with a Primary construct. The easist way to meet this requirement is to use a list of Keywords and Primarys:

data Message = KeywordMessage [Keyword] [Primary]

To aid in making this change, I going to set up a QuickCheck property that I can call to test the keywordMessage construct.

checkMessageParser kmsg =
  case (parse keywordMessage "" (render $ message kmsg)) of
    Left _ -> False
    Right m -> kmsg == m
  where types = kmsg :: Message
runMessageChecks = quickCheck checkMessageParser

To run this test the pretty printer needs to be modified so it can handle the new data type. Previously its implementation was trival, just specifying that the keyword was beside the primary. Now that KeywordMessage involves lists the implementation is not as simple.

In writing a method like this I like to think in terms of data types. Effectively, what I’m after is a function of type [Keyword]->[Primary]->Doc. Using zip I can achieve [Keyword]->[Primary]->[(Keyword,Primary)]. Next I can use map to apply a method similar to the current, trivial, implementation to all elements of the list. This gives me [(Keyword,Primary)]->[Doc]. Finally, by folding (using foldr) the function <+> across this list I can finish with [Doc]->Doc. Being able to reason like this is one of my favourite features of Haskell.

In code this becomes

message (KeywordMessage k p) = foldr (<+>) empty
  (map (\\(k,p) -> keyword k <+> primary p) (zip k p))

It’s also important to modify the instance of Arbitrary Message so that only valid messages will be generated for QuickCheck tests. It is important that the length of the two lists in the KeywordMessage constructor are both the same length and have at least one element. If left to the default list instance of Arbitrary those two properties are not guarenteed.

instance Arbitrary Message where
    arbitrary = sized (\\n -> liftM2
        (replicateM (n+1) arbitrary)
        (replicateM (n+1) arbitrary))

Now we have enough to run QuickCheck on the current message parser. Not suprisingly, it fails when given a message with more than one keyword/primary pair.

> runMessageChecks
Falsifiable, after 0 tests:
KeywordMessage [Keyword "Ao",Keyword "Gy"]
  [PrimaryLiteral (StringLiteral "z"),PrimaryIdentifier (Identifier "lT")]

Passing the Tests

The first modification I’m going to make to the code is to extract the single keyword/primary pair parser. It will return a tuple of type (Keyword,Primary). It’s also important to add an extra call to spaces, to deal with trailing space after the call to primary.

keywordPrimary = do
    k <- keyword
    p <- primary
    return (k,p)

Applying the parser combinator many1 will ensure that at least one keyword/primary pair is present, and will return a list of tuples. That list of tuples can be transformed to a tuple of two lists using unzip. So, [(Keyword,Primary)] -> ([Keyword],[Primary]). Finally this tuple is passed into uncurry, which will pass both elements of the tuple as arguments to the KeywordMessage constructor. The return calls are important to ensure the results remain in the parser monad after calculating the result using pure functions.

keywordMessage = many1 keywordPrimary >>=
  return . unzip >>= return . uncurry KeywordMessage

Running through QuickCheck

> runMessageChecks
OK, passed 100 tests
> runChecks
OK, passed 100 tests.


This posts hasn’t demonstrated anything ground breaking. However, I have taken a few things out of the development work listed above.

The benefits of a good, static type system are showing. Functions such as keywordMessage can take a bit of work to compile correctly, but once finished they generally work first time. Types are also useful when thinking about an algorithm for a function as described in the pretty printing for messages, however this approach is not limited to statically typed languages.

I have also learned that adding small amounts of functionality with good testing is good for my confidence. The simple change added here is much better code than I initially developed and I am much more confident that it works.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: