11 January 2008

The purest is "the bestest" - final attempt to Brainfuck in Haskell


Ladies and gentelmen, time for the final attempt to writing a perfect implementation of Brainfuck interpreter in Haskel.

First of all I must make a statement: all previous attempts were lame, wrong and the result sucked big time. And I shouldn't have published them to save myself embarrassment.
The code was big, ugly and was imperative rather than functional. Luckily people suggested proper ways of doing things, and I must admit that I like the result. Thanks guys!

This post is again a valid literate Haskell code, so you can just paste it into .lhs file and compile it.
Let's do it.


> module Main where
>
> import qualified Control.Exception as CE
> import Data.Char
> import Data.Word
> import System.Environment
> import System.IO
> import Text.ParserCombinators.Parsec


This version of interpreter makes use of "precompiled" and optimized Brainfuck code. Source file gets compiled to following set of operations:


> data Op a =
> Add a -- add a to current memory cell (a can be negative)
> | Move a -- move pointer a positions to the right (to the left if a negative)
> | Input -- input one character and store it in memory
> | Output -- output current memory cell to stdout
> | Loop [Op a] -- loop with body consisting of given operations list
> | End -- guard used to detect end of program or loop body
> deriving (Show, Eq)


"End" is used as guard and is being placed at the end of each loop body and of the whole program. It is perfectly possible to get rid of this guard, but I somehow found it more pleasant to use than handling empty lists in zipper.

Source code parsing is done using simple grammar written with Parsec.

The BNF form of grammar looks something like this:

program ::= instruction*
instruction ::= loop | simple
loop ::= '[' instruction* ']'
simple ::= '+' | '-' | '<' | '>' | '.' | ','

which can be written in Parsec:


> program :: Parser [Op Int]
> program = do { ins <- instruction -- instruction
> ; do { rest <- program -- followed by more instructions
> ; return (ins:rest)
> }
> <|> return [ins] -- or not - just single
> }
> <|> return [End] -- empty program / end of list
>
> instruction :: Parser (Op Int)
> instruction = simple <|> loop
>
> loop :: Parser (Op Int)
> loop = do { char '['
> ; body <- program
> ; char ']'
> ; return (Loop body)
> }
>
> simple :: Parser (Op Int)
> simple = do { ins <- oneOf "+-<>.,"
> ; let op = case ins of
> '+' -> Add 1
> '-' -> Add (-1)
> '<' -> Move (-1)
> '>' -> Move 1
> '.' -> Output
> ',' -> Input
> ; return op
> }



To make things as pure and as functional as possible both program and memory handling will be implemented using zipper. Really cool technique!
Note: it could also be implemented in the way suggested by lasts here, i.e. without "focus" field. I don't know it that would be faster or not, but I like the way with explicitly declaring focus field.


> data ListZipper a = ListZipper {
> left :: ![a], -- elements left from focus
> focus :: ! a , -- current element
> right :: ![a] -- elements right from focus
> } deriving Show
>
> move :: Int -> ListZipper a -> ListZipper a
> move (-1) (ListZipper (x:xs) y zz) = ListZipper xs x (y:zz)
> move 1 (ListZipper xx y (z:zs)) = ListZipper (y:xx) z zs
> move 0 lz = lz
> -- note: move for abs(n)>1 is done in such way, because it is faster than
> -- interpretation of a series of '>'/'<' opcodes and than trying to implement
> -- such operation using take/drop/splitAt etc.
> move n lz = if n > 0 then move (n-1) (move 1 lz)
> else move (n+1) (move (-1) lz)
>
> mkZipper :: [a] -> ListZipper a
> mkZipper x = ListZipper [] (head x) (tail x)
>
> getValue :: ListZipper a -> a
> getValue (ListZipper _ y _) = y
>
> setValue :: ListZipper a -> a -> ListZipper a
> setValue (ListZipper xx _ yy) v = ListZipper xx v yy



Time for main interpreter.

"runSequence" executes program inside ListZipper until it reaches "End" marker. Executing program transforms memory state (ListZipper Word16) into a new state with possible side-effects in I/O.


> runSequence :: ListZipper (Op Int) -> ListZipper Word16 -> IO (ListZipper Word16)
> runSequence program memory =
> if (focus program) == End
> then return memory
> else do mem' <- step (focus program) memory
> runSequence (move 1 program) mem'
>
> step :: Op Int -> ListZipper Word16 -> IO (ListZipper Word16)
> step op mem =
> case op of
> Move n -> return $ move n mem
> Add n -> return $ setValue mem ((getValue mem) + (fromIntegral n))
> Loop p -> doLoop (mkZipper p) mem
> Input -> do c <- CE.try getChar
> case c of
> Left err -> return $ setValue mem 0
> Right x -> return $ setValue mem (fromIntegral $ fromEnum x)
> Output -> do hPutChar stdout (chr $ fromEnum $ getValue mem)
> hFlush stdout
> return mem


I have enclosed loop body execution into a separate function to make it look more clear (for me at least). Algorithm is simple - if on entry to the loop current memory cell has non-zero value, the loop body is executed as if it was a separate program, and the resulting memory state is passed again to doLoop to check if another iteration should be performed.
If value of current cell is zero program is skipped and function simply returns the same memory state that it got as parameter.


> doLoop :: ListZipper (Op Int) -> ListZipper Word16 -> IO (ListZipper Word16)
> doLoop block memory = if (getValue memory) == 0 then return memory
> else do mem' <- runSequence block memory
> doLoop block mem'


Optimization step - precompiled program is transformed according to following rules:

  • sequence of Add operations is replaced with single Add with argument being sum of arguments of input, for example [Add 1, Add1] will be replaced with [Add 2], [Add 1, Add 1, Add (-1)] with [Add 1] etc

  • sequence of Move operations is replaced with single Move in similar fashion - [Move (-1), Move (-1)] is replaced with [Move (-2)] etc




> optimize :: [Op Int] -> [Op Int]
> optimize [] = []
> optimize (x:[]) = [x]
> optimize ((Add x):(Add y):xs) = optimize ((Add (x+y)):xs)
> optimize ((Move x):(Move y):xs) = optimize ((Move (x+y)):xs)
> optimize ((Loop p):xs) = (Loop (optimize p)):(optimize xs)
> optimize (x:xs) = x:(optimize xs)


Program "normalization", i.e. removal of all characters other than valid Brainfuck instructions.


> normalize :: String -> String
> normalize program = filter (\c -> elem c "+-<>[].,") program


And finally main function.


> main :: IO ()
> main = do
> args <- getArgs
> if length args < 1
> then fail "Please provide name of the program to run"
> else do
> prog <- readFile (head args)
> case (parse program "" (normalize prog)) of
> Left err -> do putStr "Parse error at "
> print err
> Right res -> do runSequence (mkZipper (optimize res)) (mkZipper (take 30000 $ repeat 0))
> return ()



Result:

time (echo 30 | ./Main ../tests/prime.bf)
Primes up to: 2 3 5 7 11 13 17 19 23 29

real 0m0.267s
user 0m0.239s
sys 0m0.012s


Yep, that's right! Cleaner, better code being ca 30% faster than semi-imperative code using unboxed types!
Note, that it can be sped up a little bit more - marking ListZipper fields as strict and passing -funbox-strict-fields to ghc drops time to 0.2s.

EDIT: original post missed one quite important line - optimize ((Loop p):xs) ... . Without it optimizations are not propagated inside loops, hence in any non-trivial program they have basically no effect.

This is corrected now.

6 comments:

Unknown said...

Some future things to improve:

Your "program" parser can use Parsec's "many".
program = do instrs <- many instruction
return (instrs ++ [End])

Your "loop" parser can use Parsec's "between":
loop = do body <- between (char '[') (char ']) program
return (Loop body)

Your "simple" parser might be more clear as something like
simple = char '+' >> return (Add 1)
<|> char '-' >> return (Add -1)
<|> etc, etc.
(You may need parens or a "do ... ; ..." in there instead; I always forget the precedence.)

runSequence could use a pattern guard to get rid of the "if".

Didn't know about "CE.try", thanks for the post!

brzozan said...

Thanks! This really does look better. I will give it a try while playing with Parsec in the next few days.

Nimish said...

when you do a filter of non BF chars, instead of making a lambda, I believe you could just pass in a partial application of the elem function, with just the string passed. I'm not sure of the syntax, though.

Justin George said...

when you do a filter of non BF chars, instead of making a lambda, I believe you could just pass in a partial application of the elem function, with just the string passed. I'm not sure of the syntax, though.

It'd be ((flip elem) "+-<>[].,")

No lambda required.

Neil Mitchell said...

> optimize ((Add x):(Add y):xs) = ...

No need for the brackets round Add, or Move on the following line.


> if length args < 1
if null args

More efficient and more compact.

brzozan said...

@Nimish and Justin:

I actually prefer lambda over partial application. I find it clearer showing what this particular piece of code is doing.