my sabbatical year diary: Brainfuck in Haskell

In the previous post we wrote an extremely slow, but working, Brainfuck interpreter in Haskell. Now it's time to make it a little bit faster. (Side note: this post will not be a valid literate Haskell program. It describes changes to the origital + command line operations.)

GHC gives us a nice profiler that makes it easy to spot most of the performance issues, and that's what we will use.
For benchmarking I'll use prime.bf finding prime numbers up to, let's say, 10.


> ghc --make -prof -auto-all -O2 Main.lhs 
[1 of 1] Compiling Main             ( Main.lhs, Main.o )
Linking Main ...
> time (echo 10 | ./Main +RTS -p -RTS prime.bf)
Primes up to: 2 3 5 7 

real    0m14.244s
user    0m14.010s
sys     0m0.092s
>

Profiling results are in Main.prof. Here's a snippet that shows our hot spots:


COST CENTRE                    MODULE               %time %alloc

findMatchingBackwards          Main                  32.3   13.0
isEnd                          Main                  30.9    0.0
step                           Main                  21.3   33.8
findMatchingForward            Main                  14.9    5.6
setMem                         Main                   0.4   19.5
run                            Main                   0.1   19.7
nextPC                         Main                   0.0    8.0

It seems that the easiest thing to get rid of is isEnd function - each time it is called it calculates length of the program (string => list of char => length's complexity is O(n)). Program does not change over time, so we can simply cache program length in BFState and compare pc against this cached value.
Changes are needed in definition of BFState:



> data BFState = BFState {
>   program     :: String,      -- program being interpreted
>   memory      :: [Word16],    -- memory
>   pc          :: Int,         -- current program counter
>   pos         :: Int,         -- current pointer position
>
>   prog_len    :: Int          -- cached program length
> }

in initState function:



> initState :: String -> Int -> BFState
> initState program memSize = BFState program (take memSize $ repeat 0) 0 0 (length program)

and in isEnd itself:



> isEnd :: BFState -> Bool
> isEnd st  = (pc st) >= (prog_len st)

Result:


> ghc --make -prof -auto-all -O2 Main.lhs 
[1 of 1] Compiling Main             ( Main.lhs, Main.o )
Linking Main ...
> time (echo 10 | ./Main +RTS -p -RTS prime.bf)
Primes up to: 2 3 5 7 

real    0m9.387s
user    0m9.234s
sys     0m0.058s
>

So it worked as expected - ca 30% performance gain. Current performance bottlenecks:


COST CENTRE                    MODULE               %time %alloc

findMatchingBackwards          Main                  47.5   11.4
step                           Main                  26.8   34.3
findMatchingForward            Main                  24.4    4.9
setMem                         Main                   0.9   17.0
run                            Main                   0.4   25.1
nextPC                         Main                   0.0    7.0

70% of the time is spent looking for matching '[' and ']' in loops.
Situation here is similar to isEnd - program does not change over time, so it should not be a problem to cache somehow findMatching results.

The simplest way is to create a table of the same length as program, where on a position corresponding to '[' we will have index of matching ']', and on position corresponding to ']' we will have index of matching '[', for example table for program "++[>+<-]" will look like [0, 0, 7, 0, 0, 0, 0, 2] (values for characters other than '[' and ']' do not matter, so I put 0 there).

Function generating such table could look like this:



> loopEnds :: String -> Int -> [Int]
> loopEnds program pos =
>   if (pos >= length program) then []
>   else end:(loopEnds program (pos+1)) where
>       end = case (program !! pos) of
>               '[' -> findMatchingForward program pos
>               ']' -> findMatchingBackwards program pos
>               otherwise -> 0

Of course we need also changes in BFState:



> data BFState = BFState {
>   program     :: String,      -- program being interpreted
>   memory      :: [Word16],    -- memory
>   pc          :: Int,         -- current program counter
>   pos         :: Int,         -- current pointer position
>
>   prog_len    :: Int,         -- cached program length
>   loop_ends   :: [Int]        -- cached loop ends
> }

initState:



> initState :: String -> Int -> BFState
> initState program memSize = BFState program (take memSize $ repeat 0) 0 0 (length program) (loopEnds program 0)

and in implementation of '[' and ']' in step:



>       '[' -> return st { pc = pc' } where
>               pc' = if getMem st == 0 then ((loop_ends st) !! (pc st)) + 1
>                                       else nextPC st



>       ']' -> return st { pc = pc' } where
>               pc' = (loop_ends st) !! (pc st)

Results:


> ghc --make -prof -auto-all -O2 Main.lhs 
[1 of 1] Compiling Main             ( Main.lhs, Main.o )
Linking Main ...
> time (echo 10 | ./Main +RTS -p -RTS prime.bf)
Primes up to: 2 3 5 7 

real    0m4.090s
user    0m4.003s
sys     0m0.032s
>

With current profile:


COST CENTRE                    MODULE               %time %alloc

step                           Main                  85.6   42.1
findMatchingForward            Main                   5.0    0.5
findMatchingBackwards          Main                   5.0    0.5
loopEnds                       Main                   3.0    0.6
run                            Main                   1.5   30.2
setMem                         Main                   0.0   18.3
nextPC                         Main                   0.0    7.4

Not bad. 14s down to 4s. Still way slower than C version, but getting better.

In the next part - some more optimizations, this time mostly on data types used.

my sabbatical year diary

02 January 2008

Brainfuck in Haskell - time for some optimization

No comments:

Blog Archive

Labels