06 January 2008

Brainfuck in Haskell - optimizations part 2


In previous post we went down with execution time of prime.bf benchmark from 14s down to 4s. The profile result was:

COST CENTRE MODULE %time %alloc

step Main 85.6 42.1
findMatchingForward Main 5.0 0.5
findMatchingBackwards Main 5.0 0.5
loopEnds Main 3.0 0.6
run Main 1.5 30.2
setMem Main 0.0 18.3
nextPC Main 0.0 7.4


It does not show us directly what operation consumes most CPU ("step" is quite complex function), but we can do some guessing. First of all program is operating a lot on strings, mostly accessing random element of the string (!! operator). Strings in Haskell are a list of char, which makes this operation O(n).
Haskell provides a nice data type that could be used as replacement - ByteString.

First we need to add an import (it has to be qualified since there are name conflicts with Prelude):


> import qualified Data.ByteString as BS


Now we can replace all occurrences of String (since its used only for program) with BS.ByteString, all "length program" with "BS.length program" and "readFile" with "BS.readFile" (which reads file contents as ByteString). The only thing left is !! operator, which is not defined for ByteString.
Function for accessing random elements is called "index", but unfortunately it returns Word8 instead of Char, so we would have to change character constants in code with its numerical equivalents. To prevent this we'll write our own operator that returns ByteString element as Char:


> (!!!) :: BS.ByteString -> Int -> Char
> (!!!) bs ind = chr $ fromEnum $ BS.index bs ind
>
> infix 9 !!!


The last line makes !!! an infix operator with the same priority as string's !!.

Now we can safely replace !! with !!! in all places where we are selecting program character ("program !! pos" with "program !!! pos", "program st !! pc st" with "program st !!! pc st") and do some benchmarking.

> ghc --make -prof -auto-all -O2 Main.lhs
[1 of 1] Compiling Main ( Main.lhs, Main.o )
Linking Main ...
> time (echo 10 | ./Main +RTS -p -RTS prime.bf)
Primes up to: 2 3 5 7

real 0m0.694s
user 0m0.517s
sys 0m0.020s
>

Not bad - almost 6 times faster.

There is a similar situation with loop_ends and with memory - we often access random elements of a list. Plus list elements are boxed in Haskell objects, which generates additional cost when operating on them.
loop_ends is the easiest to improve, since the array is immutable, so we can replace it with Haskell's type for immutable arrays of unboxed objects (UArray), which additionally has O(1) access time for its elements.

Necessary changes:


> import Data.Array.Unboxed

> data BFState = BFState {
> program :: BS.ByteString, -- program being interpreted
> memory :: [Word16], -- memory
> pc :: Int, -- current program counter
> pos :: Int, -- current pointer position
>
> prog_len :: Int, -- cached program length
> loop_ends :: UArray Int Int -- cached loop ends
> }


> initState :: BS.ByteString -> Int -> BFState
> initState program memSize = BFState program (take memSize $ repeat 0) 0 0 (BS.length program)
> (listArray (0, (BS.length program - 1)) $ loopEnds program 0)


Plus change of all "(loop_ends st) !! (pc st)" with "(loop_ends st) ! (pc st)", since UArray uses ! to access elements.

As simple as that.

> ghc --make -prof -auto-all -O2 Main.lhs
[1 of 1] Compiling Main ( Main.lhs, Main.o )
Linking Main ...
> time (echo 10 | ./Main +RTS -p -RTS prime.bf)
Primes up to: 2 3 5 7

real 0m0.137s
user 0m0.101s
sys 0m0.012s
>

Wow! Another 5 times faster! Profile:

COST CENTRE MODULE %time %alloc

step Main 40.0 41.1
setMem Main 40.0 15.3
getMem Main 20.0 0.0
run Main 0.0 32.4
nextPC Main 0.0 6.2
!!! Main 0.0 3.3

As you can see getMem and setMem operations are now taking ca 60% of program run time, and we know exactly why - accessing random elements of a list.
Unfortunately memory is mutable, so we cannot replace it as easily as we did with loop_ends. Haskell does have a proper type, namely mutable arrays of unboxed elements (IOUArray) and that's what we will use next time.

No comments: