One of the biggest problems I have learning Haskell is working with state and with I/O. Playing with simple tasks that involve simple transformations of one value to another or generating some sets of values (factorial, sorting, fibonacci, simple grep) is fun, but in reality almost every program needs some internal state to be stored and modified as the program goes.
To exercise state handling I came up with an idea - why not implement an interpreter of some simple language, like Brainfuck? It has only a few instructions, state is simple to describe, and it's fun.
This post (and a few following) have a cleaned-up and commented version of my experiments.
The post is actually a valid literate Haskell source, so you can just save it as Main.lhs and either run using runghc or compile with ghc, and then run it with "runghc Main.lhs program.bf" or "Main program.bf".
For a collection of Brainfuck programs to execute I recommend this place: http://esoteric.sange.fi/brainfuck/bf-source/prog/
Let's start with module declaration and a few imports:
Current state of the Brainfuck interpreter consists of four things:
Such state can be wrapped in Haskell structure:
I am using Word16 as memory cell since I want to run pi16.bf as a benchmark, and it requires 16-bit words.
Initialization of BFState structure for given program and given size of memory (typically 30000):
Brainfuck program ends when program counter (pc) reaches end of program.
Time to write the function that will interpret single program instruction and execute it.
All instructions transform current state into a new state with some of its data updated, for example pc moved to the next instruction, updated memory state etc. Normally type of such function could be BFState -> BFState (simple transformation), but unfortunately some instructions (namely . and ,) have a side-effect - they perform I/O operations. Because of that our function needs to have type BFState -> IO BFState, meaning that it transforms BFState into a new BFState, but with possible side-effects.
First four instructions are trivial.
'[' checks if current memory cell is 0, and if it is program jumps to the instruction after matching ']'. If cell is non-zero program simply enters loop body.
']' simply jumps to its corresponding '['.
'.' outputs current memory cell as character to standard output. Memory cell type (Word16) has to be converted to chr - it is done using chr (fromEnum x).
',' reads one character from standard input and puts it to current memory cell. If EOF is encountered 0 is written.
Normally getChar would throw exception on EOF, but System.IO provides a nice function called try, that translates result of getChar to "Right char" on success or "Left exception" if exception occurs.
All characters that are not Brainfuck instructions should just be ignored.
code above makes use of a few utility functions - nextPC, getMem and setMem.
nextPC - advance PC to the next instruction
getMem - return value from memory cell pointed to by pos
setMem - transform memory so that memory cell pointed to by pos gets a new value.
Lists are immutable, so a copy of memory is generated.
'[' and ']' implementation requires searching for matching parenthesis to provide
loop behaviour. Functions findMatchingForward and findMatchingBackwards do that.
Both functions take program and starting position in program (index of character that opens the loop) as arguments and return position in program of the corresponding loop end.
Interpreter loop is very simple - if we reached end of the program just return. If not, execute one step with current interpreter state, which gets us a new updated state st', and call run again with st'.
Finally time for the main function - check for program arguments, if there are any treat first of them as name of the file with program to read. Then create initial BFState and pass it to run.
That's it. Full Brainfuck interpreter in pure Haskell.
The bad news is that it's extremely slow. Even compiled with -O2 it is a few thousand times slower than interpreter written in C. Can something be done about it? It sure can, but I'll do that in the next post.
To exercise state handling I came up with an idea - why not implement an interpreter of some simple language, like Brainfuck? It has only a few instructions, state is simple to describe, and it's fun.
This post (and a few following) have a cleaned-up and commented version of my experiments.
The post is actually a valid literate Haskell source, so you can just save it as Main.lhs and either run using runghc or compile with ghc, and then run it with "runghc Main.lhs program.bf" or "Main program.bf".
For a collection of Brainfuck programs to execute I recommend this place: http://esoteric.sange.fi/brainfuck/bf-source/prog/
Let's start with module declaration and a few imports:
> module Main where
>
> import Control.Exception
> import Data.Word
> import Data.Char
> import System.Environment
> import System.IO
Current state of the Brainfuck interpreter consists of four things:
- Brainfuck program being executed
- current program counter position
- contents of the data memory (usually 30000 8-bit wide cells, but some programs require 16 bits to run)
- current memory pointer position
Such state can be wrapped in Haskell structure:
> data BFState = BFState {
> program :: String, -- program being interpreted
> memory :: [Word16], -- memory
> pc :: Int, -- current program counter
> pos :: Int -- current pointer position
> }
I am using Word16 as memory cell since I want to run pi16.bf as a benchmark, and it requires 16-bit words.
Initialization of BFState structure for given program and given size of memory (typically 30000):
> initState :: String -> Int -> BFState
> initState program memSize = BFState program (take memSize $ repeat 0) 0 0
Brainfuck program ends when program counter (pc) reaches end of program.
> isEnd :: BFState -> Bool
> isEnd st = (pc st) >= length (program st)
Time to write the function that will interpret single program instruction and execute it.
All instructions transform current state into a new state with some of its data updated, for example pc moved to the next instruction, updated memory state etc. Normally type of such function could be BFState -> BFState (simple transformation), but unfortunately some instructions (namely . and ,) have a side-effect - they perform I/O operations. Because of that our function needs to have type BFState -> IO BFState, meaning that it transforms BFState into a new BFState, but with possible side-effects.
> step :: BFState -> IO BFState
> step st =
> case (program st !! pc st) of
First four instructions are trivial.
> '+' -> return st { memory = setMem st (getMem st + 1), pc = nextPC st }
> '-' -> return st { memory = setMem st (getMem st - 1), pc = nextPC st }
> '<' -> return st { pos = (pos st) - 1, pc = nextPC st }
> '>' -> return st { pos = (pos st) + 1, pc = nextPC st }
'[' checks if current memory cell is 0, and if it is program jumps to the instruction after matching ']'. If cell is non-zero program simply enters loop body.
> '[' -> return st { pc = pc' } where
> pc' = if getMem st == 0 then findMatchingForward (program st) (pc st) + 1
> else nextPC st
']' simply jumps to its corresponding '['.
> ']' -> return st { pc = pc' } where
> pc' = findMatchingBackwards (program st) (pc st)
'.' outputs current memory cell as character to standard output. Memory cell type (Word16) has to be converted to chr - it is done using chr (fromEnum x).
> '.' -> do hPutChar stdout (chr (fromEnum $ getMem st))
> hFlush stdout
> return st { pc = nextPC st}
',' reads one character from standard input and puts it to current memory cell. If EOF is encountered 0 is written.
Normally getChar would throw exception on EOF, but System.IO provides a nice function called try, that translates result of getChar to "Right char" on success or "Left exception" if exception occurs.
> ',' -> do c <- try getChar
> let val = case c of
> Right a -> fromIntegral (fromEnum a)
> Left e -> 0
> in return st { memory = setMem st val, pc = nextPC st }
All characters that are not Brainfuck instructions should just be ignored.
> otherwise -> return $ st { pc = nextPC st }
code above makes use of a few utility functions - nextPC, getMem and setMem.
nextPC - advance PC to the next instruction
> nextPC :: BFState -> Int
> nextPC st = (pc st) + 1
getMem - return value from memory cell pointed to by pos
> getMem :: BFState -> Word16
> getMem st = (memory st) !! (pos st)
setMem - transform memory so that memory cell pointed to by pos gets a new value.
Lists are immutable, so a copy of memory is generated.
> setMem :: BFState -> Word16 -> [Word16]
> setMem st value = take (pos st) (memory st) ++ [value] ++ drop (pos st + 1) (memory st)
'[' and ']' implementation requires searching for matching parenthesis to provide
loop behaviour. Functions findMatchingForward and findMatchingBackwards do that.
Both functions take program and starting position in program (index of character that opens the loop) as arguments and return position in program of the corresponding loop end.
> findMatchingForward :: String -> Int -> Int
> findMatchingForward program pos = findMatchingForward_ program pos 0 where
> findMatchingForward_ program pos level =
> case program !! pos of
> '[' -> findMatchingForward_ program (pos+1) (level+1)
> ']' -> if level == 1 then pos
> else findMatchingForward_ program (pos+1) (level-1)
> otherwise -> findMatchingForward_ program (pos+1) level
>
> findMatchingBackwards :: String -> Int -> Int
> findMatchingBackwards program pos = findMatchingBackwards_ program pos 0 where
> findMatchingBackwards_ program pos level =
> case program !! pos of
> ']' -> findMatchingBackwards_ program (pos-1) (level+1)
> '[' -> if level == 1 then pos
> else findMatchingBackwards_ program (pos-1) (level-1)
> otherwise -> findMatchingBackwards_ program (pos-1) level
Interpreter loop is very simple - if we reached end of the program just return. If not, execute one step with current interpreter state, which gets us a new updated state st', and call run again with st'.
> run st = if isEnd st then return ()
> else do st' <- step st
> run st'
Finally time for the main function - check for program arguments, if there are any treat first of them as name of the file with program to read. Then create initial BFState and pass it to run.
> main :: IO ()
> main = do
> args <- getArgs
> if length args == 0 then fail "Please provide name of the program to execute"
> else do program <- readFile (head args)
> run (initState program 30000)
That's it. Full Brainfuck interpreter in pure Haskell.
The bad news is that it's extremely slow. Even compiled with -O2 it is a few thousand times slower than interpreter written in C. Can something be done about it? It sure can, but I'll do that in the next post.
4 comments:
First of all, I apologize for my english, but that's an interesting topic :)
I think you should use a zipper in order to handle the memory - and parse the code before the execution ^_^
I can't find a good link for zipper list, but the basic idea looks like :
> data Zipper a = Zipper [a] [a]
> get (Zipper _ (y:_)) = y
> update (Zipper xs (_:ys)) newY = Zipper xs (newY:ys)
> incrPointer (Zipper xs (y:ys)) = Zipper (y:xs) ys
> decrPointer (Zipper (x:xs) ys) = Zipper xs (x:ys)
> listToZipper li = Zipper [] li
> zipperToList (Zipper xs ys) = (reverse xs) ++ ys
The zipper offer you a O(1) access to the cell (while (!!) is O(N)).
Parsing the code before you a more handy way to handle the loops.
> type BFCode = [BFWord]
> data BFWord = IncrValue
> | DecrValue
> | IncrPointer
> | DecrPointer
> | Print
> | Read
> | Loop BFCode
As such, most of the remaining functions are just one-one with the zipper's functions ^_^
I wonder how this can influence on your benchmarks - because you're actually solving the performance problem with other tools... so please continue your experiments :)
It looks like someone has already told you about the zipper on reddit a way better than me :] .
I was a bit curious about how to handle the side-effect of brainfuck properly. So here's my version :
http://hpaste.org/4874
$ ghc brainfuck.hs -o bf
$ time (echo 10 | ./bf PRIME.BF)
Primes up to: 2 3 5 7
(; echo 10 | ./bf PRIME.BF; ) 0,02s user 0,00s system 84% cpu 0,019 total
$ time (echo 20 | ./bf PRIME.BF)
Primes up to: 2 3 5 7 11 13 17 19
(; echo 20 | ./bf PRIME.BF; ) 0,13s user 0,00s system 95% cpu 0,134 total
$ time (echo 30 | ./bf PRIME.BF)
Primes up to: 2 3 5 7 11 13 17 19 23 29
(; echo 30 | ./bf PRIME.BF; ) 0,45s user 0,02s system 99% cpu 0,472 total
I really like that kind of articles, thanks for sharing your experiments because that's always a source of inspiration :)
Hi lasts!
Thanks for comments. Your english is not bad at all, especially that my mother's tongue is polish :)
I did some experiments (they yet have to be posted on this blog) with zipper, but it turned out to be slower than using IOUArray, probably mostly due to IOUArray items being unboxed.
It is still a great technique and definitely worth remembering.
As for preparsing, I made some esperiment (also not part of the blog yet) with preparsing and compacting code (for example "+++++" is translated to Add 5, "<<<" translated to "Move -3" etc) and using pattern matching. To my disappointment it turned out to be twice slower than current solution (with ByteString for program and parsing char-by-char).
Sometimes Haskell gives some unintuitive results...
Amiable brief and this mail helped me alot in my college assignement. Thank you on your information.
Post a Comment