my sabbatical year diary: December 2007

One of the biggest problems I have learning Haskell is working with state and with I/O. Playing with simple tasks that involve simple transformations of one value to another or generating some sets of values (factorial, sorting, fibonacci, simple grep) is fun, but in reality almost every program needs some internal state to be stored and modified as the program goes.
To exercise state handling I came up with an idea - why not implement an interpreter of some simple language, like Brainfuck? It has only a few instructions, state is simple to describe, and it's fun.

This post (and a few following) have a cleaned-up and commented version of my experiments.

The post is actually a valid literate Haskell source, so you can just save it as Main.lhs and either run using runghc or compile with ghc, and then run it with "runghc Main.lhs program.bf" or "Main program.bf".

For a collection of Brainfuck programs to execute I recommend this place: http://esoteric.sange.fi/brainfuck/bf-source/prog/

Let's start with module declaration and a few imports:



> module Main where
>
> import Control.Exception
> import Data.Word
> import Data.Char
> import System.Environment
> import System.IO

Current state of the Brainfuck interpreter consists of four things:

Brainfuck program being executed

current program counter position

contents of the data memory (usually 30000 8-bit wide cells, but some programs require 16 bits to run)

current memory pointer position

Such state can be wrapped in Haskell structure:



> data BFState = BFState {
>   program     :: String,      -- program being interpreted
>   memory      :: [Word16],    -- memory
>   pc          :: Int,         -- current program counter
>   pos         :: Int          -- current pointer position
> }

I am using Word16 as memory cell since I want to run pi16.bf as a benchmark, and it requires 16-bit words.

Initialization of BFState structure for given program and given size of memory (typically 30000):



> initState :: String -> Int -> BFState
> initState program memSize = BFState program (take memSize $ repeat 0) 0 0

Brainfuck program ends when program counter (pc) reaches end of program.



> isEnd :: BFState -> Bool
> isEnd st  = (pc st) >= length (program st)

Time to write the function that will interpret single program instruction and execute it.
All instructions transform current state into a new state with some of its data updated, for example pc moved to the next instruction, updated memory state etc. Normally type of such function could be BFState -> BFState (simple transformation), but unfortunately some instructions (namely . and ,) have a side-effect - they perform I/O operations. Because of that our function needs to have type BFState -> IO BFState, meaning that it transforms BFState into a new BFState, but with possible side-effects.



> step :: BFState -> IO BFState
> step st   =
>   case (program st !! pc st) of

First four instructions are trivial.



>       '+' -> return st { memory = setMem st (getMem st + 1), pc = nextPC st }
>       '-' -> return st { memory = setMem st (getMem st - 1), pc = nextPC st }
>       '<' -> return st { pos = (pos st) - 1, pc = nextPC st }
>       '>' -> return st { pos = (pos st) + 1, pc = nextPC st }

'[' checks if current memory cell is 0, and if it is program jumps to the instruction after matching ']'. If cell is non-zero program simply enters loop body.



>       '[' -> return st { pc = pc' } where
>               pc' = if getMem st == 0 then findMatchingForward (program st) (pc st) + 1
>                                       else nextPC st

']' simply jumps to its corresponding '['.



>       ']' -> return st { pc = pc' } where
>               pc' = findMatchingBackwards (program st) (pc st)

'.' outputs current memory cell as character to standard output. Memory cell type (Word16) has to be converted to chr - it is done using chr (fromEnum x).



>       '.' -> do hPutChar stdout (chr (fromEnum $ getMem st))
>                 hFlush stdout
>                 return st { pc = nextPC st}

',' reads one character from standard input and puts it to current memory cell. If EOF is encountered 0 is written.
Normally getChar would throw exception on EOF, but System.IO provides a nice function called try, that translates result of getChar to "Right char" on success or "Left exception" if exception occurs.



>       ',' -> do c <- try getChar
>                 let val   = case c of
>                               Right a -> fromIntegral (fromEnum a)
>                               Left e  -> 0
>                   in return st { memory = setMem st val, pc = nextPC st }

All characters that are not Brainfuck instructions should just be ignored.



>       otherwise  -> return $ st { pc = nextPC st }

code above makes use of a few utility functions - nextPC, getMem and setMem.

nextPC - advance PC to the next instruction



> nextPC :: BFState -> Int
> nextPC st = (pc st) + 1

getMem - return value from memory cell pointed to by pos



> getMem :: BFState -> Word16
> getMem st = (memory st) !! (pos st)

setMem - transform memory so that memory cell pointed to by pos gets a new value.
Lists are immutable, so a copy of memory is generated.



> setMem :: BFState -> Word16 -> [Word16]
> setMem st value = take (pos st) (memory st) ++ [value] ++ drop (pos st + 1) (memory st)

'[' and ']' implementation requires searching for matching parenthesis to provide
loop behaviour. Functions findMatchingForward and findMatchingBackwards do that.

Both functions take program and starting position in program (index of character that opens the loop) as arguments and return position in program of the corresponding loop end.



> findMatchingForward :: String -> Int -> Int
> findMatchingForward program pos = findMatchingForward_ program pos 0 where
>       findMatchingForward_ program pos level =
>               case program !! pos of
>                   '[' -> findMatchingForward_ program (pos+1) (level+1)
>                   ']' -> if level == 1 then pos
>                                        else findMatchingForward_ program (pos+1) (level-1)
>                   otherwise -> findMatchingForward_ program (pos+1) level
>
> findMatchingBackwards :: String -> Int -> Int
> findMatchingBackwards program pos = findMatchingBackwards_ program pos 0 where
>       findMatchingBackwards_ program pos level =
>               case program !! pos of
>                   ']' -> findMatchingBackwards_ program (pos-1) (level+1)
>                   '[' -> if level == 1 then pos
>                                        else findMatchingBackwards_ program (pos-1) (level-1)
>                   otherwise -> findMatchingBackwards_ program (pos-1) level

Interpreter loop is very simple - if we reached end of the program just return. If not, execute one step with current interpreter state, which gets us a new updated state st', and call run again with st'.



> run st = if isEnd st then return ()
>                      else do st' <- step st
>                              run st'

Finally time for the main function - check for program arguments, if there are any treat first of them as name of the file with program to read. Then create initial BFState and pass it to run.



> main :: IO ()
> main  = do
>       args <- getArgs
>       if length args == 0 then fail "Please provide name of the program to execute"
>                           else do program <- readFile (head args)
>                                  run (initState program 30000)

That's it. Full Brainfuck interpreter in pure Haskell.

The bad news is that it's extremely slow. Even compiled with -O2 it is a few thousand times slower than interpreter written in C. Can something be done about it? It sure can, but I'll do that in the next post.

my sabbatical year diary

30 December 2007

Brainfuck interpreter in Haskell

28 December 2007

Blogging attempt #4

Blog Archive

Labels