19 February 2008

Playing with monad transformers

Yesterday someone posted a link to a great monad transformers tutorial. It's incredible. I think I finally start to understand (or at least being able to use) it.

As part of testing my fresh knowledge I've written a simple virtual machine for a Forth-like language. Code (as usual - Literate Haskell, so you can just put it into .lhs file and run it) below.
Enjoy!



> module Main where
>
> import Control.Monad.State
> import Control.Monad.Reader
> import Data.Char
> import Data.Maybe
> import System.IO
>
> import qualified Data.Map as Map

Data type Op defines all primitive operations available in our language.


> data Op =
> Number Int -- push number on stack
> | Plus -- add two elements on top of stack
> | Minus -- subtract
> | Mul -- multiplicate
> | Div -- divide
> | Out -- write single number from stack to stdout
> | Emit -- emit character with code from stack to stdout
> | Dup -- duplicate item on top of stack
> | Drop -- drop top-most item
> | Call String -- call function with given name
> | GetI -- get current loop counter and push on stack
> | Loop [Op] -- execute loop body; max and min must be on stack
> | If [Op] [Op] -- if statement with "then" and "else" block
> | Equal -- push 1 if top-most items are equal, 0 otherwise
> | Less -- push 1 if top-most item is greater than item below it
> | Greater -- push 1 if top-most item is smaller than item below it
> deriving Show

Words is a dictionary mapping function names to their code.


> type Words = Map.Map String [Op]

ForthState contains current interpreter state. It consists of data stack and of loop counter. Currently no nested loops are supported.


> data ForthState = ForthState {
> stack :: [Int],
> counter :: Int
> } deriving Show

Interpreter state is defined using two monad transformers - StateT responsible for state management (stack, counter), and ReaderT providing read-only environment, in this case dictionary of user-defined words. IO is used as internal monad since program is supposed to perform I/O operations.
Computation does not return any useful results, hence return type of whole computation is ().


> type Forth = ReaderT Words (StateT ForthState IO) ()
>
> initState = ForthState [] 0

runForth function performs execution of given program in given environment (defined words) starting from given state.


> runForth :: Words -> ForthState -> [Op] -> IO ((), ForthState)
> runForth env st program = runStateT (runReaderT (execSequence program) env) st

Execution of sequence of operations is realized by simple (monadic) mapping of exec function which interprets single operation over whole program. Mapping variant used ignores return values of subsequent exec invocations since it does not return any useful data anyway.


> execSequence program = mapM_ exec program

Utility functions for stack operations.


> push n = do state <- get
> put $ state { stack = n:(stack state) }
>
> pop = do state <- get
> case stack state of
> (x:xs) -> do put $ state { stack = xs }
> return x

And exec implementation for different operations.


> -- ( -- n )
> exec (Number n) = push n
>
> -- ( n -- )
> exec Drop = pop >> return () -- pop and ignore returned value
>
> -- ( n -- n n )
> exec Dup = do x <- pop
> push x
> push x
>
> -- ( y x -- y+x )
> exec Plus = do x <- pop
> y <- pop
> push (y + x)
>
> -- ( y x -- y-x )
> exec Minus = do x <- pop
> y <- pop
> push (y - x)
>
> -- ( y x -- y*x )
> exec Mul = do x <- pop
> y <- pop
> push (y * x)
>
> -- ( y x -- y/x )
> exec Div = do x <- pop
> y <- pop
> push (y `div` x)
>
> -- ( y x -- y==x )
> exec Equal = do x <- pop
> y <- pop
> if y == x then push 1 else push 0
>
> -- ( y x -- y> exec Less = do x <- pop
> y <- pop
> if y < x then push 1 else push 0
>
> -- ( y x -- y>x )
> exec Greater = do x <- pop
> y <- pop
> if y > x then push 1 else push 0
>
> -- ( x -- )
> exec Out = do x <- pop
> liftIO $ (putStr (show x) >> putStr " " >> hFlush stdout)
>
> -- ( x -- )
> exec Emit = do x <- pop
> liftIO $ (putChar (chr x) >> hFlush stdout)
>
> -- stack effect depends on invoked function
> exec (Call fn) = do body <- asks (fromJust . Map.lookup fn)
> execSequence body
>
> -- ( -- I )
> exec GetI = do state <- get
> push $ counter state
>
> -- ( n m -- .. ) - n - high loop bound, m - low loop bound
> exec (Loop b) = do low <- pop
> high <- pop
> if low < high then mapM_ (doOnce b) [low .. high-1] else return () where
> doOnce program c = do state <- get
> put $ state { counter = c }
> execSequence program
>
> -- ( x -- .. ) - if x != 0 then t else e
> exec (If t e) = do x <- pop
> if x /= 0 then execSequence t else execSequence e

Time for a sample program to be executed. Program will calculate factorials of numbers from 1 to 7 and output them as lines with pairs number - factorial.

Environment contains definition of a factorial function that calculates factorial of an item on top of stack, that is used by the main program to perform calculations.


> env = Map.fromList [ ("factorial",
> [ Dup
> , Number 1
> , Greater -- if n > 0
> , If [ Dup
> , Number 1
> , Minus
> , Call "factorial" -- factorial (n-1)
> , Mul -- *
> ]
> [ Drop
> , Number 1
> ] -- else 1
> ])
> ]
> sample = [ Number 8
> , Number 1
> , Loop [ GetI
> , Dup
> , Out
> , Call "factorial"
> , Out
> , Number 10
> , Emit
> ]
> ]

> main = do a <- runForth env initState sample
> return ()

31 January 2008

Applications menu in ion3

After some time spent using KDE I recently switched back to ion3. One of the first things I realized after the switch was lack of KDE menu with all installed apps. Actually lack of any menu with all installed apps. I am too old to remember all the exec names I have installed, and way too lazy to add those manually to the ion menu, so I've written a script that parses all .desktop files from /usr/share/applications and adds (more or less) all the apps to the menu which can be invoked by Meta1-Esc (for me: alt-esc).

You can download the script from here. All you have to do is to save it in your ~/.ion3 directory and add dopath("cfg_apps") to cfg_ion.lua.
Menu is created during ion start, so if you install new apps you either have to logout/login, or restart ion for changes to be visible.

Note: the script is quite lame, uses hardcoded list of categories, does not handle some Exec entries properly etc. I will be cleaning those up soon.

Enjoy.

15 January 2008

Brainfuck - it's the last one, I promise


OK, I promise - this is the last version. What changed since last time:

  • program is now stored in a list of [Op Word16] instead of the zipper - zipper is good for memory, where pointer moves in both directions, but in case of program that only goes in one direction it has an unnecessary overhead

  • parser clean-up + optimization of sequence of similar operation ("++++", "---", ">>>", "<<<") built-in into parser

  • precompiled code optimization based on common loop patterns such as setting current cell to 0, adding current cell to another etc


Result - ca 5x speed-up on prime.bf. Other benchmarks also get some speed.

Again, post is a literate Haskell source. Enjoy.



> module Main where
>
> import qualified Control.Exception as CE
> import Control.Monad
> import Data.Char
> import Data.Word
> import System.Environment
> import System.IO
> import Text.ParserCombinators.Parsec

> import Debug.Trace


This version of interpreter makes use of "precompiled" and optimized Brainfuck code. Source file gets compiled to following set of operations:


> data Op a =
> Add a -- add a to current memory cell (a can be negative)
> | Move a -- move pointer a positions to the right (to the left if a negative)
> | Input -- input one character and store it in memory
> | Output -- output current memory cell to stdout
> | Loop [Op a] -- loop with body consisting of given operations list
> | Set a -- OPTIMIZATION: set cell to value
> | FarAdd a a -- OPTIMIZATION: add k*value of current cell to cell 'a' away
> | Scan a -- OPTIMIZATION: search for 0 jumping x cells at a time
> deriving (Show, Eq)


Source code parsing is done using simple grammar written with Parsec.

The BNF form of grammar looks something like this:

program ::= instruction*
instruction ::= loop | simple
loop ::= '[' instruction* ']'
simple ::= '+' | '-' | '<' | '>' | '.' | ','

which can be written in Parsec:


> program :: Parser [Op Int]
> program = many instruction
>
> instruction :: Parser (Op Int)
> instruction = simple <|> loop
>
> loop :: Parser (Op Int)
> loop = between (char '[') (char ']') program >>= \p -> return $ Loop p
>
> simple :: Parser (Op Int)
> simple = (many1 (char '+') >>= \p -> return $ Add (length p))
> <|> (many1 (char '-') >>= \p -> return $ Add (negate $ length p))
> <|> (many1 (char '>') >>= \p -> return $ Move (length p))
> <|> (many1 (char '<') >>= \p -> return $ Move (negate $ length p))
> <|> ( (char '.') >>= \_ -> return $ Output)
> <|> ( (char ',') >>= \_ -> return $ Input)


Data memory is handled using a zipper:


> data ListZipper a = ListZipper {
> left :: ![a], -- elements left from focus
> focus :: ! a , -- current element
> right :: ![a] -- elements right from focus
> } deriving Show
>
> move :: Int -> ListZipper a -> ListZipper a
> move (-1) (ListZipper (x:xs) y zz) = ListZipper xs x (y:zz)
> move 1 (ListZipper xx y (z:zs)) = ListZipper (y:xx) z zs
> move 0 lz = lz
> move n lz = if n > 0 then move (n-1) (move 1 lz)
> else move (n+1) (move (-1) lz)
>
> mkZipper :: [a] -> ListZipper a
> mkZipper x = ListZipper [] (head x) (tail x)
>
> getValue :: ListZipper a -> a
> getValue (ListZipper _ y _) = y
>
> setValue :: ListZipper a -> a -> ListZipper a
> setValue (ListZipper xx _ yy) v = ListZipper xx v yy
>
> {-# INLINE move #-}
> {-# INLINE getValue #-}
> {-# INLINE setValue #-}

> scan :: Int -> ListZipper Word16 -> ListZipper Word16
> scan n lz@(ListZipper _ 0 _) = lz
> scan n lz = scan n (move n lz)
>
> addAt :: Int -> Int -> ListZipper Word16 -> ListZipper Word16
> addAt n k lz@(ListZipper l v r) =
> if v == 0 then lz
> else let doAddAt 0 _ _ = error "Invalid use of doAddAt - 0 offset"
> doAddAt 1 (x:xs) v = (v+x):xs
> doAddAt n (x:xs) v = x:(doAddAt (n-1) xs v)
> value = fromIntegral k*v in
> if n > 0 then ListZipper l v (doAddAt n r value)
> else ListZipper (doAddAt (-n) l value) v r
>
> {-# INLINE addAt #-}
> {-# INLINE scan #-}



Time for main interpreter.

Executing program transforms memory state (ListZipper Word16) into a new state with possible side-effects in I/O.


> runSequence :: ListZipper Word16 -> [Op Int] -> IO (ListZipper Word16)
> runSequence memory = foldM step memory
>
> step :: ListZipper Word16 -> Op Int -> IO (ListZipper Word16)
> step mem op = --trace (show op) $!
> case op of
> Move n -> return $! move n mem
> Add n -> return $! setValue mem ((getValue mem) + (fromIntegral n))
> Loop p -> doLoop p mem
> Set n -> return $! setValue mem (fromIntegral n)
> FarAdd n k -> return $! addAt n k mem
> Scan n -> return $! scan n mem
> Input -> CE.try getChar >>= \c ->
> case c of
> Left err -> return $! setValue mem 0
> Right x -> return $! setValue mem (fromIntegral $ ord x)
> Output -> hPutChar stdout (chr . fromEnum $ getValue mem) >> return mem


I have enclosed loop body execution into a separate function to make it look more clear (for me at least). Algorithm is simple - if on entry to the loop current memory cell has non-zero value, the loop body is executed as if it was a separate program, and the resulting memory state is passed again to doLoop to check if another iteration should be performed.
If value of current cell is zero program is skipped and function simply returns the same memory state that it got as parameter.


> doLoop :: [Op Int] -> ListZipper Word16 -> IO (ListZipper Word16)
> doLoop block memory = if (getValue memory) == 0 then return memory
> else runSequence memory block >>= doLoop block


Optimization step - precompiled program is transformed according to following rules:

  • sequence of Add operations is replaced with single Add with argument being sum of arguments of input, for example [Add 1, Add1] will be replaced with [Add 2], [Add 1, Add 1, Add (-1)] with [Add 1] etc. Technically sequences of '+' or '-' are handled by Brainfuck parser, but it does not optimize intermixed '+' and '-' correctly, for example in "+-+-+-". This is actually programer's mistake, but can happen anyway.

  • sequence of Move operations is replaced with single Move in similar fashion - [Move (-1), Move (-1)] is replaced with [Move (-2)] etc. Situation here is similar to described above - parser does not optimize "<><><><>" properly.

  • empty operations are removed, for example Add 0, Move 0 or empty loop.


Optimizer also replaces common loop patterns with single opcodes:

  • Loop [Add (-1)] (filling current cell with 0) is replaced with Set 0.

  • Loop [Add (-1), Move n, Add k, Move (-n)] (addition of k*current to cell n away) replaced with [FarAdd n k, Set 0] (Set 0 is necessary, since side-effect of the original loop is setting current cell to 0). Also variant with Add (-1) at the end of the loop is recognized.

  • Loop [Add (-1), Move n1, Add k1, Move n2, Add k2, Move n3] if n1+n2+n3 == 0 (two additions - k1*current to cell n1 away, k2*current to cell n1+n2) replaced with sequence of [FarAdd n1 k1, FarAdd (n1+n2) k2, Set 0]. Again, also variant with Add (-1) at the end is recognized.




> optimize :: [Op Int] -> [Op Int]
> optimize [] = []
> optimize ((Add 0) :xs) = optimize xs
> optimize ((Move 0) :xs) = optimize xs
> optimize ((Loop []) :xs) = optimize xs
> optimize ((Add x) :(Add y) :xs) = optimize (Add (x+y) :xs)
> optimize ((Move x):(Move y):xs) = optimize (Move (x+y):xs)
> optimize ((Set x) :(Add y) :xs) = optimize (Set (x+y) :xs)
> optimize ((Set x) :(Set y) :xs) = optimize (Set y :xs)
> optimize ((Loop [Add (-1)]):xs) = optimize (Set 0 :xs)
> optimize ((Loop [Move x]) :xs) = optimize (Scan x :xs)
> optimize ((Loop p) :xs) = let p' = optimize p in loopOptimize (Loop p') ++ (optimize xs)
> optimize (x :xs) = x:(optimize xs)
>
> loopOptimize x@(Loop [Add (-1), Move n1, Add k, Move n2]) =
> if n1 == -n2 then [FarAdd n1 k, Set 0]
> else [x]
> loopOptimize x@(Loop [Move n1, Add k, Move n2, Add (-1)]) =
> if n1 == -n2 then [FarAdd n1 k, Set 0]
> else [x]
> loopOptimize x@(Loop [Add (-1), Move n1, Add k1, Move n2, Add k2, Move n3]) =
> if (n1+n2) == -n3 then [FarAdd n1 k1, FarAdd (n1+n2) k2, Set 0]
> else [x]
> loopOptimize x@(Loop [Move n1, Add k1, Move n2, Add k2, Move n3, Add (-1)]) =
> if (n1+n2) == -n3 then [FarAdd n1 k1, FarAdd (n1+n2) k2, Set 0]
> else [x]
> loopOptimize x = [x]


doOptimize does optimization on its input as long as subsequent optimize calls result in any changes to the program.


> doOptimize p = let p' = optimize p in
> if p == p' then p else doOptimize p'


Program "normalization", i.e. removal of all characters other than valid Brainfuck instructions.


> normalize :: String -> String
> normalize program = filter (`elem` "+-<>[].,") program


And finally main function.


> main :: IO ()
> main = do
> args <- getArgs
> if length args < 1
> then fail "Please provide name of the program to run"
> else do
> prog <- readFile (head args)
> case (parse program "" . normalize) prog of
> Left err -> do putStr "Parse error at "
> print err
> Right res -> do
> runSequence (mkZipper (replicate 30000 0)) (doOptimize res)
> return ()