The following is based on the famous article "Why Functional Programming Matters" (sometimes referred to as "whyfp90") by John Hughes.

I'd like to implement an alpha-beta heuristic search algorithm for the complex game of tic-tac-toe. This document is a set of notes from going through the process of implementing and playing with the heuristic in Haskell.

Before we get into the game, here are the imports we'll need.

import Data.Array.IArray
import Data.List (intercalate)
import Data.Tree
import Control.Monad (forM_)

Game State

The game state for tic-tac-toe is a 3x3 grid filled with X's, O's or blanks. Let's represent the grid cell contents (X, O, or blank) with a new data type.

data Cell = Blank | XX | OO
  deriving (Enum, Read, Eq, Ord)

instance Show Cell where
  show Blank = "."
  show XX    = "X"
  show OO    = "O"

-- | return the "opposite" cell contents
opp :: Cell -> Cell
opp Blank = Blank
opp XX    = OO
opp OO    = XX

data Player = PlayerX | PlayerO
  deriving (Show, Eq)

-- | Return the piece assigned to each player
piece :: Player -> Cell
piece PlayerX = XX
piece PlayerO = OO

A position will be an integer tuple (x, y) with entries in {0, 1, 2}. To represent the grid, the most natural data type is the array. We'll use the basic immutable array interface (imported above from Data.Array.IArray), defining the game state as an array with indices of type (Int, Int) and values of type Cell.

type Pos = (Int, Int)
type State = Array Pos Cell
boardSize :: Int
boardSize = 9
boardInds :: (Pos, Pos)
boardInds = ((0,0), (2,2))

-- | Empty board
empty :: State
empty = listArray boardInds $ replicate boardSize Blank

-- | Return a list of board positions in order
positions :: [Pos]
positions = indices empty

-- | Return a list of cells in index order
cells :: State -> [Cell]
cells = elems

-- | Lookup the cell at a given position
lookupCell :: State -> Pos -> Cell
lookupCell = (!)

-- | Add a new move to the board (the argument order suits a left fold)
update :: State -> (Pos, Cell) -> State
update st pc = st // [pc]

-- | Return the list of unoccupied positions on the board
freePos :: State -> [Pos]
freePos st = filter (\p -> st ! p == Blank) positions

The goal is to use the interface specified above everywhere below. This way it will be easy to change the internal representation of the board without rewriting all the code that follows.
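
Just to illustrate the point, here is a purely hypothetical sketch (not used anywhere below) of the same interface backed by a Data.Map.Map instead of an immutable array; only these definitions would change:

import qualified Data.Map as M

type State' = M.Map Pos Cell

-- | Empty board, Map-backed
empty' :: State'
empty' = M.fromList [ ((i,j), Blank) | i <- [0..2], j <- [0..2] ]

-- | Lookup the cell at a given position
lookupCell' :: State' -> Pos -> Cell
lookupCell' st p = M.findWithDefault Blank p st

-- | Add a new move to the board
update' :: State' -> (Pos, Cell) -> State'
update' st (p, c) = M.insert p c st

-- | Return the list of unoccupied positions on the board
freePos' :: State' -> [Pos]
freePos' st = [ p | (p, c) <- M.toList st, c == Blank ]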

Here are some basic constructors for game states.

-- | Generate a State given a function from positions to cells
genState :: (Pos -> Cell) -> State
genState f = foldl (\s p -> update s (p, f p)) empty positions

-- | Assumes that X goes first, constructs a game state from a sequence of
-- positions
posSeq :: [Pos] -> State
posSeq ps = foldl update empty (zip ps (cycle [XX, OO]))

It's useful to have a pretty printer for the game state.

-- | Return a multi-line string representation of the state
pretty :: State -> String
pretty st = let chars = concat $ map show $ cells st
                rows = [ take 3 (drop (3*i) chars) | i <- [0..2] ]
            in intercalate "\n" rows

pprint :: State -> IO ()
pprint = putStrLn . pretty

-- | Return a one-line string representation of the state
prettyOneLine :: State -> String
prettyOneLine = concat .
                map show .
                cells

Here are a couple of example states to experiment with:

exState0 = posSeq [(0,0), (2,2), (0,1)]
exState1 = posSeq [(1,1)]
exState2 = posSeq [(0,0), (2,2), (0,1), (1,0)]
exStateWinX = posSeq [(1,1), (0,0), (1,2), (0,1), (1,0)]
exStateWinO = posSeq [(2,2), (1,1), (0,0), (1,2), (0,1), (1,0)]

ghci> putStrLn $ pretty exState0
XX.
...
..O

Game Trees

We want to build the game tree starting at a given state. To do that we need to know whose turn it is, X or O, from the board data.

-- | Examine the board and return whose turn it is, X or O.
getTurn :: State -> Cell
getTurn st
  | k == 0    = XX
  | k == 1    = OO
  | otherwise = error "encountered invalid board state"
  where k = (count XX st) - (count OO st)
        count c s = length . filter (==c) $ cells s
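
As a quick check (worked out by hand rather than pasted from a session), the empty board should be X's turn, and exState0, with two X's to one O, should be O's turn:

ghci> getTurn empty
X
ghci> getTurn exState0
O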

To build a game tree we need to know what moves are possible from a particular state. We define a moves function that takes a single game state and returns a list of possible game states which are reachable in one move from the given one.

-- | A collection of cell extraction functions
getRow, getCol :: State -> Int -> [Cell]
getRow s i = [ lookupCell s (i,j) | j <- [0,1,2] ]
getCol s i = [ lookupCell s (j,i) | j <- [0,1,2] ]

getDiag, getAntiDiag :: State -> [Cell]
getDiag s = [ lookupCell s (i,j) | (i,j) <- [(0,0), (1,1), (2,2)] ]
getAntiDiag s = [ lookupCell s (i,j) | (i,j) <- [(2,0), (1,1), (0,2)] ]

-- | Check if the given player has won the game
win :: Player -> State -> Bool
win player state = checkRows || checkCols || checkDiag || checkAntiDiag
  where c = piece player
        checkWin c list = all (==c) list
        checkRows = or [checkWin c $ getRow state i | i <- [0,1,2]]
        checkCols = or [checkWin c $ getCol state i | i <- [0,1,2]]
        checkDiag = checkWin c $ getDiag state
        checkAntiDiag = checkWin c $ getAntiDiag state

-- | Generate a list of new states reachable from the given state in one move
moves :: State -> [State]
moves st
  | win PlayerX st || win PlayerO st = []
  | otherwise = map (\p -> update st (p, getTurn st)) (freePos st)
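
A couple of quick checks on moves (again worked out by hand rather than pasted from a session): the empty board allows nine moves, while a finished game such as exStateWinX allows none:

ghci> length (moves empty)
9
ghci> null (moves exStateWinX)
True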

To generate the full game tree from a given state we'll unfold the tree using the function \s -> (s, moves s). The input is a seed; the output is a pair consisting of a node value and a list of seeds from which to generate the next level.

-- | Generate a game tree from an initial game state
generate :: State -> Tree State
generate = unfoldTree (\s -> (s, moves s))

-- | A general purpose tree fold
foldTree :: (a -> b -> a) -> (a -> a -> a) -> a -> Tree b -> a
foldTree f g a (Node b []) = f a b
foldTree f g a (Node b (t:rest)) = g (foldTree f g a t)
                                     (foldTree f g a (Node b rest))
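
The fold isn't needed for anything below, but as a small illustration of how it works, here is a hypothetical helper that adds 1 to the accumulator for each node label and combines intermediate results with (+), giving the number of nodes in a tree; it should agree with length . flatten, which is used next.

-- | Count the nodes of a tree with foldTree (illustration only)
countNodes :: Tree a -> Int
countNodes = foldTree (\a _ -> a + 1) (+) 0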

It's interesting to see what would happen if we had defined moves in a naive way, just using the expression from the second guard.

moves' st = map (\p -> update st (p, getTurn st)) (freePos st)
generate' = unfoldTree (\s -> (s, moves' s))

For example,

ghci> let t = prune 6 . generate $ exState1
ghci> length . flatten $ t
24785

ghci> let t' = prune 6 . generate' $ exState1
ghci> length . flatten $ t'
28961

That's about 17% more states (at a depth of 6) if you ignore wins and keep on playing.

Back to business... to generate a game tree to a given depth we'll define a prune function.

-- | A general purpose tree pruner
prune :: Int -> Tree a -> Tree a
prune 0 t = Node (rootLabel t) []
prune n t = Node (rootLabel t) (map (prune (n-1)) (subForest t))

For example here's the game tree starting at exState0 pruned to level 1:

ghci> let t = prune 1 $ generate exState0
ghci> putStrLn $ drawTree $ fmap prettyOneLine t
XX......O
|
+- XXO.....O
|
+- XX.O....O
|
+- XX..O...O
|
+- XX...O..O
|
+- XX....O.O
|
`- XX.....OO

Evaluation of Game States

Static evaluation: first, we assign a rough numeric value to a single game state based on whether it is a win for one player or the other.

-- | Take a game state and return a rough valuation of it. Larger results
-- are better for X, smaller (esp. negative) results are better for O.
staticVal :: State -> Int
staticVal state
  | win PlayerX state  = 1
  | win PlayerO state  = (-1)
  | otherwise          = 0

Now we can look through a game tree and find wins or losses for X:

ghci> let t = fmap staticVal . prune 4 . generate $ exState1
ghci> 1 `elem` flatten t
True

So X can win in two more moves starting at exState1, where X played the center position (not surprising!). O cannot win within that many moves (also not surprising!), but O can win if we look one move deeper.

ghci> (-1) `elem` flatten t
False
ghci> let t' = fmap staticVal . prune 5 . generate $ exState1
ghci> (-1) `elem` flatten t'
True

Minimax Valuation

The minimax idea is that in order to judge the value of a position, it is useful to assume each player makes the best move available to them at each juncture. That's obviously not how every game plays out, but it's a good simplifying assumption.

We proceed by defining the valuation of a game state where it is X's turn as the maximum of the valuations of all the game states reachable in one move. Likewise, if it is O's turn, we define the valuation as the minimum of the valuations of all the game states reachable in one move. If the game state is a win or a tie, we define the valuation to be the static valuation above.

Note that this definition only makes sense if the game tree is finite. Even in such cases we may not be able to compute it if the game tree is too large, so in practice we'll prune the game tree to a reasonable depth before computing the valuation (and use the static valuation at leaves).

maximizeTreeVal :: (Ord a) => Tree a -> a
maximizeTreeVal (Node x []) = x
maximizeTreeVal (Node x subs) = maximum . map minimizeTreeVal $ subs

minimizeTreeVal :: (Ord a) => Tree a -> a
minimizeTreeVal (Node x []) = x
minimizeTreeVal (Node x subs) = minimum . map maximizeTreeVal $ subs
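
For example, a depth-limited minimax valuation of a state can be written directly (minimaxVal is just a hypothetical name, using the same prune-then-staticVal pipeline as the examples below); it should agree with the list-based version defined next:

minimaxVal :: State -> Int
minimaxVal = maximizeTreeVal . fmap staticVal . prune 5 . generate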

To make the functions more flexible, we can do the same as maximizeTreeVal, but return the list of numbers that maximizeTreeVal would take the maximum of.

maximizeTreeList :: (Ord a) => Tree a -> [a]
maximizeTreeList (Node x []) = [x]
maximizeTreeList (Node x subs) = map minimum . map minimizeTreeList $ subs

minimizeTreeList :: (Ord a) => Tree a -> [a]
minimizeTreeList (Node x []) = [x]
minimizeTreeList (Node x subs) = map maximum . map maximizeTreeList $ subs

With these definitions, the actual valuation function would be, e.g.:

ghci> let f = maximum . maximizeTreeList . fmap staticVal . prune 5 . generate
ghci> f exState1
0

If we increase the look-ahead to 6 plies, we see that this is a good position for X!

ghci> let f = maximum . maximizeTreeList . fmap staticVal . prune 6 . generate
ghci> f exState1
1

Alpha-Beta

The key to implementing alpha-beta is to replace map minimum and map maximum in the definitions above with functions that decide whether the next list of numbers needs to be considered at all, based on the minima (or maxima) already computed from earlier lists and the first few elements of the list under consideration.

mapmin :: (Ord a) => [[a]] -> [a]
mapmin [] = []
mapmin (xs:rest) = n : (omit n rest)
  where n = minimum xs

omit :: (Ord a) => a -> [[a]] -> [a]
omit _ [] = []
omit n (xs:rest) | minleq n xs = omit n rest
                 | otherwise   = k : omit k rest
                     where k = minimum xs

The next function, minleq, is key to the whole thing. It takes a value n and a list of values and returns True if the minimum of the list is less than or equal to n. The function returns immediately, without evaluating the rest of the list, as soon as it finds an element less than or equal to n. Under Haskell's lazy evaluation strategy, this means that if we never look at the numbers in one of the lists passed to minleq, then the part of the game tree that would have produced those numbers never gets evaluated at all, saving time and space!

minleq :: (Ord a) => a -> [a] -> Bool
minleq _ [] = False
minleq n (y:ys) | y <= n = True
                | otherwise = minleq n ys

For example:

ghci> minleq 1 [0,2,3,undefined]
True
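
mapmin short-circuits in the same way. Working through the definitions by hand (this isn't a pasted session), the call below should return [1,3] without ever forcing the undefined: as soon as mapmin sees the 0 it knows that list's minimum cannot exceed the 1 already found, so the rest of that list is never examined.

ghci> mapmin [[1,2], [0, undefined], [3,4]]
[1,3]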

Here is the code for the corresponding mapmax:

mapmax :: (Ord a) => [[a]] -> [a]
mapmax [] = []
mapmax (xs:rest) = n : (omit' n rest)
  where n = maximum xs
        omit' _ [] = []
        omit' n (xs:rest) | maxleq n xs = omit' n rest
                          | otherwise   = k : omit' k rest
                              where k = maximum xs
        maxleq _ [] = False
        maxleq n (y:ys) | y >= n = True
                        | otherwise = maxleq n ys

Finally, we can define our alpha-beta heuristic game state valuation:

abMaxList :: (Ord a) => Tree a -> [a]
abMaxList (Node x []) = [x]
abMaxList (Node x subs) = mapmin . map abMinList $ subs

abMinList :: (Ord a) => Tree a -> [a]
abMinList (Node x []) = [x]
abMinList (Node x subs) = mapmax . map abMaxList $ subs

abmax :: State -> Int
abmax = maximum .
        abMaxList .
        fmap staticVal .
        prune 8 .
        generate

abmin :: State -> Int
abmin = minimum .
           abMinList .
           fmap staticVal .
           prune 8 .
           generate
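
As a quick sanity check (checkWins is a hypothetical helper; the expected values follow directly from staticVal on finished games), the won positions defined earlier should evaluate to the extreme values:

checkWins :: Bool
checkWins = abmax exStateWinX == 1 && abmin exStateWinO == (-1)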

Analysis

Here's a simple printer for lists of boards that we can use to examine different moves and their valuations:

printMoves :: State -> IO ()
printMoves st = forM_ (zip [0..] (moves st)) $ \(i,s) -> do
                  putStrLn (show i)
                  pprint s
                  putStrLn ""

To start, let's see how the computer values the empty board:

ghci> fmap abmin $ moves empty
[0,0,0,0,0,0,0,0,0]

So all moves are equally good (our static valuation doesn't take into consideration that choosing the center is a good strategy).

So let's make a move as X and one as O:

ghci> let st = posSeq [(0,0), (2,0)]
ghci> pprint st
X..
...
O..

Now let's compute the valuation of each of X's possible moves from here:

ghci> fmap abmin $ moves st
[1,1,0,0,0,0,1]

Remember that positive values favor X and negative values favor O in our static valuation. Let's see what the game states are that correspond to the 1's in our valuation:

ghci> printMoves st
0
XX.
...
O..

1
X.X
...
O..

2
X..
X..
O..

3
X..
.X.
O..

4
X..
..X
O..

5
X..
...
OX.

6
X..
...
O.X

That certainly makes sense. Now let's look at a position a few moves into a different game and see what happens:

ghci> let st' = posSeq [(1,1), (1,0), (0,2), (2,0)]
ghci> pprint st'
..X
OX.
O..

ghci> map abmax $ moves st'
[1,1,0,1,1]

ghci> printMoves st'
0
X.X
OX.
O..

1
.XX
OX.
O..

2
..X
OXX
O..

3
..X
OX.
OX.

4
..X
OX.
O.X

It's interesting here that move 2 is not valued highly, despite the chance to win in one. Also, move 0 is the only one that prevents O from winning in one. I think the reason this doesn't factor into the valuation is that two-in-a-row isn't particularly valued by the static valuation, so when O is trying to minimize the valuation, it doesn't see the value of having an unblocked two-in-a-row.

That, or there's a bug in my program :)
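
If it is the former, a natural experiment would be a finer-grained static valuation. Here is a sketch (hypothetical and untested, with staticVal' as a stand-in name) that scores every line still open to exactly one player by the number of that player's pieces in it, while keeping outright wins dominant:

-- | A sketch of a finer-grained static valuation (illustration only):
-- a line containing only one player's pieces and blanks is worth the
-- number of that player's pieces in it; wins dominate everything else.
staticVal' :: State -> Int
staticVal' st
  | win PlayerX st = 100
  | win PlayerO st = -100
  | otherwise      = sum (map lineScore allLines)
  where
    allLines = [ getRow st i | i <- [0,1,2] ]
            ++ [ getCol st i | i <- [0,1,2] ]
            ++ [ getDiag st, getAntiDiag st ]
    lineScore l
      | OO `notElem` l = length (filter (== XX) l)
      | XX `notElem` l = negate (length (filter (== OO) l))
      | otherwise      = 0

Swapping this in for staticVal in abmax and abmin would let O "see" an unblocked two-in-a-row while minimizing, which is exactly the information the analysis above suggests is missing.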

