Background

As the summer is soon here, I’m starting to have more time for personal amusements. I’ve been wanting to do this post for quite a while, and I finally found time for it.

Today’s topic being: can I make an effective Boggle solver in Haskell?.

As with the previous post, the full code is available on GitLab, with pertinent snippets posted here with commentary.

Let’s go!

The basic idea of Boggle is to find words in some N×N grid; the scoring depends on the game, but can depend both on the count of the words and the length of the said words. In this post, I hope to show you the precise methods how I implemented a program that finds the words for me (and much more quickly than I ever can).

As stated before, this program is made in Haskell, using Haskell Stack more precisely. I will skip the details on the workings of Haskell Stack; if you are interested, you may want to look at the Haskellversi/Othello in Haskell series.

Words!

How to store the words?

Before we can search for words, we of course need a list of those said words. For that, we shall use a trie structure; each node of a tree defines one character, and the word is built up by traversing a path through the tree. In the example below, a sample structure with several Finnish words is shown.

Word trie

I decided to implement a relatively simple data structure for it:

data Node = Node {
    nodeChar :: Maybe Char,
    nextCharacters :: [Node],
    isCompleteWord :: Bool
} deriving (Eq, Show)

Loading the words

As displayed in the sample picture (rendered in all its prettiness with GraphViz) with the word palkka and koi, a node can be both a last character for some word, and also a parent node for several other nodes. That’s why we require a separate Bool flag for denoting that. More flexibility.

Now to actually figure out a way to load the words. I decided to use a very simple file format for storing the words: any letter starts a new node under the current one, # returns to the parent node and ! marks the current node as complete. Very straightforward to load; also, the order of nodes is implicitly stored in the file, which allows for a degree of optimization. The GitHub repository includes a sample dictionary; generating a larger dictionary is left as an exercise to the reader.

While as some may very well disagree on using imperative techniques in Haskell, I decided to implement the loader using the ST monad. For those unaware, the ST monad basically allows mutable state within the scope of some function, without letting it do anything else untoward. It also makes working with a mutating string buffer a bit easier, as one doesn’t have to pass the mutating string back and forth the call stack. The end result nevertheless is a pure function, which doesn’t need special tricks to be used.

loadWordTree :: T.Text -> Node
loadWordTree str = runST $ do
    -- Create a new reference for the string
    data_str <- newSTRef str
    -- Create an empty node
    root_node <- newSTRef (Node Nothing [] False)
    -- Read the subnode in
    readSubnode data_str root_node
    -- And return the now-altered root node
    readSTRef root_node
  where
    readSubnode :: STRef s T.Text -> STRef s Node -> ST s ()
    readSubnode strRef nodeRef = do
      exitRef <- newSTRef False
      whileM_ (do
                str <- readSTRef strRef
                mustExit <- readSTRef exitRef
                return ((not.(T.null) $ str) && (not mustExit))
              ) (do
                -- Extract the head
                h <- ((readSTRef strRef) >>= (\txt -> return $ T.head txt))
                -- Strictly discard the head from the buffer
                modifySTRef' strRef (\str -> T.tail str)
                
                case h of
                  '#' -> modifySTRef' exitRef (const True) -- We want to exit now
                  '!' -> modifySTRef' nodeRef (\node -> node {isCompleteWord = True}) -- This node now denotes a complete word
                  _   -> do
                   -- Initialize a new node
                   newNode <- newSTRef (Node (Just (toUpper h)) [] False)
                   -- Read the subnode
                   readSubnode strRef newNode
                   -- Alter the current reference
                   readSTRef newNode >>= (\extractedNode -> modifySTRef' nodeRef (\node -> node {nextCharacters = extractedNode:(nextCharacters node)}))
              )
       -- Now that new changes are not expected, flip the list to its proper order         
      modifySTRef' nodeRef (\node -> node {nextCharacters = reverse (nextCharacters node)})
      return ()

To avoid costly list appending where possible, the subnodes are first added in reverse to the loading order, and then upon exiting reversed back into the correct order.

Finding solutions

To find solutions, we also need to define the exact definition of a solution. Let’s define it first.

type Solution = (SolutionMeta, Array BGCoord Flag) -- A definition of a solution

data Flag = None | D | U | L | R | UL | UR | DL | DR | Stop deriving (Eq,Show) -- For each point on a grid, in which direction we should step?

data SolutionMeta = Meta {
  foundWord :: T.Text, -- What was the word found?
  initialPoint :: BGCoord -- The starting point?
}

type BGCoord = (Int, Int) -- Coordinates used for a grid
type BoggleGrid = Array BGCoord Char -- The datatype of a grid

What we are going to do is effectively transforming a boggle grid into a set of solution grids. Each solution grid describes a path which one needs to draw from the start to the end to find a word. For convenience, the spelled word and the initial point is also returned along with each solution.

Let’s first define a few assisting functions - a way to check if we are in bounds, and a way to apply a direction symbol to the grid, while still retaining the position

-- Define a flag for each direction; a zero move means a stop and end of word
directionToFlag :: (Int, Int) -> Flag
directionToFlag (0,0) = Stop
directionToFlag (-1,0) = L
directionToFlag (1, 0) = R
directionToFlag (0,1) = D
directionToFlag (0,-1) = U
directionToFlag (-1,1) = DL
directionToFlag (1,1) = DR
directionToFlag (-1,-1) = UL
directionToFlag (1,-1) = UR

-- Checks if some coordinate is in valid bounds
inBounds :: Array BGCoord z -> BGCoord -> Bool
inBounds grid (x,y)
 | x < 0 || y < 0 = False
 | x > w || y > h = False
 | otherwise = True
 where
  (w,h) = (snd.bounds) grid

-- Apply a direction to the grid
applyDirection :: (BGCoord, Array BGCoord Flag) -> BGCoord -> (BGCoord, Array BGCoord Flag)
applyDirection (pos@(posX, posY), baseGrid) direction@(dirX,dirY) = ((posX+dirX, posY+dirY), baseGrid // [(pos, directionToFlag direction)]) 

And now to the main functionality. One of the conveniences of Haskell is that laziness can be remarkably easy to implement. Like in this instance, it is not necessarily required to evaluate ALL spots of the grid before results can be returned; one can return a list that evaluates the grid forward as it goes. The solutions are found using a list comprehension of many, many conditions to weed out the unsuitable combinations - and once a suitable combination is found, the direction is applied and recursively searched. Once we hit a stop, we do one more recursive call which results in the end symbol being set, and makes the function immediately return the resulting solution. This could also be very easy to parallelize, although this I have not tested yet.

-- Returns a lazy list of possible solutions; select only those subnodes where we also have letters in the dictionary.           
generatePathList :: Node -> BoggleGrid -> [Solution]
generatePathList rootNode grid = concat $ transpose $ map (\(pos,node) -> findWordsOnPos node grid pos) (validStarts)
  where validStarts = [ (p,n) | p <- (indices grid), n <- (nextCharacters rootNode), nodeChar n == Just (grid ! p)]

-- Finds the solutions from a single point. It is assumed the first letter is already root node 
findWordsOnPos :: Node -> BoggleGrid -> BGCoord -> [Solution]    
findWordsOnPos rootNode grid initialPos = iterateNode rootNode Nothing (T.singleton (fromJust $ nodeChar rootNode)) (initialPos, zeroArray)
 where
  zeroArray = fmap (const None) grid -- A default-by-zero array of proper size
 
  -- Iteration function; the current node being explored,  the last direction taken if available, the current text, and the path
  iterateNode :: Node -> Maybe BGCoord -> T.Text -> (BGCoord, Array BGCoord Flag) -> [Solution]
  iterateNode currentNode maybeLastDir currentText point@(pos@(posX,posY), currentPath)
      | maybeLastDir == Just (0,0) = [(Meta currentText initialPos, currentPath)] -- End of path, return a result
      | otherwise = concat $ pathSelector -- Otherwise return the contents of the path selector
      where
        pathSelector = [ iterateNode nextNode (Just (dX,dY)) (if ((dX /= 0) || (dY /= 0)) then currentText `T.snoc` nodeLetter else currentText) (applyDirection point (dX,dY))  | -- Apply a path for each combination where..
                        nextNode <- ((currentNode):nextCharacters currentNode),  -- we can select either our current node or any other node
                        dX <- [-1,0,1], dY <- [-1,0,1], -- there's a direction... 
                        (((dX /= 0) || (dY /= 0)) && nextNode /= currentNode) || (dX == 0 && dY == 0 && nextNode == currentNode && isCompleteWord currentNode), -- But either require a nonzero direction and a new node OR stay in place, choose current node and this word being a complete one  
                        (Just nodeLetter) <- [nodeChar nextNode], -- we've now established that we can possibly take the next node. Let's analyze it
                        (inBounds currentPath (posX+dX, posY+dY)), -- ensure that we can actually walk there
                        ((currentPath ! (posX+dX, posY+dY)) == None), -- our next step must be empty
                        gridLetter <- [(grid ! (posX+dX, posY+dY))], -- must have a valid letter on the grid..
                        nodeLetter == gridLetter -- and which must naturally match to the chosen node 
                        ]

And presto, we now have a way to figure out solutions. All we need to do now is to expose them to the user.

Binding it to an interface and displaying the results

As most of the rendering and user interface code is rather boilerplate, this will be presented in an abbreviated form.

There are multiple ways to display a grid - and for this specific case, I settled for a command-line display approach. First, we need to define a few more functions:

-- Define a character to show for each flag; first one is normal, 2nd one is a highlighted one
char :: Flag -> (Char, Char)
char None = (' ', ' ')
char D = ('↓', '🢃')
char U = ('↑', '🢁')
char L = ('←', '🢀')
char R = ('→', '🢂')
char UL = ('↖', '🢄')
char UR = ('↗', '🢅')
char DR = ('↘', '🢆')
char DL = ('↙', '🢇')
char Stop = ('◊', '◊')

-- Generates a rendering of a given grid, using a specialized function for that purpose
printGrid:: (Array BGCoord z) -> ((BGCoord, z) -> Char) -> IO ()
printGrid grid func = sequence_ $ map (\y -> do {row_func y;putStrLn " "}) (range (0,h))
 where
  -- Params
  (w,h) = snd $ bounds grid 
  --
  row_func :: Int -> IO ()
  row_func y = putStrLn $ concat $ map (\coord -> ' ':[func (coord,(grid ! coord))]) $ (map (\x -> (x,y)) (range (0,w)))

These functions have highly practical purposes for printing. The former one generates an UTF-8 arrow character for each direction (highlighted if it is, for example, an initial), and the latter emits the grid on the screen, using a mapping function to actually transform whatever datatype there is to a character.

The rest of the scaffolding code is available from GitHub. Now, onto to the..

Results!

Screengrab

After having gone through such an effort to create this program, I couldn’t resist the temptation to try it in practice. And it seems the program is indeed highly effective, as I managed to gain very consistently high scores on several time-limited games even with having to manually type and draw the words. Due to the proper data structure, it doesn’t take excessively long to find solutions - and when they are presented in a concise form, they are quick to read and enter.

In one of the games I tried, I found a curious pattern: I consistently had trouble keeping my top position. It seemed that some users were able to rack very high scores and keep up with me. It seems fairly unusual, considering that I can use the program to very rapidly discover the best-scoring words.

What I discovered a bit later, was that it is probably explained by the pay to win feature in that game, where you can literally buy more time. As I haven’t paid for such an advantage, my technocratic reign can easily crumble against a paying adversary. At least it made for a slightly more even playing field :D..

It is also very possible though that someone else has got the idea I got: a cursory Google search reveals links to many sites hawking what claim to be cheats to the tested games. Their safety is questionable, and therefore I passed on investigating them. For this program, the larger dictionaries are not included so if one wants to use this, they got to see some effort to it as well ;)

Thanks for reading - and again, do not forget to check out the full code on GitLab