Unbeatable Tic Tac Toe with Monte Carlo Tree Search Part 2: AI

In the last post we learned how to draw a Tic Tac Toe board using HTML’s Canvas and created a two-player Tic Tac Toe game using our drawing class. In this post we’re going to add an intelligent computer opponent to our game.

1. Introduction

My inspiration to learn about Monte Carlo Tree Search (MCTS) came from the huge advancements in game AI in recent years. AlphaGo was the first AI to defeat a Go world champion, and at its heart lies the MCTS algorithm, combined with a neural network that enhances MCTS performance and the quality of its moves, but that’s another topic entirely.

So what is MCTS? Basically, it’s a search algorithm that finds the most promising next move, given the current game state (in our case, the X’s and O’s drawn by the players at any given time).

The beauty and uniqueness of the algorithm come from its efficiency in games with a large number of possible states: it uses fewer computing resources than other similar algorithms, and its search depth can be limited by time or by number of cycles (more on that later). The algorithm is game agnostic as long as its basic requirements are met, and it doesn’t need an evaluation function to judge the quality of game states, so it can be used in many games and even non-game applications.

Another algorithm for this kind of problem is Minimax, which tries to find the best actions by using a function to evaluate every possible game state, but that is simply too resource intensive for games like Chess and Go with millions of board states. Writing an evaluation function that determines how good a game state is, is also rather difficult.

2. Monte Carlo Tree Search

Our game states can be represented as a tree structure where each node of the tree is a specific game state:

In the picture above, game states are represented as a tree structure where the initial state is an empty board.

As you can see, each row of our tree represents all possible moves for a given player. For example, player 1 can choose between 8 possible moves, after that player 2 has 7 possible moves, and so on until we reach a terminal state, which is the end-of-game condition.

Now, in order to have an intelligent bot, we need to find the state with the best probability of winning. The naive way of doing this is to explore every node in the tree and check which one results in the most wins for the given player further down the tree, but this means spending lots of time exploring the tree and lots of memory storing the whole tree structure. As you probably guessed, this is nearly impossible for bigger game boards.

A smarter way is to explore only a subset of our state tree and choose the most promising node based on accumulated statistics about the nodes. This is what the MCTS algorithm does.

MCTS algorithm has 4 phases:

2.1 Selection

The algorithm starts with the current game state as the root node and repeatedly selects the child node with the highest win rate until it reaches a node that has unexplored children (meaning not all of its children have been added to the tree yet) or is a terminal node. We also need to give less-visited nodes a fair chance of being explored.

A good formula to select such child nodes is the Upper Confidence Bound applied to trees (UCT):

UCT = w / n + c × √( ln(t) / n )

where:

  • w stands for the number of simulated wins (score) for the node
  • n stands for the number of visits for the node
  • c stands for the exploration parameter, usually √2
  • t stands for the number of the node’s parent visits

The formula consists of two parts: exploitation (the w / n term) and exploration (the c × √( ln(t) / n ) term). It makes sure that promising nodes get explored more while no node is left under-explored.

Tree selection phase; w stands for the win score, v for the number of node visits.
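In code, the UCT value of a node can be computed like this. This is a minimal sketch; the function and parameter names are my own, not necessarily those used later in the implementation:

```javascript
// UCT = w / n + c * sqrt(ln(t) / n)
// winScore: w, visits: n, parentVisits: t, c: exploration parameter.
function uctValue(winScore, visits, parentVisits, c = Math.SQRT2) {
  if (visits === 0) return Infinity; // unvisited nodes get selected first
  return winScore / visits + c * Math.sqrt(Math.log(parentVisits) / visits);
}
```

Returning Infinity for unvisited nodes guarantees every child is tried at least once before exploitation kicks in.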

2.2 Expansion

After selection, we expand the node with one randomly selected child, unless the node is in a terminal state.

In the expansion phase, one new random child is added to the node selected in the previous phase.

2.3 Simulation

In the simulation phase we play random moves, without adding any new nodes to our tree, until we reach a terminal state. Choosing random or semi-random moves is called a light playout. Alternatively, you can select moves by writing quality heuristics or an evaluation function; this is called a heavy playout.
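For Tic Tac Toe, a light playout can be sketched like this, representing the board as a 9-element array of 'X', 'O' and null. The helper names are my own, not necessarily the article's:

```javascript
// All rows, columns and diagonals of a 3x3 board.
const LINES = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8],
  [0, 3, 6], [1, 4, 7], [2, 5, 8],
  [0, 4, 8], [2, 4, 6],
];

// Returns 'X' or 'O' for a win, 'draw' for a full board, null if in progress.
function winner(board) {
  for (const [a, b, c] of LINES)
    if (board[a] && board[a] === board[b] && board[b] === board[c]) return board[a];
  return board.every(Boolean) ? 'draw' : null;
}

// Light playout: play uniformly random moves until the game ends.
function randomPlayout(board, player) {
  const b = board.slice();
  let current = player;
  let result = winner(b);
  while (!result) {
    const moves = b.flatMap((cell, i) => (cell ? [] : [i]));
    b[moves[Math.floor(Math.random() * moves.length)]] = current;
    current = current === 'X' ? 'O' : 'X';
    result = winner(b);
  }
  return result; // 'X', 'O' or 'draw'
}
```

A heavy playout would replace the random choice with a heuristic, trading simulation speed for more informative results.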

2.4 Update / Backpropagation

In the last phase of the algorithm, node statistics are updated along the path until we reach the root node. We increment the number of visits for each node and increment or decrement its win score depending on which player each node belongs to.
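The update walk can be sketched as follows. The node shape (parent, visits, winScore and player fields) is an illustrative assumption:

```javascript
// Walk from the simulated node back up to the root, updating statistics.
function backpropagate(node, winningPlayer, reward = 1) {
  for (let n = node; n; n = n.parent) {
    n.visits += 1;
    if (winningPlayer === 'draw') continue; // one common convention: draws only count as visits
    n.winScore += n.player === winningPlayer ? reward : -reward;
  }
}
```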

By doing these four steps repeatedly, MCTS accumulates enough statistics to know which child of the root node has the best chance of winning. We can run the four steps a set number of times, for a specified duration like 1 second, or until a set number of expansions/simulations is reached.
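If you prefer a wall-clock budget over a fixed iteration count, the driver loop can be sketched like this, where runOneIteration is a hypothetical stand-in for one pass through the four phases:

```javascript
// Run MCTS iterations until a time budget is exhausted; returns how many ran.
function searchForMillis(runOneIteration, budgetMs) {
  const deadline = Date.now() + budgetMs;
  let iterations = 0;
  while (Date.now() < deadline) {
    runOneIteration();
    iterations++;
  }
  return iterations;
}
```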

3. Implementation

3.1 TreeSearch Class

Let’s take a look at the most important method of our TreeSearch class:

This function takes the board state, the current player and the number of iterations as input, and returns the best action found.

From lines 2 to 4 we initialize the tree’s root node with the current board state and assign the player responsible for that state to the root node. Then the four phases of our algorithm begin in a loop:
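The whole method can be sketched end to end for Tic Tac Toe as follows. This is a self-contained sketch: findBestMove, makeNode and the array-based board are my own illustrative names, not necessarily the original implementation:

```javascript
// All rows, columns and diagonals of a 3x3 board.
const WIN_LINES = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8],
  [0, 3, 6], [1, 4, 7], [2, 5, 8],
  [0, 4, 8], [2, 4, 6],
];
const other = (p) => (p === 'X' ? 'O' : 'X');

// Returns 'X' or 'O' for a win, 'draw' for a full board, null if in progress.
function gameResult(board) {
  for (const [a, b, c] of WIN_LINES)
    if (board[a] && board[a] === board[b] && board[b] === board[c]) return board[a];
  return board.every(Boolean) ? 'draw' : null;
}

const emptyCells = (board) => board.flatMap((cell, i) => (cell ? [] : [i]));

// A tree node; `player` is the player who made the move leading to `board`.
function makeNode(board, player, parent = null) {
  return { board, player, parent, children: [], untried: emptyCells(board), visits: 0, winScore: 0 };
}

function uct(node, parentVisits, c = Math.SQRT2) {
  if (node.visits === 0) return Infinity;
  return node.winScore / node.visits + c * Math.sqrt(Math.log(parentVisits) / node.visits);
}

function findBestMove(board, player, iterations = 3000) {
  // The root holds the current board; the opponent is credited with its last move.
  const root = makeNode(board.slice(), other(player));
  for (let i = 0; i < iterations; i++) {
    // Phase 1: selection - descend by UCT while the node is fully expanded.
    let node = root;
    while (node.untried.length === 0 && node.children.length > 0) {
      const parentVisits = node.visits;
      node = node.children.reduce((a, b) => (uct(b, parentVisits) > uct(a, parentVisits) ? b : a));
    }
    // Phase 2: expansion - add one random unexplored child, unless terminal.
    if (node.untried.length > 0 && !gameResult(node.board)) {
      const move = node.untried.splice(Math.floor(Math.random() * node.untried.length), 1)[0];
      const next = node.board.slice();
      next[move] = other(node.player);
      const child = makeNode(next, other(node.player), node);
      node.children.push(child);
      node = child;
    }
    // Phase 3: simulation - light (random) playout to a terminal state.
    const sim = node.board.slice();
    let current = other(node.player);
    let result = gameResult(sim);
    while (!result) {
      const open = emptyCells(sim);
      sim[open[Math.floor(Math.random() * open.length)]] = current;
      current = other(current);
      result = gameResult(sim);
    }
    // Phase 4: backpropagation - update statistics up to the root.
    for (let n = node; n; n = n.parent) {
      n.visits += 1;
      if (result !== 'draw') n.winScore += n.player === result ? 1 : -1;
    }
  }
  // The most-visited child of the root is the best move found.
  const best = root.children.reduce((a, b) => (b.visits > a.visits ? b : a));
  return best.board.findIndex((cell, i) => cell !== board[i]);
}
```

The sketch keeps all four phases in one function for readability; splitting them into named methods, as the article's TreeSearch class does, is cleaner for larger projects.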

On line 9 we search for the best-fit child node using the UCT formula explained above; it is defined in another function:

Notice that a given node should be fully expanded (all of its possible children added) in order for UCT to work; otherwise we return the node itself in this step.

On lines 12 to 15 the node is expanded with a single child (phase 2). Using the node returned from the previous step, we perform a simulation (random play, or one of the other methods explained above) and get its result; here is the simulation step method:

In this step, as an optimization, we can limit the number of plays instead of always reaching a terminal state. It is a useful technique for games with bigger boards, but for Tic Tac Toe I prefer to play out to the terminal state.

As the last step (line 21), we need to update our tree with the simulation result:

We add to the number of visits of each node in the update path, and we add a reward (which can be any positive real number, not necessarily 1) to each node whose player matches the winning player of the simulation phase; otherwise we subtract the reward.

This concludes a single iteration of our tree search algorithm. After a set number of iterations or a time limit, we select the root’s child node with the most visits, which is the best state found by the search.

3.2 Board Class

This class encapsulates the game board and has methods to calculate the possible actions on the board and to check the current state of the game: in progress, win or draw. You can change this class and the accompanying Action class to represent another game.
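A minimal sketch of such a class might look like this. The method names (availableMoves, play, status) are my assumptions, not necessarily the article's API:

```javascript
// Illustrative Board class: a 3x3 tic-tac-toe board with status checks.
class Board {
  constructor(cells = Array(9).fill(null)) {
    this.cells = cells; // 'X', 'O' or null per cell
  }
  // Indices of all empty cells (the possible actions).
  availableMoves() {
    return this.cells.flatMap((c, i) => (c ? [] : [i]));
  }
  // Returns a new Board with the given move applied (immutable style).
  play(i, player) {
    const next = this.cells.slice();
    next[i] = player;
    return new Board(next);
  }
  // 'X' or 'O' for a win, 'draw' for a full board, 'in-progress' otherwise.
  status() {
    const lines = [
      [0, 1, 2], [3, 4, 5], [6, 7, 8],
      [0, 3, 6], [1, 4, 7], [2, 5, 8],
      [0, 4, 8], [2, 4, 6],
    ];
    for (const [a, b, c] of lines)
      if (this.cells[a] && this.cells[a] === this.cells[b] && this.cells[b] === this.cells[c])
        return this.cells[a];
    return this.cells.every(Boolean) ? 'draw' : 'in-progress';
  }
}
```

Keeping the board immutable (play returns a fresh Board) makes it easy for tree nodes to hold their own state without defensive copying elsewhere.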

4. Conclusion

There are other minor details in implementing MCTS, but I think they are simple enough for anyone to figure out on their own. I’ve provided the full source code and you can find it here. With small changes to the code you can support new board games. If you have any questions or suggestions, please feel free to comment below.