Chapter 2. Go as a machine-learning problem

This chapter covers

  • Why are games a good subject for AI?
  • Why is Go a good problem for deep learning?
  • What are the rules of Go?
  • What aspects of game playing can you solve with machine learning?

2.1. Why games?

Games are a favorite subject for AI research, and it’s not just because they’re fun. They also simplify some of the complexities of real life, so you can focus on the algorithms you’re studying.

Imagine you see a comment on Twitter or Facebook: something like, “Ugh, I forgot my umbrella.” You’d quickly conclude that your friend got caught out in the rain. But that information isn’t included anywhere in the sentence. How did you reach that conclusion? First, you applied common knowledge about what umbrellas are for. Second, you applied social knowledge about the kinds of comments people bother to make: it’d be strange to say, “I forgot my umbrella” on a bright, sunny day.

As humans, we effortlessly factor in all this context when reading a sentence. This isn’t so easy for computers. Modern deep-learning techniques are effective at processing the information you supply them. But you’re limited in your ability to find all the relevant information and feed it to computers. Games sidestep that problem. They take place in an artificial universe, where all the information you need in order to make a decision is spelled out in the rules.

Games are especially well suited for reinforcement learning. Recall that reinforcement learning requires repeatedly running your program and evaluating how well it has accomplished a task. Imagine you’re using reinforcement learning to train a robot to move around a building. Before the control system is finely tuned, you risk the robot falling down a flight of stairs or knocking over your furniture. Another option is to build a computer simulation of the environment in which the robot will operate. This eliminates the risks of letting an untrained robot run around in the real world but creates new problems. First, you have to invest in developing a detailed computer simulation, which is a significant project in its own right. Second, there’s always a chance that your simulation isn’t completely accurate.

With games, on the other hand, all you need to do is have your AI play. If it loses a few hundred thousand matches while it’s learning, so what? Games have become essential to serious reinforcement-learning research: many cutting-edge algorithms were first demonstrated on Atari video games such as Breakout.

To be clear, you can successfully apply reinforcement learning to problems in the physical world. Many researchers and engineers have done so. But starting with games solves the problem of creating a realistic training environment and lets you focus on the mechanics and principles of reinforcement learning.

In this chapter, we introduce the rules of the game of Go. Next, we describe the structure of board-game AI at a high level, and identify points where you can introduce deep learning. Finally, we cover how you can evaluate the progress of your game AI throughout development.

2.2. A lightning introduction to the game of Go

You don’t need to be a strong Go player to read this book, but you do need to understand the rules well enough to enforce them in a computer program. Fortunately, the rules are famously simple. In short, two players alternate placing black and white stones on a board, starting with the black player. The goal is to control as much of the board as possible with your own stones.

Although the rules are simple, Go strategy has endless depth, and we don’t even attempt to cover it in this book. If you’re interested in learning more, we provide some resources at the end of this section.

2.2.1. Understanding the board

A Go board is a square grid, shown in figure 2.1. Stones go on the intersections, not inside the squares. The standard board is 19 × 19, but sometimes players use a smaller board for a quick game. The most popular smaller options are 9 × 9 and 13 × 13 boards. (The size refers to the number of intersections on the board, not the number of squares.)

Figure 2.1. A standard 19 × 19 Go board. The intersections marked with dots are the star points, which are solely for players’ reference. Stones go on the intersections.

Notice that nine points are marked with a dot. These points are called the star points. Their main purpose is to help players judge distances on the board; they have no effect on game play.

2.2.2. Placing and capturing stones

One player plays with black stones, and the other plays with white stones. The two players alternate placing stones on the board, starting with the black player. Stones don’t move after they’re on the board, although they can be captured and removed entirely. To capture your opponent’s stones, you must completely surround them with your own. Here’s how that works.

Stones of the same color that are touching are considered connected, as shown in figure 2.2. For the purposes of connection, only the points directly up, down, left, and right count; diagonals don’t. Any empty point touching a connected group is called a liberty of that group. Every group needs at least one liberty to stay on the board. You can capture your opponent’s stones by filling all their liberties.

Figure 2.2. The three black stones are connected. They have four liberties on the points marked with squares. White can capture the black stones by placing white stones on all the liberties.
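To make the connection and liberty rules concrete, here’s a minimal sketch in Python of finding a group and its liberties with a flood fill. It assumes a simple board representation: a dict mapping (row, col) coordinates to 'b' or 'w', with empty points simply absent. All the names here are illustrative, not from any particular library.

  def neighbors(point, board_size):
      """Return the on-board points directly up, down, left, and right."""
      row, col = point
      candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
      return [(r, c) for r, c in candidates
              if 1 <= r <= board_size and 1 <= c <= board_size]

  def find_group_and_liberties(board, point, board_size):
      """Flood-fill from point to collect its group and the group's liberties."""
      color = board[point]
      group, liberties = set(), set()
      to_visit = [point]
      while to_visit:
          p = to_visit.pop()
          if p in group:
              continue
          group.add(p)
          for n in neighbors(p, board_size):
              if board.get(n) is None:
                  liberties.add(n)        # empty neighbor: a liberty
              elif board.get(n) == color:
                  to_visit.append(n)      # same color: part of the group
      return group, liberties

For the three black stones in figure 2.2, this would return a group of three points with four liberties.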

When you place a stone in the last liberty of an opponent’s group, that group is captured and removed from the board. The newly empty points are then available for either player to play on (so long as the move is legal). On the flip side, you may not play a stone that would have zero liberties, unless you’re completing a capture.
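Those two rules together determine whether a stone may be played on a given point. Here’s a sketch of the self-capture check, again with illustrative names, building on find_group_and_liberties above: a move is an illegal self-capture if, after removing any opponent stones it captures, the new stone’s own group still has no liberties.

  def is_self_capture(board, point, color, board_size):
      """True if playing color at point would leave the new group with
      zero liberties without capturing anything (an illegal play)."""
      next_board = dict(board)
      next_board[point] = color
      # First, remove any opponent groups whose last liberty this move fills.
      for n in neighbors(point, board_size):
          neighbor_color = next_board.get(n)
          if neighbor_color is not None and neighbor_color != color:
              opp_group, opp_libs = find_group_and_liberties(next_board, n, board_size)
              if not opp_libs:
                  for p in opp_group:
                      del next_board[p]   # the capture frees these points
      # Then check whether our own new group has at least one liberty left.
      _, own_libs = find_group_and_liberties(next_board, point, board_size)
      return len(own_libs) == 0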

An interesting consequence arises from the capturing rules. If a group of stones surrounds two completely separate internal liberties, it can never be captured. See figure 2.3: black can’t play at A, because a black stone there would have no liberties, and it wouldn’t complete a capture either, since the white group would still have a liberty at B. Black can’t play at B for the same reason. So black has no way to fill the last two liberties of the white group. These internal liberties are called eyes. In contrast, black can play at C to capture five white stones: even though that black stone would have no liberties of its own, it completes the capture. That white group has only one eye and is doomed to be captured at some point.

Figure 2.3. The white stones on the left can never be captured: black can play at neither A nor B. A black stone on either point would have no liberties and would therefore be an illegal play. On the other hand, black can play at C to capture five white stones.

Although it’s not explicitly part of the rules, the idea that a group with two eyes can’t be captured is the most basic part of Go strategy. In fact, this is the only strategy you’ll specifically code into your bot’s logic. All the more advanced Go strategies will be inferred through machine learning.
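Because this is the one piece of strategy you’ll hand-code, it’s worth seeing how simple it can be. Here’s a deliberately minimal sketch: it treats an empty point as an eye for a color when all four neighboring points belong to that color. Robust eye detection also has to examine the diagonal points, but this conveys the idea; the names are illustrative.

  def is_point_an_eye(board, point, color, board_size):
      """Simplified eye check: every neighbor of an empty point is our color."""
      if board.get(point) is not None:
          return False                    # an eye must be an empty point
      return all(board.get(n) == color
                 for n in neighbors(point, board_size))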

2.2.3. Ending the game and counting

Either player may pass any turn instead of placing a stone. When both players pass consecutive turns, the game is over. Before scoring, the players identify any dead stones: stones that have no chance of making two eyes or connecting up to friendly stones. Dead stones are treated exactly the same as captures when scoring the game. If a disagreement occurs, the players can resolve it by resuming play. But this is rare: if the status of any group is unclear, players usually try to resolve it before passing.

The goal of the game is to control a larger section of the board than your opponent. There are two ways to add up the score, but they nearly always give the same result.

The most common counting method is territory scoring. In this case, you get one point for every point on the board that’s completely surrounded by your own stones, plus one point for every opponent’s stone that you captured. The player with more points is the winner.

An alternative counting method is area scoring. With area scoring, you get one point for every point of territory, plus another point for every stone you have on the board. Except in rare cases, you’ll get the same winner by either method: if neither player passes early, the difference in captures will equal the difference in stones on the board.

Territory scoring is more common in casual play, but it turns out that area scoring is slightly more convenient for computers. Throughout the book, our AI assumes it’s playing under area scoring, unless otherwise noted.

In addition, the white player gets extra points as compensation for going second. This compensation is called komi. Komi is usually 6.5 points under territory scoring or 7.5 points under area scoring—the extra half point ensures there are no ties. We assume 7.5 points komi throughout the book.
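Putting area scoring and komi together, deciding the winner is simple arithmetic. Here’s a minimal sketch, assuming the territory and stone counts have already been tallied (the names are illustrative):

  def winner_by_area_score(black_territory, black_stones,
                           white_territory, white_stones, komi=7.5):
      """Return the winning color and the margin under area scoring."""
      black = black_territory + black_stones
      white = white_territory + white_stones + komi
      if white > black:
          return 'white', white - black
      return 'black', black - white

Applied to the 9 × 9 example that follows, winner_by_area_score(12, 27, 17, 25) returns ('white', 10.5).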

Figure 2.4 shows the final position of an example 9 × 9 game.

Figure 2.4. The final position of a 9 × 9 game. Dead stones are marked with an ×. Black territory is marked with a triangle. White territory is marked with a square.

Here’s how you score this game:

  1. The stones marked with an × are considered dead: they count as captures, even though the players didn’t make the capture during the game. Black also made a capture earlier in the game (not shown). So black has 3 captures, and white has 2 captures.
  2. Black has 12 points of territory: the 10 points marked with a triangle, plus the 2 points underneath the dead white stones.
  3. White has 17 points of territory: the 15 points marked with a square, plus the 2 points underneath the dead black stones.
  4. Black has 27 stones on the board, after removing the dead black stones.
  5. White has 25 stones on the board, after removing the dead white stones.
  6. By territory scoring, white has 17 points of territory + 2 captures + 6.5 komi for a total of 25.5 points. Black has 12 points of territory + 3 captures for a total of 15 points.
  7. By area scoring, white has 17 points of territory + 25 stones on the board + 7.5 komi for a total of 49.5 points. Black has 12 points of territory + 27 stones on the board for a total of 39 points.
  8. By either counting method, white wins by 10.5 points.

A game can end in one more way: either player can resign at any point. In a game between experienced players, it’s considered courteous to resign when you’re clearly behind. For our AI to be a good opponent, it should learn to detect when it should resign.

2.2.4. Understanding ko

One more restriction exists on where you can place stones. To ensure that the game eventually ends, it’s illegal to play a move that would return the board to a previous state. Figure 2.5 shows an example of how this can happen.

Figure 2.5. An illustration of the ko rule. First, black captures one white stone. White would like to capture back by playing at A, but that would revert the board to the previous position. The ko rule forbids such a play, in order to prevent an endless cycle of capture and recapture. White must play somewhere else on the board first. After that, the overall board position is new, so white can come back to capture at A later—assuming black doesn’t protect it first.

In the diagram, black has just captured one stone. White might like to play at the point marked A to recapture the new black stone. But that would put the board back in the same position it was in two moves ago. Instead, white must play somewhere else on the board first. After that, if white captures at A, the global board position is different, so it’s legal. Of course, this gives black the opportunity to protect the vulnerable stone. In order to recapture the black stone, the white player must create a distraction big enough to draw black’s attention elsewhere on the board.

This situation is called a ko, from the Japanese word for eternity. Players adopt special strategies when a ko is on the board, and this was a weak spot for previous generations of Go-playing programs. In chapter 7, we show how to give your neural networks a hint to help them learn ko tactics. This is a general technique for training neural networks effectively. Even when you can’t articulate the rules that you want the neural network to learn, you can encode your inputs in a way that emphasizes the situations you want it to pay attention to.
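One way to enforce the no-repetition rule in a program is simply to remember every board position that has occurred and reject any move that re-creates one. A minimal sketch, using the dict-based board from earlier (the names are illustrative):

  def violates_ko(proposed_board, previous_boards):
      """previous_boards is a set of frozen snapshots of all earlier positions."""
      return frozenset(proposed_board.items()) in previous_boards

Hashing a full snapshot after every move is wasteful; in practice, Go engines typically speed this check up with incremental Zobrist hashing, but the logic is the same.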

2.3. Handicaps

When two players of unequal strength play, a simple system keeps the game interesting. The weaker player takes black and gets to place a few stones on the board before the game starts; these stones are called handicap stones. Then the stronger player takes white and makes the first move. In addition, in a handicap game, komi is usually reduced to just half a point. Normally, the purpose of komi is to negate the advantage a player gets from playing first; but the point of a handicap is to give black an extra advantage, so the two would work at cross purposes. The remaining half point of komi is just to break ties.

It’s traditional to place the handicap stones on the star points, but some players allow the black player to choose where to put them.

2.4. Where to learn more

Although we’ve covered the rules of the game, we haven’t even scratched the surface of what makes Go so engrossing and addictive. That’s beyond the scope of this book, but we encourage you to play a few games and learn more on your own. Here are a few resources for further exploration:

  • The best way to get into Go is to dive in and start playing, and it’s easier than ever to find a casual game online. The popular Online Go Server (http://online-go.com) enables you to play directly in your web browser. Even if you just learned the rules, its ranking system will help you find a competitive game. Other popular Go servers include the KGS Go Server (http://gokgs.com) and Tygem (www.tygembaduk.com).
  • Sensei’s Library (https://senseis.xmp.net) is a wiki-style reference, full of strategy, tips, history, and trivia.
  • Janice Kim’s Learn to Play Go series ranks among the best English-language Go books. We highly recommend volumes 1 and 2 for complete beginners: they’ll quickly get you to the point where you can make sense of a game.

2.5. What can we teach a machine?

Whether you’re programming a computer to play Go or tic-tac-toe, most board-game AIs share a similar overall structure. In this section, we provide a high-level overview of that structure, and identify specific problems the AI needs to solve. Depending on the game, the best solutions may involve game-specific logic, machine learning, or both.

2.5.1. Selecting moves in the opening

Early in the game, it’s difficult to evaluate a particular move because of the huge number of variations in the rest of the game. Both chess and Go AIs often use an opening book: a database of opening sequences taken from expert human games. To build this, you need a collection of game records from strong players. You analyze the game records, looking for common positions. In any common position, if a strong consensus exists about the next move—say, one or two moves account for 80% of the follow-ups—you add those moves to the opening book. The bot can then consult the book when playing a game. If any early game position appears in the opening book, the bot just looks up the expert move.
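As a rough sketch of how you might build such a book, assume each game record is a list of (position_key, move) pairs; all the names here are illustrative:

  from collections import Counter, defaultdict

  def build_opening_book(game_records, max_depth=20, consensus=0.8):
      """Map early positions to the move strong players usually choose."""
      follow_ups = defaultdict(Counter)
      for record in game_records:
          for position_key, move in record[:max_depth]:
              follow_ups[position_key][move] += 1
      book = {}
      for position_key, move_counts in follow_ups.items():
          top_move, top_count = move_counts.most_common(1)[0]
          # Keep a move only when strong players clearly agree on it.
          if top_count / sum(move_counts.values()) >= consensus:
              book[position_key] = top_move
      return book

This simplified version keeps only a single consensus move per position; a real book might store the top few moves along with their frequencies.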

In chess and checkers, where pieces are removed from the board as the game progresses, AIs also rely on analogous endgame databases: when just a few pieces remain on the board, you can calculate all the variations in advance and store the results. This technique doesn’t apply to Go, where the board fills up toward the end of the game.

2.5.2. Searching game states

The core idea behind board-game AI is tree search. Think about how humans play a strategy game. First, we consider a possible move for our next turn. Then we need to think of how our opponent is likely to respond, then plan how we’d respond to that, and so on. We read out the variation as far as we can, and then judge whether the result is good. Then we backtrack a bit and look at a different variation to see if it’s better.

This closely describes how the tree-search algorithms used in game AI work. Of course, humans can keep only a few variations in their heads at once, while computers can juggle millions with no trouble. Humans make up for their lack of raw computing power with intuition. Experienced chess and Go players are scary good at spotting the handful of moves worth considering.
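In its simplest form, that back-and-forth reading is a depth-limited recursive search. Here’s a toy sketch in the negamax style, where “good for my opponent” is just the negative of “good for me”; legal_moves, apply_move, and evaluate are assumed helper functions, not shown:

  def best_result(game_state, depth, legal_moves, apply_move, evaluate):
      """Best score the player to move can guarantee within depth plies.
      evaluate must score a position from the point of view of the
      player whose turn it is."""
      moves = legal_moves(game_state)
      if depth == 0 or not moves:
          return evaluate(game_state)     # stop reading and judge the position
      best = float('-inf')
      for move in moves:
          next_state = apply_move(game_state, move)
          # The opponent moves next, so negate their best result.
          result = -best_result(next_state, depth - 1,
                                legal_moves, apply_move, evaluate)
          best = max(best, result)
      return best

This is only the skeleton of the idea: try a move, imagine the best reply, and back the evaluations up the tree.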

Ultimately, raw computing power won out in chess. But in Go, building an AI that could compete with top human players required an interesting twist: bringing something like human intuition to computers.

2.5.3. Reducing the number of moves to consider

In the context of game tree search, the number of possible moves on a given turn is the branching factor.

In chess, the average branching factor is about 30. At the start of the game, each player has 20 legal options for the first move; the number increases a bit as the board opens up. At that scale, it’s realistic to read out every single possible move to four or five moves ahead, and a chess engine will read out the more promising lines much deeper than that.

By comparison, the branching factor in Go is enormous. On the first move of the game, there are 361 legal moves, and the number decreases slowly. The average branching factor is around 250 valid moves per turn. Looking ahead just four moves requires evaluating nearly 4 billion positions. It’s crucial to narrow down the number of possibilities. Table 2.1 shows how the branching factor affects the number of possible game positions by comparing chess to Go.

Table 2.1. Approximate number of possible board states in games

                       Branching factor 30 (chess)    Branching factor 250 (Go)
  After two moves      900                            62,500
  After three moves    27,000                         15 million
  After four moves     810,000                        4 billion
  After five moves     24 million                     1 trillion
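These figures are just powers of the branching factor, which you can verify in a Python session:

  >>> 30 ** 4                # chess positions after four moves
  810000
  >>> 250 ** 4               # Go positions after four moves: nearly 4 billion
  3906250000
  >>> 250 ** 5               # about 1 trillion after five
  976562500000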

In Go, rule-based approaches to move selection turn out to be mediocre: it’s extremely difficult to write rules that reliably identify the most important area of the board. But deep learning is perfectly suited to the problem. You can apply supervised learning to train a computer to imitate a human Go player.

You start with a large collection of game records between strong human players; online gaming servers are a great resource here. Then you replay all the games on a computer, extracting each board position and the following move. That’s your training set. With a suitably deep neural network, it’s possible to predict the human move with better than 50% accuracy. You can build a bot that just plays the predicted human move, and it’s already a credible opponent. But the real power comes when you combine these move predictions with tree search: the predicted moves give you a ranked list of branches to explore.
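For example, if your trained network outputs one probability per board point, pruning the search down to a handful of branches is nearly a one-liner. A minimal sketch, where move_probs stands in for the network’s output vector (e.g., length 361 on a 19 × 19 board):

  import numpy as np

  def top_moves(move_probs, num_branches=10):
      """Indices of the most strongly predicted moves, best first."""
      ranked = np.argsort(move_probs)[::-1]   # sort points by predicted probability
      return ranked[:num_branches].tolist()

Instead of reading out 250 branches per turn, the search can now focus on the 10 moves the network likes best.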

2.5.4. Evaluating game states

The branching factor limits how far an AI can look ahead. If you could read out a hypothetical sequence all the way to the end of the game, you’d know who’d win; it’s easy to decide whether that’s a good sequence. But that’s not practical in any game more complex than tic-tac-toe: the number of possible variations is just too large. At some point, you have to stop and pick one of the incomplete sequences you’ve looked at. To do so, you take the final board position you’ve read out and assign it a numerical score. Of all the variations you analyzed, you select the move that leads to the best-scoring position. Figuring out how to compute that score is the tricky part: that’s the problem of position evaluation.

In chess AIs, position evaluation is based on logic that makes sense to chess players. You can start with simple rules like this: if you capture my pawn, and I capture your rook, that’s good for me. Top chess engines go far beyond that, factoring in sophisticated rules about where on the board the pieces end up and which other pieces block their movement.

In Go, position evaluation may be even more difficult than move selection. The goal of the game is to cover more territory, but it’s surprisingly difficult to count territory: the boundaries tend to remain vague until late in the game. Counting captures doesn’t help much either; sometimes you can make it all the way to the end of a game with only a couple of captures. This is another area where human intuition reigns supreme.

Deep learning was a major breakthrough here as well. The kinds of neural networks that are suitable for move selection can also be trained to evaluate board positions. Instead of training a neural network to predict what the next move will be, you train it to predict who will win. You can design the network so it expresses its prediction as a probability; that gives you a numeric score to evaluate the board position.
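In other words, the network becomes your evaluation function. A minimal sketch, where value_network stands in for a model you’d train later (the name and its predict method are assumptions, not a real API):

  def evaluate_position(board_tensor, value_network):
      """Map a predicted win probability in [0, 1] to a symmetric score."""
      win_prob = value_network.predict(board_tensor)   # P(player to move wins)
      return 2 * win_prob - 1                          # +1 certain win, -1 certain loss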

2.6. How to measure your Go AI’s strength

As you work on your Go AI, you’ll naturally want to know how strong it is. Most Go players are familiar with the traditional Japanese ranking system, so you want to measure your bot’s strength on that scale. The only way to calibrate its level is by playing against other opponents; you can use other AIs or human players for that purpose.

2.6.1. Traditional Go ranks

Go players generally use the traditional Japanese system, where players are given kyu (beginner) or dan (expert) ranks. The dan levels, in turn, are divided into amateur dan and professional dan. The strongest kyu rating is 1 kyu, and larger numbers are weaker. Dan ranks go in the opposite direction: 1 dan is one level stronger than 1 kyu, and larger dan ranks are stronger. For amateur players, the scale traditionally tops out at 7 dan. Amateur players can get a rating from their regional Go association, and online servers will also track a rating for their players. Table 2.2 shows how the ranks stack up.

Table 2.2. Traditional Go ranks

  25 kyu                                      Complete beginners who have just learned the rules
  20 kyu to 11 kyu                            Beginners
  10 kyu to 1 kyu                             Intermediate players
  1 dan and up                                Strong amateur players
  7 dan                                       Top amateur players, close to professional strength
  Professional 1 dan to professional 9 dan    World’s strongest players

Amateur ranks are based on the number of handicap stones required to make up the difference in strength between two players. For example, if Alice is 2 kyu and Bob is 5 kyu, Alice will usually give Bob three handicap stones so that they have an equal chance of winning.

Professional ranks work a little differently: they’re more like titles. A regional Go association awards top players a professional rank based on results in major tournaments, and that rank is held for life. The amateur and professional scales aren’t directly comparable, but you can safely assume that any player with a professional rank is at least as strong as an amateur 7 dan player. The top pros are significantly stronger than that.

2.6.2. Benchmarking your Go AI

An easy way to estimate the strength of your own bots is to play against other bots of known strength. Open source Go engines such as GNU Go and Pachi provide good benchmarks. GNU Go plays at around a 5 kyu level, and Pachi is about 1 dan (Pachi’s level varies a bit depending on the amount of computing power you provide it). So if you have your bot play GNU Go 100 times, and it wins about 50 games, you can conclude that your bot is also somewhere in the neighborhood of 5 kyu.
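If you want to be a bit more quantitative about such a match result, you can convert a head-to-head win rate into an Elo-style rating gap with the standard logistic formula. This is a generic rating calculation, not something specific to Go or to any particular server:

  import math

  def elo_gap(wins, games):
      """Rating-point gap implied by a win rate (requires 0 < wins < games)."""
      win_rate = wins / games
      return 400 * math.log10(win_rate / (1 - win_rate))

Winning 50 of 100 games gives elo_gap(50, 100) == 0.0: no measurable gap, so your bot sits at about the same level as its opponent. Winning 75 of 100 would suggest a gap of roughly 190 rating points.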

To get a more precise rank, you can set up your AI to play on a public Go server with a rating system. A few dozen games should be enough to get a reasonable estimate.

2.7. Summary

  • Games are a popular subject for AI research because they create controlled environments with known rules.
  • The strongest Go AIs today rely on machine learning rather than game-specific rules. Historically, rule-based Go AIs were not strong, in part because of the huge number of possible variations they’d need to consider.
  • Two places you can apply deep learning in Go are move selection and position evaluation.
  • Move selection is the problem of narrowing the set of moves you need to consider in a particular board position. Without good move selection, your Go AI will have too many branches to read.
  • Position evaluation is the problem of estimating which player is ahead and by how much. Without good position evaluation, your Go AI will have no ability to select a good variation.
  • You can measure the strength of your AI by playing against widely available bots of known strength, such as GNU Go or Pachi.