AlphaGo is the first computer program to defeat a professional human Go player, the first to defeat a Go world champion, and is arguably the strongest Go player in history.
Go is known as the most challenging classical game for artificial intelligence because of its complexity.
Despite decades of work, the strongest Go computer programs could only play at the level of human amateurs. Standard AI methods, which test all possible moves and positions using a search tree, can’t handle the sheer number of possible Go moves or evaluate the strength of each possible board position.
What is Go?
Go originated in China over 3,000 years ago. Winning this board game requires multiple layers of strategic thinking.
Two players, using either white or black stones, take turns placing their stones on a board. The goal is to surround and capture their opponent's stones or strategically create spaces of territory. Once all possible moves have been played, both the stones on the board and the empty points are tallied. The highest number wins.
As simple as the rules may seem, Go is profoundly complex. There are an astonishing 10 to the power of 170 possible board configurations - more than the number of atoms in the known universe. This makes the game of Go a googol times more complex than chess.
To capture the intuitive aspect of the game, we needed a new approach.
We created AlphaGo, a computer program that combines advanced search tree with deep neural networks. These neural networks take a description of the Go board as an input and process it through a number of different network layers containing millions of neuron-like connections.
One neural network, the “policy network”, selects the next move to play. The other neural network, the “value network”, predicts the winner of the game. We introduced AlphaGo to numerous amateur games to help it develop an understanding of reasonable human play. Then we had it play against different versions of itself thousands of times, each time learning from its mistakes.
Over time, AlphaGo improved and became increasingly stronger and better at learning and decision-making. This process is known as reinforcement learning. AlphaGo went on to defeat Go world champions in different global arenas and arguably became the greatest Go player of all time.
AlphaGo: The Movie
I thought AlphaGo was based on probability calculation and that it was merely a machine. But when I saw this move, I changed my mind. Surely, AlphaGo is creative.
In October 2015, AlphaGo played its first match against the reigning three-time European Champion, Mr Fan Hui. AlphaGo won the first ever game against a Go professional with a score of 5-0.
AlphaGo then competed against legendary Go player Mr Lee Sedol, the winner of 18 world titles, who is widely considered the greatest player of the past decade. AlphaGo's 4-1 victory in Seoul, South Korea, on March 2016 was watched by over 200 million people worldwide. This landmark achievement was a decade ahead of its time.
Inventing winning moves
The game earned AlphaGo a 9 dan professional ranking, the highest certification. This was the first time a computer Go player had ever received the accolade. During the games, AlphaGo played several inventive winning moves, several of which - including move 37 in game two - were so surprising that they upended hundreds of years of wisdom. Players of all levels have extensively examined these moves ever since.
Playing the online Master
In January 2017, we revealed an improved, online version of AlphaGo called Master. This online player achieved 60 straight wins in time-control games against top international players.
The Chinese summit
Four months later, AlphaGo took part in the Future of Go Summit in China, the birthplace of Go. The five-day festival created an opportunity to explore the mysteries of Go in a spirit of mutual collaboration with the country’s top players. Designed to help unearth even more strategic moves, the summit included various game formats such as pair Go, team Go, and a match with the world’s number one player Ke Jie.
Demis Hassabis on AlphaGo's legacy and the 'Future of Go Summit'
AlphaGo Zero: Starting from scratch
Following the summit, we revealed AlphaGo Zero. While AlphaGo learnt the game by playing thousands of matches with amateur and professional players, AlphaGo Zero learnt by playing against itself, starting from completely random play.
This powerful technique is no longer constrained by the limits of human knowledge. Instead, the computer program accumulated thousands of years of human knowledge during a period of just a few days and learned to play Go from the strongest player in the world, AlphaGo.
AlphaGo Zero quickly surpassed the performance of all previous versions and also discovered new knowledge, developing unconventional strategies and creative new moves, including those which beat the World Go Champions Lee Sedol and Ke Jie. These creative moments give us confidence that AI can be used as a positive multiplier for human ingenuity.
AlphaZero: Creative player
In late 2017, we introduced AlphaZero, a single system that taught itself from scratch how to master the games of chess, shogi, and Go, beating a world-champion computer program in each case.
AlphaZero replaces hand-crafted heuristics with a deep neural network and algorithms that are given nothing beyond the basic rules of the game. By teaching itself, AlphaZero developed its own unique and creative style of play in all three games.
In its chess games, for example, players saw it had developed a highly dynamic and “unconventional” style of play that differed from any previous chess playing engine. Many of its “game changing” ideas have since been taken up at the highest level of play.
I can’t disguise my satisfaction that it plays with a very dynamic style, much like my own!"
AlphaZero: Player and potential
MuZero: Learning a model of the world
The latest version of our algorithm, known as MuZero, takes these ideas one step further. It matches the performance of AlphaZero on Go, chess and shogi, while also mastering a range of visually complex Atari games, all without being told the rules of any game.
It does this by learning a model of its environment and combining it with AlphaZero’s powerful lookahead tree search. This allows it to plan winning strategies in unknown domains, a significant leap forward in the capabilities of reinforcement learning algorithms and an important step towards our mission of building general-purpose learning systems.
While it is still early days, the ideas behind MuZero's powerful learning and planning algorithms may pave the way towards tackling new problems in messy real-world environments where the “rules of the game” are unknown.