AlphaGo Tutorial Slides
Value network: position s → value v(s)
Policy network: position s → move probabilities p(a|s)
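A minimal sketch of what the two network outputs look like, assuming the network itself has already produced raw scores. The function names, the toy 9-point board (AlphaGo uses 19x19) and the legality mask are hypothetical; the sketch only shows how a softmax restricted to legal moves yields p(a|s) and how a tanh squashes a scalar into a value v(s) between -1 and +1.

```python
import math

def move_probabilities(logits, legal):
    """Turn raw per-point scores into p(a|s): a softmax restricted to legal moves."""
    # Subtract the largest legal logit for numerical stability.
    m = max(l for l, ok in zip(logits, legal) if ok)
    weights = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, legal)]
    total = sum(weights)
    return [w / total for w in weights]

def position_value(score):
    """Squash a raw scalar into v(s) in (-1, 1): expected outcome for the player to move."""
    return math.tanh(score)

# Toy 3x3 board (9 points, hypothetical) with the centre point already occupied.
logits = [0.0] * 9
legal = [True] * 9
legal[4] = False
p = move_probabilities(logits, legal)
# Illegal moves get probability 0; the 8 legal moves share the mass equally.
```
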
Neural network training pipeline
Human expert positions → Supervised Learning policy network → Reinforcement Learning policy network → Self-play data → Value network
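The last stage of the pipeline, self-play data feeding the value network, can be sketched as a regression problem. This assumes, as in the AlphaGo Nature paper, that v(s) is trained to predict the game outcome z by minimising mean squared error, with one position sampled per game so that highly correlated positions from the same game do not dominate. The data layout (each game as a list of (position, z) pairs) and the function names are hypothetical.

```python
import random

def value_training_pairs(games, rng):
    """Sample one (position, z) pair from each self-play game.

    Each game is a list of (position, z) tuples, where z is +1 or -1: the final
    result from the perspective of the player to move in that position.
    """
    return [rng.choice(game) for game in games]

def mse_loss(predict, pairs):
    """Mean squared error between the value estimate v(s) and the outcome z."""
    return sum((predict(s) - z) ** 2 for s, z in pairs) / len(pairs)

games = [[("s0", 1), ("s1", -1)], [("t0", -1)]]
pairs = value_training_pairs(games, random.Random(0))
# A value function that always predicts 0 has loss 1.0, since every z is +1 or -1.
loss = mse_loss(lambda s: 0.0, pairs)
```
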
Supervised learning of policy networks
Policy network: 12-layer convolutional neural network
Training data: 30M positions from human expert games (KGS 5+ dan)
Results: 57% accuracy on held-out test data (state of the art was 44%)
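The accuracy figure above is read here as top-1 move prediction: the fraction of held-out positions where the network's most probable move equals the move the human expert actually played. The sketch below computes that metric, plus the average negative log-likelihood of the expert move, which is the quantity supervised training of a policy network typically minimises. The input format (a list of probability rows and a list of expert move indices) is an assumption.

```python
import math

def top1_accuracy(prob_rows, expert_moves):
    """Fraction of positions where the most probable move equals the expert's move."""
    hits = sum(
        1 for probs, a in zip(prob_rows, expert_moves)
        if max(range(len(probs)), key=probs.__getitem__) == a
    )
    return hits / len(expert_moves)

def negative_log_likelihood(prob_rows, expert_moves):
    """Average -log p(a|s) of the expert move a over all positions."""
    return -sum(math.log(probs[a]) for probs, a in zip(prob_rows, expert_moves)) / len(expert_moves)

rows = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1]]
moves = [0, 2]
acc = top1_accuracy(rows, moves)          # first prediction right, second wrong
nll = negative_log_likelihood(rows, moves)
```
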
Reinforcement learning of policy networks
Policy network: 12-layer convolutional neural network
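The slide names the stage but not the update rule. The AlphaGo Nature paper describes a REINFORCE-style policy gradient: after each self-play game, the log-probability of every move played is pushed up if the game was won (z = +1) and down if it was lost (z = -1). The sketch below applies that rule to a toy softmax policy with one logit per move instead of a convolutional network; the learning rate and function names are illustrative assumptions.

```python
import math

def softmax(theta):
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    s = sum(e)
    return [x / s for x in e]

def reinforce_update(theta, action, z, lr=0.1):
    """One REINFORCE step on a toy softmax policy.

    For a softmax policy, d log p(a) / d theta_j = 1[j == a] - p(j),
    so the update is theta_j += lr * z * (1[j == a] - p(j)).
    """
    p = softmax(theta)
    return [t + lr * z * ((1.0 if j == action else 0.0) - p[j])
            for j, t in enumerate(theta)]

theta = [0.0, 0.0, 0.0]
after_win = reinforce_update(theta, action=0, z=+1)
after_loss = reinforce_update(theta, action=0, z=-1)
# A win makes move 0 more likely than 1/3; a loss makes it less likely.
```
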
Figure: Elo ratings (scale 0 to 4500) of computer Go programs GnuGo, Fuego, Pachi, Zen, Crazy Stone, AlphaGo (Nature v13) and AlphaGo (Seoul v18), calibrated against human players from KGS amateur ranks (7k to 9d) to professional ranks (1p to 9p).
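To read rating gaps in a chart like this, the standard Elo formula converts a rating difference into an expected score (roughly, a win probability). This sketch assumes the chart uses the conventional 400-point logistic Elo scale; the exact calibration used for the slide is not stated.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

even = expected_score(3000, 3000)     # equal ratings
gap_400 = expected_score(3400, 3000)  # A rated 400 points higher
```

Under this scale, a 400-point advantage corresponds to an expected score of about 0.91, i.e. roughly ten wins for every loss.
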
What's Next?
Demis Hassabis