Champion Model Is No Go: Adversarial AI Beats Master KataGo Algorithm

KATAGO_FastEdit_600px--1--1
A new algorithm defeated a championship-winning Go model using moves that even a middling human player could counter.

What’s new: Researchers at MIT, UC Berkeley, and the Fund for Alignment Research trained a model to defeat KataGo, an open source Go-playing system that has beaten top human players.

How it works: The authors’ system tricks KataGo into deciding prematurely that it has won, causing it to end a game when the authors’ model is in a winning position.

  • The authors trained a convolutional neural network to play Go using a modified version of a reinforcement learning method commonly used to train game-playing models. In the usual approach, the model plays itself and learns from all moves. In the authors’ version, the model played against a fixed KataGo model and learned only from its own moves, learning to exploit holes in KataGo’s strategy rather than becoming a conventionally savvy player.
  • The authors’ model forecasted its next moves using its own model, and it forecasted KataGo’s likely responses to those moves using KataGo’s model. It combined the forecasts to determine its next action. (KataGo can be configured to perform similar forecasting, but the authors didn’t use this capability while training their model.)
  • During training, once the model had won 50 percent of games, the authors increased the difficulty by pitting it against a version of KataGo that had been trained longer.

Results: The model’s winning strategy involved taking control of a corner of the board and adding a few easy-to-capture pieces outside that area.

  • This strategy enabled a version that predicted 600 moves ahead to win more than 99 percent of games against a KataGo that didn’t look ahead (which ranks among the top 100 European players).
  • A version that predicted the next 4,096 moves won 54 percent of games against a KataGo that looked 64 moves ahead (which ranks among the top 20 players worldwide).
  • The model lost to a naive human player who hadn’t played Go prior to undertaking the research project.
  • However, the naive player wasn’t able to defeat KataGo using the model’s strategy. This suggests that the strategy was less critical to the model’s victory than exploiting specific flaws in KataGo.

Why it matters: This work is a helpful reminder that neural networks are brittle, particularly to adversarial attacks that take advantage of a specific system’s idiosyncrasies. Even in the limited context of a game board, a model that achieves superhuman performance can be defeated by a simple — but unusual — strategy.

We’re thinking: AI practitioners perform exploratory data analysis and address potential attacks, but vulnerabilities always remain. Approaches like the one in this paper offer a way to find them.