DeepMind has applied its mastery of games to a more serious area: the fundamentals of computer science.
The Google subsidiary today unveiled AlphaDev, an AI system that discovers new fundamental algorithms. According to DeepMind, the algorithms it found outperform those refined by human experts over decades.
The London-based lab has big ambitions for the project. As demand for computing power grows and silicon chips approach their limits, fundamental algorithms will have to become exponentially more efficient. By improving these processes, DeepMind wants to transform the infrastructure of the digital world.
The first target of this mission is sorting algorithms, which are used to order data. Under the hood of our devices, they power everything from search rankings to movie recommendations.
To improve their performance, AlphaDev studied assembly language instructions used to create binary code for computers. After an extensive search, the system discovered a sorting algorithm that outperformed previous benchmarks.
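DeepMind's largest speedups came on short sequences, and routines that sort a handful of items are often written as a fixed series of compare-and-swap steps rather than as a general loop. The sketch below is our own illustration, not AlphaDev's output: a three-element sorting network in Python, where the hypothetical helper `compare_exchange` uses `min`/`max` as stand-ins for the branch-free instructions an assembly version might rely on (an assumption made for readability).

```python
from itertools import permutations


def compare_exchange(values, i, j):
    """Place the smaller of values[i], values[j] at i and the larger at j.

    min/max stand in (by assumption) for branch-free low-level instructions.
    """
    a, b = values[i], values[j]
    values[i], values[j] = min(a, b), max(a, b)


def sort3(values):
    """Sort a 3-element list in place with a fixed sequence of compare-exchanges."""
    compare_exchange(values, 0, 1)  # now values[0] <= values[1]
    compare_exchange(values, 1, 2)  # the largest element ends up at index 2
    compare_exchange(values, 0, 1)  # order the remaining two elements
    return values


# Exhaustively check every ordering of three distinct keys.
for perm in permutations([1, 2, 3]):
    assert sort3(list(perm)) == [1, 2, 3]
```

Because the sequence of steps never changes with the input, the only way to make such a routine faster is to find a shorter or cheaper sequence of instructions, which is the search space AlphaDev explored.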
To find the winning combination, DeepMind had to revisit the feat that made it famous: winning board games.
Gaming the system
DeepMind has made a name for itself in games. In 2016, the company made headlines when its AI program defeated a world champion at Go, an incredibly complicated Chinese board game.
After the win, DeepMind built a more general system: AlphaZero. Using a trial-and-error process called reinforcement learning, the program mastered not only Go but also chess and shogi (also known as “Japanese chess”).
AlphaDev – the new algorithm builder – is based on AlphaZero. However, the influence of gaming goes beyond the underlying model.
“We punish it for mistakes”
DeepMind formulated AlphaDev’s task as a single-player game. To win, the system had to create a new and improved sorting algorithm.
The system made its moves by selecting assembly instructions to add to the algorithm. To find the optimal instructions, it had to explore a vast number of instruction combinations, comparable, according to DeepMind, to the number of particles in the universe. A single bad choice could invalidate the entire algorithm.
After each move, AlphaDev compared the algorithm’s output with the expected results. If the output was correct and the program ran efficiently, the system received a “reward”: a signal that it was performing well.
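To make the game framing concrete, here is a heavily simplified, hypothetical version in Python. It is not DeepMind's setup: the "instructions" are compare-exchange moves on register indices, correctness is checked against every permutation of a few keys, and program length stands in for efficiency (DeepMind's system worked with real assembly instructions and its own performance signal). The names `run`, `reward`, and `length_penalty` are illustrative only.

```python
from itertools import permutations

# A move is a compare-exchange on two register indices (a toy stand-in
# for choosing an assembly instruction).
Move = tuple[int, int]


def run(program: list[Move], values: list[int]) -> list[int]:
    """Execute the program on a copy of the input and return the registers."""
    regs = list(values)
    for i, j in program:
        regs[i], regs[j] = min(regs[i], regs[j]), max(regs[i], regs[j])
    return regs


def reward(program: list[Move], n: int = 3, length_penalty: float = 0.01) -> float:
    """Fraction of test inputs sorted correctly, minus a cost per move.

    The per-move cost is an assumed proxy for execution time.
    """
    tests = [list(p) for p in permutations(range(n))]
    correct = sum(run(program, t) == sorted(t) for t in tests)
    return correct / len(tests) - length_penalty * len(program)
```

Under this scoring, an empty program earns only the credit for inputs that were already sorted, a correct three-move network scores just under 1.0, and a correct but longer program scores slightly lower. That is the pressure toward programs that are both correct and short.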
“We punish it for mistakes and reward it for finding more and more of these sequences that are sorted correctly,” Daniel Mankowitz, the lead researcher, told TNW.
As you probably guessed, AlphaDev won the game. But the system didn’t just find a correct, faster program; it also discovered novel approaches to the task.
AlphaDev’s sorting algorithms were up to 70% faster than the benchmarks for shorter sequences and about 1.7% faster for sequences longer than 250,000 items. Image credit: Google DeepMind
The new algorithms contained sequences of instructions that saved a single instruction each time they were applied. Dubbed “swap and copy moves,” these sequences served as shortcuts to further algorithmic efficiency.
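The article doesn't spell out how these moves work. The Python sketch below is only our illustration of the general principle by which an instruction can become redundant, not AlphaDev's assembly: if earlier steps already guarantee `b <= c`, then the minimum of `a`, `b`, and `c` equals `min(a, b)`, so one comparison can be dropped. The function names are hypothetical.

```python
from itertools import product


def min3_naive(a, b, c):
    # Two comparisons: min(min(a, b), c).
    return min(min(a, b), c)


def min3_given_b_le_c(a, b, c):
    # Assumes earlier steps already ensured b <= c, so c can never be
    # smaller than b and one comparison suffices.
    return min(a, b)


# Check equivalence on every small input that satisfies the precondition.
for a, b, c in product(range(4), repeat=3):
    if b <= c:
        assert min3_naive(a, b, c) == min3_given_b_le_c(a, b, c)
```

Saving one comparison looks trivial, but in a routine that runs billions of times, removing a single instruction from a short sequence can add up to a measurable speedup.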
DeepMind compares the approach to another moment in games: the legendary “Move 37,” which its AlphaGo system played against Go champion Lee Sedol.
The unusual move shocked human experts, who thought the machine had made a mistake. But they soon realized that the program had a plan.
“In the end, not only did it win the game, but it also influenced the strategies that professional Go players started using,” Mankowitz said.
The win marked the first time AI had defeated a top-ranked Go pro – a milestone that experts had predicted was still a decade away.
Three years later, Lee retired from professional Go competition. He attributed the decision to the capabilities of his AI rivals.
“Even if I become number one, there is an entity that cannot be defeated,” he said.
AlphaDev’s sorting algorithms are now available in LLVM libc++, the open-source standard C++ library, where they are accessible to millions of developers and companies. According to DeepMind, it’s the first change to this part of the sorting library in over a decade, and the first algorithm developed through reinforcement learning to join the library.
After the sorting game, AlphaDev moved on to hashing, which is used to retrieve, store, and compress data. The result was another improved algorithm, now released in the open-source Abseil library. DeepMind estimates that it is used trillions of times a day.
Ultimately, the lab presents AlphaDev as a step towards transforming the entire computing ecosystem. And it all started with playing board games.