# How DeepMind’s AlphaTensor AI Devised a Faster Matrix Multiplication

After developing an artificial intelligence that can achieve superhuman proficiency in games like chess and Go, as well as another AI that can predict how proteins fold in three-dimensional space, DeepMind researchers have done it again—this time using a deep learning AI model to efficiently solve a fundamental math problem while breaking a 50-year-old record.

In a blog post earlier this month, the DeepMind team introduced AlphaTensor, an AI system designed to discover new and more efficient algorithms for solving crucial mathematical operations – in this case, matrix multiplication.

Whether it’s used to process or compress images and videos, recognize spoken commands, or run simulations to predict the weather, matrix multiplication underpins much of modern computing.

No wonder, then, that experts and companies around the world are constantly searching for more efficient algorithms for the mathematical operations behind such tasks.

Matrix multiplication is one of the simplest operations in algebra: individual numbers, arranged in grids – or matrices – are multiplied together pairwise, and the products are summed in a specific pattern to create a new matrix.
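To make that pattern concrete, here is a minimal sketch of the standard algorithm in plain Python (the function name `matmul` is our own, not from the DeepMind work): each entry of the result is the sum of pairwise products of a row of the first matrix with a column of the second.

```python
# Standard ("schoolbook") matrix multiplication: entry (i, j) of the result
# is the sum of A's row i multiplied element-wise with B's column j.
def matmul(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    assert all(len(row) == inner for row in A), "inner dimensions must match"
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

For two n×n matrices, this uses n³ scalar multiplications – eight in the 2×2 case above – which is exactly the count Strassen's method improves on.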

Such matrices are used to represent different types of data such as sets of pixels in an image or the internal workings of an artificial neural network.

For centuries, mathematicians used what they believed to be the most efficient method, until 1969, when the German mathematician Volker Strassen shook the mathematics world with an even better method that could multiply a pair of 2×2 matrices using seven multiplications instead of the usual eight.
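Strassen's trick can be written out directly. The sketch below uses one common formulation of his seven products, M1 through M7 (the function name is ours): seven scalar multiplications replace the usual eight, at the cost of extra additions and subtractions.

```python
# Strassen's 1969 scheme for a 2x2 product: seven multiplications (M1..M7)
# instead of eight, traded for a larger number of additions/subtractions.
def strassen_2x2(A, B):
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    M1 = (a11 + a22) * (b11 + b22)
    M2 = (a21 + a22) * b11
    M3 = a11 * (b12 - b22)
    M4 = a22 * (b21 - b11)
    M5 = (a11 + a12) * b22
    M6 = (a21 - a11) * (b11 + b12)
    M7 = (a12 - a22) * (b21 + b22)
    return [[M1 + M4 - M5 + M7, M3 + M5],
            [M2 + M4,           M1 - M2 + M3 + M6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Because the same scheme can be applied recursively to blocks of larger matrices, saving one multiplication at the 2×2 level compounds into a genuinely faster asymptotic algorithm.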

Strassen’s record held for more than fifty years, but DeepMind’s AlphaTensor has shown that it can discover even more efficient methods on its own.

In fact, the team approached the problem of matrix multiplication like a game, with AlphaTensor building on the lessons learned from its game-playing predecessor, AlphaZero.

Both models use a type of machine learning called reinforcement learning, combined with Monte Carlo tree search (MCTS) techniques, allowing the system to gradually teach itself to improve as it receives feedback from previous rounds of playing the “game” – be it chess or multiplying matrices.

In the case of AlphaTensor, the team reformulated the problem of finding efficient matrix multiplication algorithms as a single-player game, in which the “board” is represented as a three-dimensional array of numbers.

To achieve the goal of getting all the numbers to zero in as few steps as possible, the model chooses from a collection of allowed moves, each of which modifies the grid. This ultimately results in what the team says is “a provably correct matrix multiplication algorithm for any pair of matrices, and its efficiency is measured by the number of steps taken to zero out the entries in the output matrix.”
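A hedged sketch of that game board, under the standard tensor-decomposition framing: the 2×2 multiplication problem becomes a 4×4×4 array of zeros and ones, and each “move” subtracts a rank-1 tensor (one scalar multiplication of the resulting algorithm). The variable names below are ours, not DeepMind's.

```python
import numpy as np

# The 2x2 matrix-multiplication tensor T: T[a, b, c] = 1 exactly when the
# product of A's a-th entry and B's b-th entry (flattened row-major)
# contributes to the c-th entry of the output C.
n = 2
T = np.zeros((n * n, n * n, n * n), dtype=int)
for i in range(n):
    for j in range(n):
        for k in range(n):
            # C[i, k] += A[i, j] * B[j, k]
            T[i * n + j, j * n + k, i * n + k] = 1

# A "move" subtracts a rank-1 tensor u (outer) v (outer) w from the board;
# zeroing the board in R moves yields an algorithm with R multiplications.
# The naive algorithm corresponds to 8 such moves:
board = T.copy()
for i in range(n):
    for j in range(n):
        for k in range(n):
            u = np.zeros(n * n, dtype=int); u[i * n + j] = 1
            v = np.zeros(n * n, dtype=int); v[j * n + k] = 1
            w = np.zeros(n * n, dtype=int); w[i * n + k] = 1
            board -= np.einsum('a,b,c->abc', u, v, w)

print(np.all(board == 0))  # True: the board is zeroed in 8 moves
```

Strassen's scheme zeroes the same board in seven moves; AlphaTensor searches for even shorter move sequences on larger boards.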

Every time the system performs well, its internal parameters are updated to increase its future chances of success. At the same time, the Monte Carlo tree search technique helps it predict how successful different paths to possible solutions might be, so that more advantageous paths are prioritized and the results of games are fed back into the network to improve the system further.

“We trained an AlphaTensor agent using reinforcement learning to play the game without having any knowledge of existing matrix multiplication algorithms,” the team explained.

“Through learning, AlphaTensor gradually improves over time, rediscovering historical fast matrix multiplication algorithms like Strassen’s, eventually surpassing the realm of human intuition and discovering algorithms faster than previously known.”

The team underscored the difficulty of the seemingly simple problem of multiplying two matrices together: “This game is incredibly challenging – the number of possible algorithms to consider is far greater than the number of atoms in the universe, even for small cases of matrix multiplication.

Compared to the game of Go, which remained a challenge for AI for decades, the number of possible moves at each step of our game is 30 orders of magnitude larger (over 10^33 for one of the settings we considered). Essentially, playing this game well is about identifying the smallest needle in a giant haystack of possibility.”

During their experiments testing input matrices up to 5×5, the team found that AlphaTensor not only “rediscovered” previously shown shortcuts in matrix multiplication, but also found new ways to perform these calculations efficiently.

For example, AlphaTensor was able to find an algorithm for multiplying a 4×5 matrix by a 5×5 matrix that required only 76 multiplications, beating a previous algorithm that required 80.

For a larger case, multiplying an 11×12 matrix by a 12×12 matrix, AlphaTensor was able to reduce the number of multiplications required from 1,022 to 990. AlphaTensor can also optimize matrix multiplication for specific hardware, with the team training the system on two different processors so that performance was optimized for each one.
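For context on those counts, the naive baseline for multiplying an m×k matrix by a k×n matrix is m·k·n scalar multiplications (the helper name below is ours, for illustration):

```python
# Scalar-multiplication count for naive rectangular matrix multiplication:
# an (m x k) times (k x n) product needs m * k * n multiplications.
def naive_mults(m, k, n):
    return m * k * n

print(naive_mults(4, 5, 5))    # 100 — vs. 80 previously, 76 with AlphaTensor
print(naive_mults(11, 12, 12)) # 1584 — vs. 1022 previously, 990 with AlphaTensor
```

So the reported records are improvements over the best previously known decompositions (80 and 1,022 multiplications), not over the naive counts.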

Ultimately, the team believes the new work could have major implications for a wide range of fields, from mathematical research to computer science.

“From a mathematical point of view, our results can guide further research in complexity theory aimed at determining the fastest algorithms to solve computational problems. By exploring the space of possible algorithms in a more effective way than previous approaches, AlphaTensor contributes to expanding our understanding of the richness of matrix multiplication algorithms.

Understanding this space can unlock new results to determine the asymptotic complexity of matrix multiplication, one of the most fundamental open problems in computer science. Because matrix multiplication is a core component in many computational tasks involving computer graphics, digital communications, neural network training, and scientific computing, the algorithms discovered by AlphaTensor could make computations in these areas vastly more efficient.”