AlphaTensor: Breaking New Ground in Matrix Multiplication

Ever thought about discovering a new algorithm? I never had, but it looks like we don’t even need to any more: thanks to DeepMind, we can now train AI models to do that for us!

Check out my latest blog to learn more about DeepMind’s ingenious piece of research, AlphaTensor, the first AI system for discovering new algorithms for fundamental computational tasks, and how it comes up with new, more efficient algorithms for matrix multiplication, undoubtedly a core operation in AI and in most scientific computing!

Cheers,
Elemento

Hi @Elemento,

Nice publication. If I understand correctly, do they input the matrices to the ML network and, after training it for some time of course, the network computes their product? Or does the ML network come up with a new set of rules (defining an algorithm) that can be used to compute matrix multiplications, and this new algorithm, when implemented, is faster than previous ones?

Hey @gent.spah,
I am glad you liked the blog.

The latter is the case indeed. And the “ML network” here is actually an agent trained with the help of Deep Reinforcement Learning. Interestingly, the “new set of rules” that this agent comes up with is not the same for all matrix multiplications; it varies with the sizes of the matrices involved.

Whenever we multiply matrices by hand, irrespective of their sizes, we always apply the same “traditional” set of rules: take the dot product of the first row of the first matrix and the first column of the second to get the first entry of the product matrix, and so on. This agent, however, comes up with different rules depending on the sizes of the matrices involved. In some cases its rules are as good as the best previously known ones, and in some cases they are better. The paper specifically highlights the 4x4 case.
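Just to spell out the “traditional” rule I mean, here is a plain textbook sketch (nothing to do with the learned algorithms from the paper):

```python
def naive_matmul(A, B):
    """The textbook rule: entry (i, j) of the product is the dot product of
    row i of A and column j of B, i.e. n*n*n multiplications for n x n inputs."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

# For 2x2 inputs this performs 8 multiplications; the rules AlphaTensor finds
# change this recipe depending on the sizes of the matrices involved.
print(naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```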

Also, this work expands the space of matrix multiplication algorithms far beyond what humans had been able to explore till now. At least, that is what they state in the paper :joy:

Cheers,
Elemento

This sounds really interesting. But matrix multiplication is a precisely defined mathematical operation. The only ways I can think of to optimize it have more to do with memory layouts, cache behavior, and dealing with sparsity and the like (can certain operations simply be skipped because of lots of zeros in the inputs?). In other words, it’s about how you feed things to the vector engines and manage the parallelism. Either that or the whole thing is a complex April Fool’s joke. Note that Google is pretty well known for those. I’m interested to read the article and see what they are talking about.

Thanks for clarifying, friend; keep up the good work :slightly_smiling_face:

Hi Paul, I was also thinking it’s too early to celebrate; first of all, as you say, the rules are already defined by the mathematical axioms.

They seem to come up with different rules for different matrix sizes, which does not seem conclusive to me. Who knows how much this new algorithm has been tested and validated…

It seems to me that they are just finding “new combinations of the existing parts”, so to speak, rearranging the problem. Of course computers are much more powerful than humans at finding combinations, but essentially the way the process is done is the same.

Well, with small matrices like 4 x 4, it’s a good question whether it even makes sense to use the vector engines for the calculations. Because of the small sizes, you’re done before the pipelines are really full and cranking, so perhaps you never recover the setup overhead and could have done it quicker with normal non-vectorized loops using multiply and add.

So I can believe that there could be some learning there about which path to take for a given computation …

Hi @paulinpaloalto Sir,
I wasn’t aware of this. But the work got accepted into Nature and made the front page of its weekly issue. I would like to think that lends the research some credibility :thinking:

Also, if I am not wrong, they haven’t done anything along those lines. The “sets of rules” that AlphaTensor came up with basically try to reduce the number of multiplication operations required, at the expense of extra addition and subtraction operations, since multiplications are more costly to implement than additions. Nonetheless, there are still some things in the paper that I am unable to comprehend.
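To make that trade concrete, here is a small sketch of the classic Strassen scheme for 2 x 2 matrices (the well-known 1969 predecessor of this idea, not the new algorithm from the paper), which uses 7 multiplications instead of 8 at the cost of extra additions and subtractions:

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications instead of 8,
    paying for it with 18 additions/subtractions (Strassen, 1969)."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A, B = np.random.rand(2, 2), np.random.rand(2, 2)
assert np.allclose(strassen_2x2(A, B), A @ B)
```

Applied recursively to matrix blocks instead of scalars, this same trade is what pushes the exponent below 3.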

Now, this I completely agree with, Sir. But Sir, if, say, the method that AlphaTensor came up with reduces the number of multiplication operations at the expense of addition/subtraction operations, then even if we only use normal “loops”, won’t that speed up the computations? I am unsure how much more compute-intensive a multiplication operation is compared to an addition/subtraction operation. Would you like to say something about that, Sir?

Cheers,
Elemento

Clearly I need to read the article before I can say anything more meaningful about it. It’s been a long time since CS students really had to worry about this level of detail in terms of the mechanics of implementing actual CPU instructions like Add and Multiply.

If you think about multiplication, it is a more complicated operation to implement. For one thing, the way “overflow” works is a lot more complicated. With addition, the worst that can happen is that you get a “carry” out of the high-order bit, but with multiplication the number of bits in the output can be quite a bit larger than in either of the inputs.

When you are dealing with integer arithmetic, there are some simplifications you can make by “reducing” multiplication to successive additions. For example, if I know that I am multiplying some big number x by 3, we could express that as x + x + x, and maybe that’s actually fewer CPU cycles. But that’s just one simple case, and it seems a bit counterintuitive that you can generalize that idea in a way that is actually applicable in real-world scenarios.

Clearly I need to read the paper …

Hey @paulinpaloalto Sir,

Thanks a lot for this amazing explanation. Based on it, if this work manages to reduce the number of multiplications, it could really speed up matrix multiplication. I will be waiting for you to read the paper and share your thoughts on it. Till then, I will try to read the paper a few more times to comprehend it better. Have a great weekend, Sir :nerd_face:

Cheers,
Elemento

I read the top level version of the paper, but did not have the time to actually look at the various links that they provide with the code and other information. Here are a few thoughts:

If you just take the definition of matrix multiplication literally, the operation has a computational complexity of O(n^3), where the input matrices are assumed to be n x n. What I learned from reading the paper is that there has been an ongoing effort in the mathematical community to find equivalent algorithms with lower computational complexity. The first really successful instance of that was Strassen’s Algorithm, first published in 1969, which gets the computational complexity down to a bit over O(n^{2.8}). If you want to get a sense of what such algorithms involve, have a look at the Wikipedia link that I gave there. There have been other subsequent efforts that have further reduced the complexity, and there is a Wikipedia page that gives a good overview of what has been discovered up to this point specifically for optimizing matrix multiplication.
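For the curious, the usual counting argument for why trading 8 block products for 7 lowers the exponent goes like this (standard textbook material, not anything specific to the paper):

```latex
% Naive blockwise multiplication: 8 half-size products per level of recursion.
T_{\text{naive}}(n) = 8\,T_{\text{naive}}(n/2) + \Theta(n^2) = \Theta(n^3)

% Strassen trades one block product for a few extra block additions: 7 per level.
T_{\text{Strassen}}(n) = 7\,T_{\text{Strassen}}(n/2) + \Theta(n^2)
                       = \Theta\left(n^{\log_2 7}\right) \approx \Theta\left(n^{2.807}\right)
```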

AlphaTensor furthers the state of the art by figuring out a way to pose the problem of finding equivalent but more efficient algorithms for matrix multiplication as a game that can be played and optimized using Deep Reinforcement Learning. This is an extension of the work they did with AlphaGo and AlphaZero to play Go and Chess better than any human player. Using their Reinforcement Learning engine, they are able to discover new, equivalent algorithms for multiplying matrices of various sizes that use fewer multiplications than the best previously known methods. Clearly this represents an important development: now that they have a “proof of concept”, they will presumably apply the same technique to other algorithmic discovery tasks.
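As far as I understand it, the “game” is played over a tensor that encodes matrix multiplication, and each move adds one rank-1 term to a decomposition of that tensor; the number of terms is exactly the number of scalar multiplications the resulting algorithm needs. Here is a rough sketch of that correspondence that I put together (my own simplification, not DeepMind’s code; the helper names are made up):

```python
import numpy as np

def matmul_tensor(n):
    """T[i, j, k] = 1 when the i-th entry of A (flattened row-major) times the
    j-th entry of B contributes to the k-th entry of C = A @ B."""
    T = np.zeros((n * n, n * n, n * n), dtype=int)
    for r in range(n):          # row of C
        for c in range(n):      # column of C
            for k in range(n):  # summation index
                T[r * n + k, k * n + c, r * n + c] = 1
    return T

def standard_decomposition(n):
    """The textbook algorithm written as n^3 rank-1 terms (u, v, w)."""
    terms = []
    for r in range(n):
        for c in range(n):
            for k in range(n):
                u = np.zeros(n * n, dtype=int); u[r * n + k] = 1  # reads a_{rk}
                v = np.zeros(n * n, dtype=int); v[k * n + c] = 1  # reads b_{kc}
                w = np.zeros(n * n, dtype=int); w[r * n + c] = 1  # writes c_{rc}
                terms.append((u, v, w))
    return terms

def multiply_via_decomposition(A, B, terms):
    """Each term costs exactly one scalar multiplication, so the number of terms
    (the rank of the decomposition) is the count AlphaTensor tries to minimize."""
    n = A.shape[0]
    a, b = A.reshape(-1), B.reshape(-1)
    c = np.zeros(n * n)
    for u, v, w in terms:
        m = (u @ a) * (v @ b)  # the single scalar multiplication for this term
        c = c + m * w
    return c.reshape(n, n)

n = 2
T = matmul_tensor(n)
terms = standard_decomposition(n)  # 8 terms for 2x2
# The rank-1 terms really do sum to the matmul tensor ...
assert (sum(np.einsum('i,j,k->ijk', u, v, w) for u, v, w in terms) == T).all()
# ... and executing the decomposition reproduces A @ B.
A, B = np.random.rand(n, n), np.random.rand(n, n)
assert np.allclose(multiply_via_decomposition(A, B, terms), A @ B)
```

AlphaTensor’s contribution, as I read it, is finding decompositions like this with fewer terms than the best previously known ones for various (small) matrix sizes.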

Hey @paulinpaloalto Sir,
Thanks a lot for your detailed explanation of the concept. And I guess it is safe to assume now that this isn’t one of the pranks Google is known for :joy:

Cheers,
Elemento

Yes, it is definitely “for real”, which was pretty clear once you told us that it had actually been published in Nature. Thanks for starting this whole topic and bringing all this to our attention! Now I really want to learn about Reinforcement Learning. I’ve heard of it of course, but never actually looked at it.

I am glad I could make a meaningful post. As for Reinforcement Learning, it is indeed fascinating, Sir. My favourite book for RL is “Reinforcement Learning: An Introduction” by Richard S. Sutton and Andrew G. Barto, in case you want to take a look at it, Sir. And I am looking forward to asking you my RL queries in the future, Sir :nerd_face:

Cheers,
Elemento