AlphaTensor: Breaking New Ground in Matrix Multiplication

Ever thought about discovering a new algorithm? I never had, but it looks like we don’t even need to any more: thanks to DeepMind, we can now train AI models to do that for us!

Check out my latest blog to learn more about DeepMind’s ingenious piece of research, AlphaTensor, the first AI system for discovering new algorithms for fundamental computational tasks, and how it comes up with new, more efficient algorithms for matrix multiplication, undoubtedly a core operation in AI and in most scientific computing!

Cheers,
Elemento

Hi @Elemento,

Nice publication. If I understand correctly, do they input the matrices to the ML network and, after training it for some time of course, the network computes their product? Or does the ML network come up with a new set of rules (defining an algorithm) that can be used to compute matrix multiplications, and this new algorithm, when implemented, is faster than previous ones?

Hey @gent.spah,
I am glad you liked the blog.

The latter is the case indeed. And the “ML network” here is actually an agent trained with the help of Deep Reinforcement Learning. Interestingly, the “new set of rules” that this agent comes up with is not the same for all matrix multiplications; it varies with the sizes of the matrices involved.

Whenever we multiply matrices by hand, irrespective of their sizes, we always apply the same “traditional” set of rules: take the dot product of the first row of the first matrix and the first column of the second to get the first entry of the product matrix, and so on. This agent, however, comes up with different rules depending on the sizes of the matrices involved. In some cases its rules are as good as the best previously known ones, and in some cases they are better. The paper specifically highlights the 4x4 case.
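Just to spell out the “traditional” rule I mean, here is a plain textbook sketch (nothing to do with the learned algorithms from the paper):

```python
def naive_matmul(A, B):
    """The textbook rule: entry (i, j) of the product is the dot product of
    row i of A and column j of B, i.e. n*n*n multiplications for n x n inputs."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

# For 2x2 inputs this performs 8 multiplications; the rules AlphaTensor finds
# change this recipe depending on the sizes of the matrices involved.
print(naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```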

Also, this work expands the space of matrix multiplication algorithms far beyond what humans had been able to explore till now. At least, that is what they state in the paper :joy:

Cheers,
Elemento

This sounds really interesting. But matrix multiplication is a precisely defined mathematical operation. The only ways I can think of to optimize it have more to do with memory layouts, cache behavior, and dealing with sparsity and the like (can certain operations simply be skipped because of lots of zeros in the inputs?). In other words, it’s about how you feed things to the vector engines and manage the parallelism. Either that or the whole thing is a complex April Fool’s joke. Note that Google is pretty well known for those. I’m interested to read the article and see what they are talking about.

Thanks for clarifying, friend; keep up the good work :slightly_smiling_face:

Hi Paul, I was also thinking it’s too early to celebrate; first of all, as you say, the rules are already defined by the mathematical axioms.

They seem to come up with different rules for different matrix sizes, which does not seem conclusive to me. Who knows how much this new algorithm has been tested and validated…

It seems to me that they are just finding “new combinations of the existing parts”, so to speak, rearranging the problem. Of course computers are much more powerful than humans at finding combinations, but essentially the way the process is done is the same.

Well, with small matrices like 4 x 4, it’s a good question whether it even makes sense to use the vector engines for the calculations. Because of the small sizes, you’re done before the pipelines are really full and cranking, so perhaps you never recover the setup overhead and could have done it quicker with normal non-vectorized loops using multiply and add.

So I can believe that there could be some learning there about which path to take for a given computation …

Hi @paulinpaloalto Sir,
I wasn’t aware of this. But the work got accepted into Nature and made the front page of its weekly issue. I would like to think that lends the research some credibility :thinking:

Also, if I am not wrong, they haven’t done anything along those lines. The “sets of rules” that AlphaTensor came up with basically try to reduce the number of multiplication operations required, at the expense of extra addition and subtraction operations, since multiplications are more costly to implement than additions. Nonetheless, there are still some things in the paper that I am unable to comprehend.
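To make that trade concrete, here is a small sketch of the classic Strassen scheme for 2 x 2 matrices (the well-known 1969 predecessor of this idea, not the new algorithm from the paper), which uses 7 multiplications instead of 8 at the cost of extra additions and subtractions:

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications instead of 8,
    paying for it with 18 additions/subtractions (Strassen, 1969)."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A, B = np.random.rand(2, 2), np.random.rand(2, 2)
assert np.allclose(strassen_2x2(A, B), A @ B)
```

Applied recursively to matrix blocks instead of scalars, this same trade is what pushes the exponent below 3.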

Now, this I completely agree with, Sir. But Sir, if, say, the method that AlphaTensor came up with reduces the number of multiplication operations at the expense of addition/subtraction operations, then even if we only use normal “loops”, won’t that speed up the computations? I am unsure how much more compute-intensive a multiplication operation is compared to an addition/subtraction operation. Would you like to say something about that, Sir?

Cheers,
Elemento

Clearly I need to read the article before I can say anything more meaningful about it. It’s been a long time since CS students really had to worry about this level of detail in terms of the mechanics of implementing actual CPU instructions like Add and Multiply.

If you think about multiplication, it is a more complicated operation to implement. For one thing, the way “overflow” works is a lot more complicated. With addition, the worst that can happen is that you get a “carry” out of the high-order bit, but with multiplication the number of bits in the output can be quite a bit larger than in either of the inputs.

When you are dealing with integer arithmetic, there are some simplifications you can make by “reducing” multiplication to successive additions. For example, if I know that I am multiplying some big number x by 3, we could express that as x + x + x, and maybe that’s actually fewer CPU cycles. But that’s just one simple case, and it seems a bit counterintuitive that you can generalize that idea in a way that is actually applicable in real-world scenarios.

Clearly I need to read the paper …

Hey @paulinpaloalto Sir,

Thanks a lot for this amazing explanation. Based on it, if this work manages to reduce the number of multiplications, it could really speed up matrix multiplication. I will be waiting for you to read the paper and share your thoughts on it. Till then, I will try to read the paper a few more times to comprehend it better. Have a great weekend, Sir :nerd_face:

Cheers,
Elemento

I read the top level version of the paper, but did not have the time to actually look at the various links that they provide with the code and other information. Here are a few thoughts:

If you just take the definition of matrix multiplication literally, the operation has a computational complexity of O(n^3), where the input matrices are assumed to be n x n. What I learned from reading the paper is that there has been an ongoing effort in the mathematical community to find equivalent algorithms with lower computational complexity. The first really successful instance of that was Strassen’s Algorithm, first published in 1969, which gets the computational complexity down to a bit over O(n^{2.8}). If you want to get a sense of what such algorithms involve, have a look at the Wikipedia link that I gave there. There have been other subsequent efforts that have further reduced the complexity, and there is a Wikipedia page that gives a good overview of what has been discovered up to this point specifically for optimizing matrix multiplication.
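For the curious, the usual counting argument for why trading 8 block products for 7 lowers the exponent goes like this (standard textbook material, not anything specific to the paper):

```latex
% Naive blockwise multiplication: 8 half-size products per level of recursion.
T_{\text{naive}}(n) = 8\,T_{\text{naive}}(n/2) + \Theta(n^2) = \Theta(n^3)

% Strassen trades one block product for a few extra block additions: 7 per level.
T_{\text{Strassen}}(n) = 7\,T_{\text{Strassen}}(n/2) + \Theta(n^2)
                       = \Theta\left(n^{\log_2 7}\right) \approx \Theta\left(n^{2.807}\right)
```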

AlphaTensor furthers the state of the art by figuring out a way to pose the problem of finding equivalent but more efficient algorithms for matrix multiplication as a game that can be played and optimized using Deep Reinforcement Learning. This is an extension of the work they did with AlphaGo and AlphaZero to play Go and Chess better than any human player. Using their Reinforcement Learning engine, they are able to discover new, equivalent algorithms for multiplying matrices of various sizes that use fewer multiplications than the best previously known methods. Clearly this represents an important development: now that they have a “proof of concept”, they will presumably apply the same technique to other algorithmic discovery tasks.
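As far as I understand it, the “game” is played over a tensor that encodes matrix multiplication, and each move adds one rank-1 term to a decomposition of that tensor; the number of terms is exactly the number of scalar multiplications the resulting algorithm needs. Here is a rough sketch of that correspondence that I put together (my own simplification, not DeepMind’s code; the helper names are made up):

```python
import numpy as np

def matmul_tensor(n):
    """T[i, j, k] = 1 when the i-th entry of A (flattened row-major) times the
    j-th entry of B contributes to the k-th entry of C = A @ B."""
    T = np.zeros((n * n, n * n, n * n), dtype=int)
    for r in range(n):          # row of C
        for c in range(n):      # column of C
            for k in range(n):  # summation index
                T[r * n + k, k * n + c, r * n + c] = 1
    return T

def standard_decomposition(n):
    """The textbook algorithm written as n^3 rank-1 terms (u, v, w)."""
    terms = []
    for r in range(n):
        for c in range(n):
            for k in range(n):
                u = np.zeros(n * n, dtype=int); u[r * n + k] = 1  # reads a_{rk}
                v = np.zeros(n * n, dtype=int); v[k * n + c] = 1  # reads b_{kc}
                w = np.zeros(n * n, dtype=int); w[r * n + c] = 1  # writes c_{rc}
                terms.append((u, v, w))
    return terms

def multiply_via_decomposition(A, B, terms):
    """Each term costs exactly one scalar multiplication, so the number of terms
    (the rank of the decomposition) is the count AlphaTensor tries to minimize."""
    n = A.shape[0]
    a, b = A.reshape(-1), B.reshape(-1)
    c = np.zeros(n * n)
    for u, v, w in terms:
        m = (u @ a) * (v @ b)  # the single scalar multiplication for this term
        c = c + m * w
    return c.reshape(n, n)

n = 2
T = matmul_tensor(n)
terms = standard_decomposition(n)  # 8 terms for 2x2
# The rank-1 terms really do sum to the matmul tensor ...
assert (sum(np.einsum('i,j,k->ijk', u, v, w) for u, v, w in terms) == T).all()
# ... and executing the decomposition reproduces A @ B.
A, B = np.random.rand(n, n), np.random.rand(n, n)
assert np.allclose(multiply_via_decomposition(A, B, terms), A @ B)
```

AlphaTensor’s contribution, as I read it, is finding decompositions like this with fewer terms than the best previously known ones for various (small) matrix sizes.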

Hey @paulinpaloalto Sir,
Thanks a lot for your detailed explanation of the concept. And I guess it is safe to assume now that this isn’t one of the pranks Google is known for :joy:

Cheers,
Elemento

Yes, it is definitely “for real”, which was pretty clear once you told us that it had actually been published in Nature. Thanks for starting this whole topic and bringing all this to our attention! Now I really want to learn about Reinforcement Learning. I’ve heard of it of course, but never actually looked at it.

I am glad I could make a meaningful post. As for Reinforcement Learning, it is indeed fascinating, Sir. My favourite book for RL is “Reinforcement Learning: An Introduction” by Richard S. Sutton and Andrew G. Barto, in case you want to take a look at it, Sir. And I am looking forward to asking you my RL queries in the future, Sir :nerd_face:

Cheers,
Elemento