I understand what Prof. Andrew says about multiplying the transpose of matrix A by matrix W. What I don't understand is why we put the transposed matrix first, followed by the other matrix, instead of the opposite order, i.e. matrix W multiplied by the transpose of matrix A, assuming the numbers of rows and columns meet the multiplication rules.
Hi @flyunicorn, this is a great question! I am glad you are trying to understand where these results come from.
Check this website: Matrix Multiplication
It shows the matrix calculation step by step.
AᵀW follows these steps (first screenshot) to its final result.
On the other hand, the other way around does not give the same answer: WAᵀ follows its own steps (second screenshot) to a different final answer.
So the order does matter. We are interested in AᵀW for our calculation, so we cannot perform the multiplication the other way around, as that gives a different result. The dimensions are also different: for AᵀW we get a 3x4 matrix, while for WAᵀ we get a 2x2 matrix.
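To see this concretely without the screenshots, here is a minimal NumPy sketch with made-up matrices (a hypothetical 2x3 A and 2x3 W, not the ones from the lecture, chosen so that both products are defined):

```python
import numpy as np

# Hypothetical matrices (not the ones from the lecture), chosen so that
# both A^T W and W A^T are defined.
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])   # 2x3
W = np.array([[1., 0., -1.],
              [2., 1.,  0.]])  # 2x3

AT_W = A.T @ W   # (3x2) @ (2x3) -> 3x3
W_AT = W @ A.T   # (2x3) @ (3x2) -> 2x2

print(AT_W.shape, W_AT.shape)  # (3, 3) (2, 2): different shapes entirely
```

The two products do not even have the same shape, let alone the same entries.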
I hope this helps!
Hello, @flyunicorn,
The maths will work out either way (for example, (AᵀW)ᵀ = WᵀA, so you could compute WᵀA and then transpose), but to keep the result in the same orientation, the slide's approach saves that extra step.
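Here is a quick check of that identity, (AᵀW)ᵀ = WᵀA, with hypothetical random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
W = rng.standard_normal((2, 4))

# Computing W^T A and transposing afterwards gives exactly A^T W,
# which is the extra step the slide's ordering avoids.
print(np.allclose(A.T @ W, (W.T @ A).T))  # True
```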
Also, let's bear in mind that this is a lecture about matrix multiplication, which means that even though it uses the variables W and A, and even though it belongs to the MLS, it does not imply that we have to set up the neural network maths this way. For example, if you go back to the lecture that actually covers the neural network maths, we simply compute AW: no transpose whatsoever, and it is computationally cheaper without the transpose (see the sketch below).
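A minimal sketch of that AW computation for one dense layer, assuming the weights are stored with one column per unit (a features-by-units matrix), which is exactly the layout that makes the transpose unnecessary:

```python
import numpy as np

def dense(A_in, W, b):
    # A_in: (m, n) activations from the previous layer (m examples, n features)
    # W:    (n, u) weights, one column per unit, so no transpose is needed
    # b:    (u,)   biases
    return np.matmul(A_in, W) + b  # Z = A_in W + b

A_in = np.array([[0.1, 0.2],
                 [0.3, 0.4]])   # 2 examples, 2 features
W = np.array([[1., 2., 3.],
              [4., 5., 6.]])    # 2 features -> 3 units
b = np.zeros(3)

print(dense(A_in, W, b).shape)  # (2, 3): one row per example, one column per unit
```

With this layout, stacking layers is just repeated matmul calls, one per layer.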
Cheers,
Raymond
Do you mean the transpose of A multiplied by W = WA? I know changing the order of the matrices gives a different result. I am not sure I understand the part after the graph. And does MLS mean machine learning statistics?
Which part of my response gave you this thought?
No. The product of the transpose of A and W, i.e. AᵀW, is not equal to WA.
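A quick numeric check with two made-up square matrices (square so that both products are even defined) shows they generally differ:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
W = np.array([[0., 1.],
              [1., 0.]])

print(A.T @ W)  # [[3. 1.]
                #  [4. 2.]]
print(W @ A)    # [[3. 4.]
                #  [1. 2.]]
# A^T W and W A are different matrices.
```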
MLS stands for Machine Learning Specialization.
Hello, @flyunicorn, in case this part was confusing, what I meant was that in the later lecture on the neural network maths we see AW (by the way, I edited my previous response because WA was wrong), but in your screenshot we see AᵀW.
The W in the neural network maths holds the neurons' weights, and A holds the activations from the previous layer, whereas the W and A in the matrix multiplication lecture are simply any two matrices.
Cheers,
Raymond