How to choose between matrix multiplication and element-wise multiplication when applying the chain rule in backpropagation?

Hi there,
I have a doubt about which kind of multiplication to use during backpropagation.
Sometimes we use matrix multiplication and sometimes we use element-wise multiplication. How do we decide which one to use?
Thanks

You have to look at the details of the equation you’re implementing.

If the equation involves the sum of the products of the elements of two vectors, then it’s a matrix (dot) product.
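As a small illustration of that distinction, here is a NumPy sketch. The vectors `v` and `w` are made up for the example and are not taken from the course material.

```python
import numpy as np

# Hypothetical example vectors, chosen only for illustration.
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

# Element-wise product: the individual products are kept, shape stays (3,).
elementwise = v * w        # values 4, 10, 18

# Dot product: the same products summed into a single number.
dot = np.dot(v, w)         # 4 + 10 + 18 = 32
```

So if the formula has a summation over the products, `np.dot` is the operation that performs it; if the products stay separate, `*` is.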

Here’s a thread which explains the notational conventions that Prof Ng uses to distinguish the two types of multiplication.

If the question is about the mathematics behind the backpropagation formulas, that is beyond the scope of this course. But here is a thread that links to derivations and other resources, if you have the math background and are curious.

In eqn (i) we use a dot product between the two partial derivatives.
In eqn (ii) we use a dot product between the first two partial derivatives and an element-wise product with the third partial derivative.
My doubt is how to choose between these.
Is it just the shapes of the matrices that we have to check, or is there a particular reason?
Thanks in advance

It starts with the equation for L. The products with Y and (1-Y) are element-wise, because those values are either 0 or 1, and are used as a mask on the log() terms.
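To make the masking concrete, here is a minimal NumPy sketch. It assumes the loss being discussed is the usual binary cross-entropy, L = -(Y log(A) + (1 - Y) log(1 - A)), which the thread does not spell out; the arrays `Y` and `A` are hypothetical values.

```python
import numpy as np

# Hypothetical labels and predictions for 4 examples, each of shape (1, m).
Y = np.array([[1.0, 0.0, 1.0, 0.0]])
A = np.array([[0.9, 0.2, 0.6, 0.4]])

# Element-wise products: Y keeps log(A) only where the label is 1,
# and (1 - Y) keeps log(1 - A) only where the label is 0.
losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))    # shape (1, m)

# The same per-example values selected explicitly, to show the masking effect.
picked = np.where(Y == 1, -np.log(A), -np.log(1 - A))
```

Both arrays hold one loss value per example, which is why the products with Y and (1 - Y) cannot be dot products here.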

Yes, that’s a good clue: if the two matrices are the same shape, you generally need the element-wise product.

Well, at a direct level, you can tell by looking at the dimensions. The rules for the dot product require the inner dimensions to match, while for the element-wise product the two operands must be broadcastable to the same shape. Do the dimensional analysis on the three operands there and you’ll see that’s the only way it could work. That point was covered in the first thread that I linked for you above. But the real mathematical reason is that the third operand there is the derivative of the activation function, and activation functions are applied element-wise.
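Here is a rough sketch of that dimensional analysis in NumPy. It assumes eqn (ii) has the common form dZ1 = (W2ᵀ · dZ2) * g'(Z1) with a tanh hidden layer; the equation itself is not reproduced in this thread, so the variable names, layer sizes, and activation are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, m = 4, 1, 5                      # hypothetical layer sizes and batch size

W2 = rng.standard_normal((n2, n1))       # shape (1, 4)
dZ2 = rng.standard_normal((n2, m))       # shape (1, 5)
Z1 = rng.standard_normal((n1, m))        # shape (4, 5)

# First two operands: inner dimensions match, (4, 1) · (1, 5) -> (4, 5).
back = np.dot(W2.T, dZ2)

# Third operand: derivative of tanh, applied element-wise, so shape (4, 5).
g_prime = 1 - np.tanh(Z1) ** 2

# Same shapes, so only the element-wise product is defined here.
dZ1 = back * g_prime                     # shape (4, 5)

# A dot product of two (4, 5) arrays fails because 5 != 4.
try:
    np.dot(back, g_prime)
    dot_works = True
except ValueError:
    dot_works = False
```

With these shapes the dot product is the only option for the first two operands and the element-wise product is the only option for the third, which matches the dimension argument above.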

Of course, this is all mathematics. Prof Ng considers much of that to be beyond the scope of these courses, but I pointed you to the relevant info in the second link that I gave you above.

Here’s another link, to the website of one of our mentors who did a very thorough job of covering these topics.


Thanks @paulinpaloalto and @TMosh.