I am trying to wrap my head around the dw calculation. Matrix multiplication is row x column, so the first element of dw should be one feature across all examples [a row of X] times dz (which is an element-wise multiplication) [a column]. Why is the professor saying X(1)dz(1)? Doesn't X(1) mean the first training example?
Hello @Nishant_Mahajan,
Both you and the lecture are correct, and yes, X(1) is the first training example.
When we multiply X with dz, you would agree that dz1 multiplies the first feature of the first sample. If you then shift your focus to the second row of X, you would also agree that dz1 multiplies the second feature of the first sample. Repeating this over all of the rows, every feature of the first sample gets multiplied by dz1, which means dz1 multiplies x1, the whole first sample (the first column of X). Your row-by-column view and the lecture's x1·dz1 + x2·dz2 + ... view are just two ways of grouping the same products.
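Here is a small sketch in plain Python (no NumPy) that computes dw both ways. It assumes X is stored with features as rows and samples as columns, and that dz holds one value per sample; the numbers and variable names are made up for illustration.

```python
# Made-up toy data: n = 3 features (rows), m = 2 samples (columns).
X = [
    [1.0, 4.0],  # feature 1 of sample 1, sample 2
    [2.0, 5.0],  # feature 2 of sample 1, sample 2
    [3.0, 6.0],  # feature 3 of sample 1, sample 2
]
dz = [0.5, -1.0]  # dz1, dz2: one value per sample

n, m = len(X), len(dz)

# View 1 (row x column): each entry of dw is one feature row of X
# dotted with dz.
dw_rows = [sum(X[j][i] * dz[i] for i in range(m)) for j in range(n)]

# View 2 (sum over samples): take column i of X, i.e. training example
# x(i), scale the whole column by dz(i), and add the scaled columns up.
dw_samples = [0.0] * n
for i in range(m):
    x_i = [X[j][i] for j in range(n)]  # the i-th training example
    for j in range(n):
        dw_samples[j] += x_i[j] * dz[i]

print(dw_rows)     # → [-3.5, -4.0, -4.5]
print(dw_samples)  # → [-3.5, -4.0, -4.5]
```

Both loops perform exactly the same multiplications; they only differ in whether the products are summed along a row of X first or collected sample by sample.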
Another way to see this is to make up a small X and a small dz on a piece of paper, do the matrix multiplication the way you described, and then highlight all of the terms that involve the first sample. You will see that they are multiplied by dz1, and only by dz1.
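Here is that paper exercise written out for a made-up case with 2 features and 2 samples. It assumes the same layout (features as rows of X, samples as columns) and treats dz as a row vector, so the product is written X dzᵀ; superscripts in parentheses index the sample and subscripts index the feature:

```latex
X = \begin{bmatrix} x_1^{(1)} & x_1^{(2)} \\ x_2^{(1)} & x_2^{(2)} \end{bmatrix},
\qquad
dz = \begin{bmatrix} dz^{(1)} & dz^{(2)} \end{bmatrix}

X\,dz^{T}
= \begin{bmatrix}
    x_1^{(1)}\,dz^{(1)} + x_1^{(2)}\,dz^{(2)} \\
    x_2^{(1)}\,dz^{(1)} + x_2^{(2)}\,dz^{(2)}
  \end{bmatrix}
= dz^{(1)} \begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \end{bmatrix}
+ dz^{(2)} \begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \end{bmatrix}
= x^{(1)}\,dz^{(1)} + x^{(2)}\,dz^{(2)}
```

Reading the middle matrix row by row gives your row-times-column picture; regrouping the same terms by sample gives the lecture's x^(1)dz^(1) + x^(2)dz^(2).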
Cheers,
Raymond