Calculate the gradient with respect to an element of a matrix

Hi everyone, I’m learning deep learning and following a Stanford online course.
At this part, I don’t yet understand how to calculate the gradient with respect to W.
I understand that q_k here denotes the k-th row of the result of the product of the two matrices.

The gradient according to me is:

The answer in the slide:

I also don’t know how the gradient with respect to X is calculated

Thanks and I appreciate your support

Looks like they are using the calculus “chain rule”.

Yeah, do you have any ideas about the difference?

And can you explain to me why they use a summation for df/dWij?

Because W is a 2D matrix, and i and j are its indices.

I think the sum means this: to take the derivative df/dWij, we look at every row of q = W.x (row 1 and row 2 here), take the derivative of each with respect to Wij to find how much a change in Wij affects each one, and sum the contributions, right? Since Wij appears in only one of those rows (row i), the other terms are treated as constants and drop out, which leaves just 2qi*xj.
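
Written out, the chain-rule step looks roughly like this. This is only a sketch: the function itself is not shown in this thread, so it assumes the slide uses f(x, W) = ||W·x||² = Σ_k q_k² with q = W·x, which is consistent with the 2qi*xj result. The indicator symbol \mathbb{1}_{k=i} is my own notation.

```latex
% Assumption: f(x, W) = \|W x\|^2 = \sum_k q_k^2, with q = W x
\begin{aligned}
q_k &= \sum_l W_{kl}\, x_l
  &&\Rightarrow\quad \frac{\partial q_k}{\partial W_{ij}} = \mathbb{1}_{k=i}\, x_j \\
\frac{\partial f}{\partial W_{ij}}
  &= \sum_k \frac{\partial f}{\partial q_k}\,\frac{\partial q_k}{\partial W_{ij}}
   = \sum_k 2 q_k\, \mathbb{1}_{k=i}\, x_j
   = 2\, q_i\, x_j \\
\frac{\partial f}{\partial x_i}
  &= \sum_k \frac{\partial f}{\partial q_k}\,\frac{\partial q_k}{\partial x_i}
   = \sum_k 2 q_k\, W_{ki}
\end{aligned}
```

Collecting the entries gives ∇_W f = 2·q·xᵀ (same shape as W) and ∇_x f = 2·Wᵀ·q (same shape as x), again under the assumed form of f.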

But I still don’t have an answer for why the gradient matrix has been transposed.

I suggest you ask the folks who created the video you’re watching. It’s not material that DLAI is responsible for.


Just one follow-up. Often a transposition is needed in implementation, in order for the matrix dimensions to match up with the vector algebra needs.
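
As a quick illustration of the shape argument, here is a small NumPy check. It is only a sketch and assumes the function in question is f(x, W) = ||W·x||² with q = W·x, which is not shown in this thread; the helper names `f`, `analytic_grads`, and `numeric_grad` are made up for the example. It compares the closed-form gradients against central finite differences and uses a non-square W so that a wrongly transposed result would not even have the right shape.

```python
import numpy as np


def f(W, x):
    """Assumed function: f(x, W) = ||W x||^2 = sum_k q_k^2 with q = W x."""
    q = W @ x
    return float(np.sum(q ** 2))


def analytic_grads(W, x):
    """Closed-form gradients under the assumption above."""
    q = W @ x
    grad_W = 2.0 * np.outer(q, x)  # entry (i, j) is 2 * q_i * x_j, same shape as W
    grad_x = 2.0 * (W.T @ q)       # entry i is sum_k 2 * q_k * W_ki, same shape as x
    return grad_W, grad_x


def numeric_grad(func, arr, eps=1e-6):
    """Central finite differences of func() with respect to each entry of arr."""
    grad = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        old = arr[idx]
        arr[idx] = old + eps
        f_plus = func()
        arr[idx] = old - eps
        f_minus = func()
        arr[idx] = old  # restore the original entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))  # deliberately non-square
x = rng.normal(size=3)

grad_W, grad_x = analytic_grads(W, x)
num_W = numeric_grad(lambda: f(W, x), W)
num_x = numeric_grad(lambda: f(W, x), x)

print("grad_W shape:", grad_W.shape, "W shape:", W.shape)
print("grad_W matches finite differences:", np.allclose(grad_W, num_W, atol=1e-5))
print("grad_x matches finite differences:", np.allclose(grad_x, num_x, atol=1e-5))
```

With these shapes, the transposed product `np.outer(x, q)` would be 3×2 and could not be used to update a 2×3 W, which is the kind of mismatch a dimension check catches.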

I found out the reason: they made a mistake in the slide at that time and have since corrected it.