Hi everyone, I'm learning deep learning and following a Stanford online course.
At this part, I don't yet understand how to calculate the gradient of f with respect to W.
I understand that q_k here denotes the k-th component of the matrix-vector product q = W·x, and that f(q) = Σ_k q_k² sums the squares of those components.

My reading of the sum is this: to get ∂f/∂W_ij we apply the chain rule over every component of q, i.e. ∂f/∂W_ij = Σ_k (∂f/∂q_k)(∂q_k/∂W_ij), which adds up how much a change in W_ij affects each q_k. Since W_ij appears only in the single component q_i (the other rows of q are constants with respect to it), every term except k = i vanishes, leaving ∂f/∂W_ij = 2·q_i·x_j. Is that reasoning correct?
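To convince myself, I checked the formula numerically. This is a small sketch (my own setup, not from the course materials) assuming f(q) = Σ_k q_k² with q = W·x: it compares the analytic gradient 2·q_i·x_j (which is just the outer product 2·q·xᵀ) against a finite-difference estimate.

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(3, 4)   # example shapes, chosen arbitrarily
x = np.random.randn(4)

def f(W):
    q = W @ x
    return np.sum(q ** 2)   # f(q) = sum_k q_k^2

# Analytic gradient: df/dW_ij = 2 * q_i * x_j, i.e. the outer product 2 * q x^T
q = W @ x
analytic = 2 * np.outer(q, x)

# Numerical gradient via central differences, one entry of W at a time
h = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += h
        Wm = W.copy(); Wm[i, j] -= h
        numeric[i, j] = (f(Wp) - f(Wm)) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-5))  # prints True
```

The two gradients agree, which matches the "only the k = i term survives" argument: nudging W_ij only moves q_i, and it moves it by x_j.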