Hi everyone, I'm learning deep learning and following a Stanford online course.
At this part, I don't yet understand how to calculate the gradient of f with respect to W.
I understand that q_k here denotes the k-th component of the matrix-vector product q = W·x, and that f(q) = Σ_k q_k² sums the squares of those components.

My reading of the sum is this: to get ∂f/∂W_ij we apply the chain rule over every component of q, i.e. ∂f/∂W_ij = Σ_k (∂f/∂q_k)(∂q_k/∂W_ij), which adds up how much a change in W_ij affects each q_k. Since W_ij appears only in the single component q_i (the other rows of q are constants with respect to it), every term except k = i vanishes, leaving ∂f/∂W_ij = 2·q_i·x_j. Is that reasoning correct?
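To convince myself, I checked the formula numerically. This is a small sketch (my own setup, not from the course materials) assuming f(q) = Σ_k q_k² with q = W·x: it compares the analytic gradient 2·q_i·x_j (which is just the outer product 2·q·xᵀ) against a finite-difference estimate.

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(3, 4)   # example shapes, chosen arbitrarily
x = np.random.randn(4)

def f(W):
    q = W @ x
    return np.sum(q ** 2)   # f(q) = sum_k q_k^2

# Analytic gradient: df/dW_ij = 2 * q_i * x_j, i.e. the outer product 2 * q x^T
q = W @ x
analytic = 2 * np.outer(q, x)

# Numerical gradient via central differences, one entry of W at a time
h = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += h
        Wm = W.copy(); Wm[i, j] -= h
        numeric[i, j] = (f(Wp) - f(Wm)) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-5))  # prints True
```

The two gradients agree, which matches the "only the k = i term survives" argument: nudging W_ij only moves q_i, and it moves it by x_j.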