Explanation for gradient

Tejas_Joshi · January 17, 2022, 6:25am

The gradient is calculated as: “g = dLoss/dR = (2/m) X^T (XR - Y)”
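One way to convince yourself the formula is right before worrying about where it comes from is a numerical check. The sketch below assumes the loss is the squared Frobenius norm of XR - Y divided by m, the number of rows (examples) in X, since that is what produces the 2/m factor. The helper names `loss`, `gradient` and `numerical_gradient` are hypothetical, and the random matrices are illustrative only. It compares the closed-form gradient against central finite differences taken one entry of R at a time.

```python
import numpy as np

def loss(X, Y, R):
    """Assumed loss: squared Frobenius norm of XR - Y, divided by m (rows of X)."""
    m = X.shape[0]
    diff = X @ R - Y
    return np.sum(diff ** 2) / m

def gradient(X, Y, R):
    """Closed-form gradient g = (2/m) X^T (XR - Y)."""
    m = X.shape[0]
    return (2.0 / m) * X.T @ (X @ R - Y)

def numerical_gradient(X, Y, R, eps=1e-6):
    """Central finite differences, perturbing one entry of R at a time."""
    g = np.zeros_like(R)
    for p in range(R.shape[0]):
        for q in range(R.shape[1]):
            R_plus = R.copy()
            R_minus = R.copy()
            R_plus[p, q] += eps
            R_minus[p, q] -= eps
            g[p, q] = (loss(X, Y, R_plus) - loss(X, Y, R_minus)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
m, n = 5, 3  # small, illustrative sizes
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, n))
R = rng.standard_normal((n, n))

# True if the formula matches the finite-difference estimate
print(np.allclose(gradient(X, Y, R), numerical_gradient(X, Y, R), atol=1e-6))
```

Because the loss is quadratic in R, the central difference is exact up to floating-point rounding, so the two results should agree closely.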

There is no clear explanation given for why g = (2/m) X^T (XR - Y).
Specifically, why is the transpose X^T multiplied by (XR - Y)?

Can someone provide a more elaborate explanation?

arvyzukai · January 17, 2022, 1:23pm

Hi Tejas_Joshi.

There is an optional reading towards the end of Week 4. It shows how you arrive at this formula.
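For anyone who wants the derivation in the thread itself, here is a sketch in index notation. It assumes the loss is the mean squared Frobenius norm L(R) = (1/m)||XR - Y||_F^2, with X of size m x n, R of size n x d, Y of size m x d, and m the number of examples; the optional reading may set things up slightly differently.

```latex
% Assumed setup: X is m x n, R is n x d, Y is m x d, residual A = XR - Y
\mathcal{L}(R) = \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{d} A_{ij}^{2},
\qquad A_{ij} = \sum_{k=1}^{n} X_{ik} R_{kj} - Y_{ij}

% Only the terms with k = p and j = q depend on R_{pq}
\frac{\partial A_{ij}}{\partial R_{pq}} = X_{ip}\,\delta_{jq}

\frac{\partial \mathcal{L}}{\partial R_{pq}}
  = \frac{2}{m}\sum_{i,j} A_{ij}\,\frac{\partial A_{ij}}{\partial R_{pq}}
  = \frac{2}{m}\sum_{i} X_{ip}\,A_{iq}
  = \frac{2}{m}\sum_{i} (X^{T})_{pi}\,A_{iq}
  = \frac{2}{m}\,(X^{T}A)_{pq}

\nabla_{R}\mathcal{L} = \frac{2}{m}\,X^{T}(XR - Y)
```

The transpose shows up in the second-to-last step: the sum runs over the row index i, which X and A share. Writing that sum as a matrix product requires the rows of X to become columns, which is exactly X^T.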
The transpose comes from the vectorized version: instead of looping through every parameter, multiplying and summing, you compute the whole gradient with one matrix product of the transposed inputs X^T and the residuals XR - Y (the difference between the predicted and actual outputs). Each entry of that product is a dot product of one column of X with one column of XR - Y.
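To make the loop-versus-vectorized point concrete, here is a minimal sketch assuming the loss (1/m)||XR - Y||_F^2 with m the number of rows of X. The function names `gradient_loops` and `gradient_vectorized` are hypothetical. The loop version builds each gradient entry as a sum over the m examples; the vectorized version gets every entry at once from X^T (XR - Y).

```python
import numpy as np

def gradient_loops(X, Y, R):
    """Gradient by explicit loops: dL/dR[p, q] = (2/m) * sum_i X[i, p] * A[i, q]."""
    m = X.shape[0]
    A = X @ R - Y  # residuals: predicted minus actual
    g = np.zeros_like(R)
    for p in range(R.shape[0]):
        for q in range(R.shape[1]):
            total = 0.0
            for i in range(m):  # sum over examples (the shared row index)
                total += X[i, p] * A[i, q]
            g[p, q] = 2.0 * total / m
    return g

def gradient_vectorized(X, Y, R):
    """Same sums as one matrix product; X.T lines up the shared row index i."""
    m = X.shape[0]
    return (2.0 / m) * X.T @ (X @ R - Y)

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))  # illustrative sizes only
Y = rng.standard_normal((6, 4))
R = rng.standard_normal((4, 4))

# True: both versions compute the same gradient
print(np.allclose(gradient_loops(X, Y, R), gradient_vectorized(X, Y, R)))
```

The inner loop over i is the dot product of column p of X with column q of the residuals; the single `X.T @ (...)` call performs all of those dot products at once, which is why the vectorized form is both shorter and much faster for realistic sizes.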
