Module 2 - Vectorization Part 2

At video time 2:36, the explanation of gradient descent with vectorization starts. I understand that there are 16 parameters (w1, w2, …, w16). But I fail to understand how there can be 16 derivative terms for these 16 weights. As per my understanding, the formula for the derivative is 1/m * summation from i=0 to m-1 of (f_w_b(x(i)) - y(i)) * x(i). All 16 weights get used in calculating one single f_w_b(x(i)), so I am confused about how Prof. Ng is showing a vector of 16 derivative values. I am missing some key point here. Help please!

Oh, I think I know why, but I am not sure. Can someone confirm?
When calculating the derivative, (f_w_b(x(i)) - y(i)) results in a single scalar value. This scalar is then multiplied by the vector x(i), which has 16 elements. Maybe that is what produces the sixteen partial derivative values.

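Yes, that's right. You need to take the partial derivative with respect to each weight w_j. The derivative for w_j multiplies the error (f_w_b(x(i)) - y(i)) by the j-th feature x_j(i), so 16 features give 16 partial derivatives.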
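
If it helps to see this in code, here is a minimal NumPy sketch of the idea. It assumes a linear model f_w_b(x) = w . x + b; the function name `compute_gradient`, the variable names `dj_dw` and `dj_db`, and the random data are made up for illustration and are not taken from the video. It computes the gradient once with an explicit loop (scalar error times the 16-element x(i), summed over examples) and once in vectorized form, and both give a vector of 16 values.

```python
import numpy as np

def compute_gradient(X, y, w, b):
    """Vectorized gradient of the squared-error cost for a linear model.

    X: (m, n) feature matrix, y: (m,) targets, w: (n,) weights, b: scalar bias.
    Returns dj_dw with shape (n,) and the scalar dj_db.
    """
    m = X.shape[0]
    err = X @ w + b - y        # one scalar error per example, shape (m,)
    dj_dw = X.T @ err / m      # one partial derivative per weight, shape (n,)
    dj_db = err.sum() / m
    return dj_dw, dj_db

# Illustrative random data: m = 5 examples, n = 16 features (an assumption).
rng = np.random.default_rng(0)
m, n = 5, 16
X = rng.normal(size=(m, n))
y = rng.normal(size=m)
w = rng.normal(size=n)
b = 0.5

dj_dw, dj_db = compute_gradient(X, y, w, b)

# Same computation with an explicit loop over examples.
dj_dw_loop = np.zeros(n)
for i in range(m):
    err_i = np.dot(w, X[i]) + b - y[i]   # scalar
    dj_dw_loop += err_i * X[i]           # scalar times a 16-element vector
dj_dw_loop /= m

print(dj_dw.shape)                       # shape of the gradient vector
print(np.allclose(dj_dw, dj_dw_loop))    # loop and vectorized versions agree
```

Element j of `dj_dw` is exactly 1/m * sum over i of (f_w_b(x(i)) - y(i)) * x_j(i), so the vector simply stacks the 16 partial derivatives.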
