Why can’t we just differentiate the cost function to get the value of ‘w’ at the minimum cost where the derivative is equal to zero?

We should be able to get ‘w’ for the minimum cost by solving dJ(w)/dw = 0.

Hello @Ravi_Mathur,

We can.

However, we don’t, because one key concept we want to learn from this specialization is gradient descent. It applies not just to the case you have mentioned, but also to cases that can’t be solved efficiently by setting the derivative to zero, such as deep neural networks.

A linear regression problem is just a good, simple problem for demonstrating gradient descent without too many additional details.

The linear model is certainly an important model to learn about, but these courses also lay down a path towards neural networks, where gradient descent is the standard.

Cheers,

Raymond

Thank you. I understand.

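Another thing to note is that setting the derivative to zero may not be enough for non-convex functions. The derivative is also zero at local minima, so you may end up with a local minimum rather than the global one. Gradient descent can get stuck at a local minimum in the same way.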
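
To make this concrete, here is a small Python sketch. The cost function J(w) = w^4 - 3w^2 + w is made up purely for illustration (it is not from the course), and the function names, learning rate and starting points are my own assumptions. Its derivative is zero at more than one point, and gradient descent ends up in a different minimum depending on where it starts.

```python
def cost(w):
    # Hypothetical non-convex cost, chosen only for illustration.
    return w**4 - 3 * w**2 + w


def gradient(w):
    # dJ/dw for the cost above.
    return 4 * w**3 - 6 * w + 1


def gradient_descent(w_start, alpha=0.01, iterations=1000):
    # Plain gradient descent on a single parameter w.
    w = w_start
    for _ in range(iterations):
        w -= alpha * gradient(w)
    return w


# Two runs that differ only in the starting point (both arbitrary).
w_left = gradient_descent(-2.0)
w_right = gradient_descent(2.0)

print("start -2.0 -> w =", round(w_left, 4), "cost =", round(cost(w_left), 4))
print("start  2.0 -> w =", round(w_right, 4), "cost =", round(cost(w_right), 4))
```

With these settings the run starting at -2 settles near w ≈ -1.30, which is the lower of the two minima, while the run starting at 2 settles near w ≈ 1.13, which is only a local minimum. The derivative is (numerically) zero at both points, so dJ/dw = 0 alone does not tell you which one you have found.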

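You can. The result is called the normal equation. It is the closed-form solution for linear regression with the MSE cost function. Most other models and cost functions, logistic regression for example, have no such closed-form solution, so they need an iterative method such as gradient descent.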
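
As a rough sketch of what that looks like for the single-feature model f(x) = w*x + b: solving dJ/dw = 0 and dJ/db = 0 for the MSE cost gives w and b directly, and gradient descent should arrive at about the same values. The function names, the tiny dataset and the hyperparameters below are assumptions of mine for illustration, not course code.

```python
def closed_form_fit(xs, ys):
    # Solution of dJ/dw = 0 and dJ/db = 0 for the MSE cost of f(x) = w*x + b.
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / sxx
    b = y_mean - w * x_mean
    return w, b


def gradient_descent_fit(xs, ys, alpha=0.05, iterations=5000):
    # Batch gradient descent on J(w, b) = (1 / 2m) * sum((w*x + b - y)^2).
    m = len(xs)
    w, b = 0.0, 0.0
    for _ in range(iterations):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        dj_dw = sum(e * x for e, x in zip(errors, xs)) / m
        dj_db = sum(errors) / m
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return w, b


# Made-up data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w_cf, b_cf = closed_form_fit(xs, ys)
w_gd, b_gd = gradient_descent_fit(xs, ys)

print("closed form:      w =", w_cf, "b =", b_cf)
print("gradient descent: w =", round(w_gd, 6), "b =", round(b_gd, 6))
```

On this data both approaches recover w = 2 and b = 1 (gradient descent only approximately, after many iterations). With many features the same idea is written in matrix form, and solving it becomes more expensive as the number of features grows, which is one practical reason to prefer gradient descent.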