In gradient descent, we know from the contour plot that the value of the cost function J(w,b) is the same along one contour, but this value is reached by many different combinations of w and b, so it seems that we don’t have a unique optimum for the model. Am I right?
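To make the question concrete, here is a minimal sketch with made-up data lying exactly on y = 2x, so the best fit is w = 2, b = 0. It evaluates the squared-error cost J(w, b) = 1/(2m) * sum((w*x_i + b - y_i)^2) at three different (w, b) pairs that share one contour level, and at the single point in the center of the contours.

```python
import math

# Hypothetical toy data lying exactly on y = 2x, so the best fit is w = 2, b = 0.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

def cost(w, b):
    """Squared-error cost J(w, b) = 1/(2m) * sum((w*x_i + b - y_i)^2)."""
    m = len(x)
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

# Three different (w, b) pairs that lie on the same contour of J.
same_contour = [(2.5, 0.0), (1.5, 0.0), (2.0, math.sqrt(7.0 / 6.0))]
levels = [cost(w, b) for w, b in same_contour]

for (w, b), j in zip(same_contour, levels):
    print(f"J({w:.4f}, {b:.4f}) = {j:.6f}")

# The center of the contours is a single point with a strictly lower cost.
print(f"J(2, 0) = {cost(2.0, 0.0):.6f}")
```

Many (w, b) pairs share a contour value, but only one point sits at the bottom.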
Also, gradient descent seems to be sensitive to the initial choice of the parameters w and b (like initial conditions). In that case, each starting point reaches the global minimum along its own unique path, and we get a unique optimum, right?
It’s not gradient descent that decides whether you have only one optimal w and b; it’s the model and the loss function that decide it.
For linear regression with squared loss, there is only one global minimum, so there is only one combination of w and b for the optimal solution (as long as the training inputs are not degenerate, e.g. not all x values identical), and you can get to it from any initial w and b.
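For a single feature, one way to see the uniqueness is to set ∂J/∂w = 0 and ∂J/∂b = 0, which gives two linear equations in w and b (the normal equations): w*Σx² + b*Σx = Σxy and w*Σx + b*m = Σy. The sketch below solves them with Cramer's rule; the data and the name fit_closed_form are hypothetical. The system has exactly one solution unless all x values are equal, in which case a whole line of (w, b) pairs shares the minimal cost.

```python
def fit_closed_form(x, y):
    """Solve dJ/dw = 0 and dJ/db = 0 for single-feature linear regression.

    Returns (w, b), or None when the 2x2 system is singular (all x equal),
    in which case infinitely many (w, b) pairs share the minimum cost.
    """
    m = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = m * sxx - sx * sx  # equals m * sum((x_i - mean(x))^2) >= 0
    if det == 0:  # exact check is fine for this small toy data
        return None
    w = (m * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return w, b

print(fit_closed_form([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))  # (2.0, 1.0)
print(fit_closed_form([2.0, 2.0, 2.0], [1.0, 2.0, 3.0]))  # None
```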
However, generally speaking, a neural network can have many local minima, so the solution gradient descent finds can be one of those local minima, depending on your initial choice of parameters. Therefore, you can end up with more than one set of optimized parameters.
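A real neural network cost is hard to show in a few lines, so here is a hedged stand-in: a hypothetical one-parameter function f(w) = w^4 - 4w^2 + w with two minima of different depth. Plain gradient descent (learning rate assumed to be 0.01) ends in a different minimum depending on where it starts. This only illustrates the local-minima behavior described above, not how an actual network is trained.

```python
def f(w):
    """Hypothetical non-convex 'cost' with a deep minimum near w = -1.47 and a shallower one near w = 1.35."""
    return w**4 - 4 * w**2 + w

def df(w):
    """Derivative of f."""
    return 4 * w**3 - 8 * w + 1

def gradient_descent_1d(w0, alpha=0.01, iters=2000):
    """Fixed-step gradient descent on f starting from w0."""
    w = w0
    for _ in range(iters):
        w -= alpha * df(w)
    return w

finals = {w0: gradient_descent_1d(w0) for w0 in [-2.0, 0.0, 0.5, 2.0]}

for w0, w_final in finals.items():
    print(f"start {w0:+.1f} -> w = {w_final:+.4f}, cost = {f(w_final):+.4f}")
```

Starts at -2.0 and 0.0 fall into the deeper (global) minimum, while starts at 0.5 and 2.0 stop at the shallower local one.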
If the cost function is strictly convex, then there is one unique optimal solution. It’s right in the middle of the contour plot.
There are many possible convergence paths, but they all lead to the center.
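A small sketch of "many paths, one center" for linear regression, using hypothetical data on y = 2x + 1 and an assumed learning rate of 0.1: batch gradient descent is started from three different (w, b) points, records every iterate, and prints the first step and the end point. The first steps differ, but all three runs end at (2, 1).

```python
# Hypothetical data lying exactly on y = 2x + 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

def gradient_descent(w, b, alpha=0.1, iters=2000):
    """Batch gradient descent on J(w, b) = 1/(2m) * sum((w*x_i + b - y_i)^2).

    Returns the final (w, b) and the list of visited points (the path).
    """
    m = len(x)
    path = [(w, b)]
    for _ in range(iters):
        errors = [w * xi + b - yi for xi, yi in zip(x, y)]
        dj_dw = sum(e * xi for e, xi in zip(errors, x)) / m
        dj_db = sum(errors) / m
        w, b = w - alpha * dj_dw, b - alpha * dj_db  # simultaneous update
        path.append((w, b))
    return (w, b), path

starts = [(0.0, 0.0), (10.0, -10.0), (-5.0, 8.0)]
runs = {start: gradient_descent(*start) for start in starts}

for start, ((w, b), path) in runs.items():
    print(f"start {start}: first step {path[1]}, end ({w:.6f}, {b:.6f})")
```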
The cost functions for linear regression (squared error) and logistic regression (log loss) are always convex.
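To spot-check the convexity claim numerically, the sketch below tests the midpoint inequality J((p+q)/2) <= (J(p)+J(q))/2 on random parameter pairs for both the squared-error cost and the logistic (log loss) cost. The data, sampling range, and tolerance are assumptions. Passing a random test is not a proof; the real argument is that both costs have positive semidefinite Hessians, e.g. (1/m) * Σ σ(z)(1 - σ(z)) [x, 1][x, 1]ᵀ for log loss.

```python
import math
import random

# Hypothetical one-feature binary classification data.
x = [0.5, 1.5, 2.0, 3.0, 3.5, 4.5]
y = [0, 0, 1, 0, 1, 1]

def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def logistic_cost(w, b):
    """Average log loss with z = w*x + b."""
    total = 0.0
    for xi, yi in zip(x, y):
        z = w * xi + b
        # -log(sigmoid(z)) = softplus(-z); -log(1 - sigmoid(z)) = softplus(z)
        total += softplus(-z) if yi == 1 else softplus(z)
    return total / len(x)

def squared_cost(w, b):
    """Squared-error cost (as in linear regression) on the same x, y."""
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * len(x))

def midpoint_violations(cost, trials=10_000, seed=0):
    """Count random pairs where J((p+q)/2) > (J(p) + J(q))/2; a convex J has none."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        w1, b1, w2, b2 = (rng.uniform(-5.0, 5.0) for _ in range(4))
        mid = cost((w1 + w2) / 2, (b1 + b2) / 2)
        avg = (cost(w1, b1) + cost(w2, b2)) / 2
        if mid > avg + 1e-9 * max(1.0, abs(avg)):  # small tolerance for rounding
            bad += 1
    return bad

print("logistic violations:", midpoint_violations(logistic_cost))
print("squared violations:", midpoint_violations(squared_cost))
```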
Thanks, TMosh. That’s exactly my point: the target is unique, but there may be different paths to it depending on the initial choice of w and b.
Thank you for clarifying, Raymond.