In gradient descent, we know from the contour plot that the cost function J(w,b) takes the same value everywhere along a single contour, and since many different combinations of w and b produce that same value, it seems that we don't have a unique optimization for the model. Am I right?

Also, gradient descent seems sensitive to the initial choice of the parameters w and b (like initial conditions). In that case, from a given initialization we reach the global minimum along a unique path, so we do get a unique optimization for that starting point. Right?

It's not gradient descent that decides whether you have only one optimal w and b; it's the model and the loss function that decide that.

For linear regression with squared-error loss, the cost function is convex, so there is only one global minimum. That means there is only one combination of w and b at the optimal solution, and gradient descent will reach it from any initial w and b (given a suitable learning rate).
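As a quick illustration (a minimal sketch on made-up toy data, with an arbitrary learning rate and step count), running plain gradient descent on the mean squared error from two very different starting points lands on essentially the same (w, b):

```python
import numpy as np

# Toy data: y = 2x + 1 plus a little noise (assumed for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.05, 50)

def descend(w, b, lr=0.1, steps=2000):
    """Plain gradient descent on the mean squared error J(w, b)."""
    n = len(x)
    for _ in range(steps):
        err = w * x + b - y
        w -= lr * (2 / n) * np.sum(err * x)  # dJ/dw
        b -= lr * (2 / n) * np.sum(err)      # dJ/db
    return w, b

# Two very different initializations...
w1, b1 = descend(w=-10.0, b=10.0)
w2, b2 = descend(w=5.0, b=-3.0)

# ...converge to (essentially) the same unique minimum
print(w1, b1)
print(w2, b2)
```

Because the squared-error surface is a convex bowl, every downhill path ends at the same bottom, regardless of where it starts.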

However, generally speaking, a neural network's loss surface can have many local minima, so the solution gradient descent finds can be any one of them depending on your initial choice of parameters. Therefore, you can end up with more than one set of optimized parameters.
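The same effect shows up even in one dimension. Here is a minimal sketch (using a made-up non-convex loss, not an actual neural network) where the loss (w² − 1)² has two minima, at w = −1 and w = +1, and the initialization decides which one gradient descent reaches:

```python
def loss(w):
    """A simple non-convex loss with two minima, at w = -1 and w = +1."""
    return (w**2 - 1.0)**2

def grad(w):
    """Derivative of the loss above."""
    return 4.0 * w * (w**2 - 1.0)

def descend(w, lr=0.05, steps=500):
    """Plain gradient descent from a chosen starting point w."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Starting right of zero rolls into the minimum at +1;
# starting left of zero rolls into the minimum at -1.
print(descend(0.3))
print(descend(-0.3))
```

Both results are valid local minima with identical loss, yet they are different parameter sets, which is exactly the situation with neural networks (only in millions of dimensions instead of one).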