As Prof. Andrew explained, a local minimum has slope = 0, so gradient descent can get stuck there and never reach the global minimum, which in many cases results in poor values of w and b.

So I am wondering: what can we do to reach the global minimum?

There are two factors:

For most simple systems (like linear and logistic regression), the cost function is convex, so there are no local minima other than the global minimum. So there is nothing to worry about.
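To make the convex case concrete, here is a minimal sketch (my own illustration, not from the course material): plain gradient descent on a one-feature linear regression with a squared-error cost, started from three different initial values of w and b. The toy data, the learning rate, and the function name `fit_linear` are all assumptions chosen for the example.

```python
import numpy as np

# Toy data generated from y = 2x + 1 (made up for this illustration)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

def fit_linear(w, b, lr=0.05, steps=5000):
    """Batch gradient descent on the squared-error cost J(w, b)."""
    for _ in range(steps):
        err = w * x + b - y          # prediction error per example
        w -= lr * np.mean(err * x)   # dJ/dw
        b -= lr * np.mean(err)       # dJ/db
    return w, b

# Three very different starting points (hypothetical values)
for w0, b0 in [(0.0, 0.0), (10.0, -10.0), (-5.0, 8.0)]:
    w, b = fit_linear(w0, b0)
    print(f"start=({w0}, {b0}) -> w={w:.4f}, b={b:.4f}")
```

Every run should end up at essentially the same w and b, because a convex cost has only one basin for gradient descent to fall into.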

When the cost function is not convex (such as for a neural network), you can train multiple times using different initial weight values, and use the one with the lowest cost.
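As a rough sketch of the restart idea, assuming a made-up one-parameter cost with two minima instead of a real neural network: run gradient descent from several initial values and keep the result with the lowest cost. The cost function, the learning rate, and the names `gradient_descent` and `best_of_restarts` are hypothetical and only for illustration.

```python
import numpy as np

def cost(w):
    # Hypothetical non-convex cost with one local and one global minimum
    return w**4 - 3 * w**2 + w

def grad(w):
    # Derivative of the cost above
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w0, lr=0.01, steps=2000):
    """Run plain gradient descent from the initial value w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def best_of_restarts(starts):
    """Train once per initial value and keep the lowest-cost result."""
    finals = [gradient_descent(w0) for w0 in starts]
    best = min(finals, key=cost)
    return best, finals

# Random initial values in [-2, 2]; the seed only makes the run repeatable
rng = np.random.default_rng(0)
best, finals = best_of_restarts(rng.uniform(-2.0, 2.0, size=10))
print("best w:", best, "cost:", cost(best))
```

Runs that start on different sides of the hump between the two valleys settle in different minima, and picking the lowest cost selects the better one. Note this only picks the best minimum that was actually found; it does not guarantee the global minimum.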

Another strategy is to use a validation or test set, and accept any local minimum solution that gives “good enough” results.

**One more question:** in the 2nd picture, how do we choose the correct path? Selecting a different starting point (first step) led to two different minimum values.

Hello @SantoshKumarDoodala, in practice, we do not have the information to choose a “correct path”, because we do not know in advance where the minima are. If we did know them, we would not need gradient descent in the first place, because gradient descent is all about finding a minimum.

Although we can’t choose the path, we can choose the initialization, and we can choose our model architecture. Below are good ways to justify our choices.

Raymond