If a cost function fw,b(x) has multiple local minima, then by choosing different learning rates we could reach different ones, right?
But how could we choose a learning rate that reaches the minimum value of the cost function?
No, the learning rate only controls the magnitude of the updates to the weights - it does not alter their direction.
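For reference, the gradient descent update can be written out as below. This assumes the usual notation, which the thread does not state explicitly: J(w,b) for the cost and α for the learning rate. Because α is positive, it only scales the step, while the sign of each partial derivative decides which way the parameter moves.

```latex
w := w - \alpha \, \frac{\partial J(w,b)}{\partial w}, \qquad
b := b - \alpha \, \frac{\partial J(w,b)}{\partial b}, \qquad \alpha > 0
```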
One remedy for local minima is to start gradient descent from different initial weight values. If you do this many times, you can pick the solution that gives the lowest final cost.
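Here is a minimal sketch of that restart idea in plain Python. The cost function, the start range, the learning rate, and the number of restarts are all made up for illustration; the only point is that different starting weights can settle in different minima, and you keep the run with the lowest cost.

```python
import random

# Hypothetical non-convex cost with two minima (near w = -1.30 and w = 1.13).
def cost(w):
    return w**4 - 3 * w**2 + w

# Derivative of the cost above.
def grad(w):
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w0, lr=0.01, steps=1000):
    """Run plain gradient descent from the starting weight w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Try several random starting weights (range chosen arbitrarily).
rng = random.Random(0)
starts = [rng.uniform(-2.0, 2.0) for _ in range(10)]
results = [gradient_descent(w0) for w0 in starts]

# Keep the run that ends with the lowest final cost.
best_w = min(results, key=cost)
print(f"best w = {best_w:.4f}, cost = {cost(best_w):.4f}")
```

The learning rate is the same in every run; only the starting point changes.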
However, in linear regression and logistic regression, this is not a concern. Both of those cost functions are convex, so there are no local minima other than the global minimum.
But if there are multiple local minima, you have to choose different learning rates to go through all of them, right?
No, the learning rate has nothing to do with local minima.
Oh, so it means that you just have to adjust the weights to reach different local minima, right?
Adjust the starting weights and then let gradient descent do its work. As @TMosh mentioned, you can try this with different initial values for the weights and pick the run that delivers the lowest final cost.
In the case of linear regression and logistic regression, there is no risk of getting stuck at a local minimum. There, the learning rate is the deciding factor: a lower learning rate means it will take longer to converge, while a higher learning rate will get you to convergence faster (provided the learning rate is not so high that it diverges).
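To make the learning rate effect concrete, here is a small sketch on a toy convex cost, J(w) = w^2, standing in for a convex cost like the ones above. The function name, starting weight, tolerance, and the three learning rates are assumptions picked for illustration.

```python
def steps_to_converge(lr, w0=5.0, tol=1e-6, max_steps=10_000):
    """Count gradient descent steps on J(w) = w**2 until |w| < tol.

    Returns None if the iterates blow up or max_steps is reached.
    """
    w = w0
    for step in range(1, max_steps + 1):
        w -= lr * 2 * w          # gradient of w**2 is 2*w
        if abs(w) < tol:
            return step          # converged to the single minimum at w = 0
        if abs(w) > 1e6:
            return None          # diverged: learning rate too high
    return None

for lr in (0.01, 0.4, 1.1):
    print(lr, steps_to_converge(lr))
```

On this toy cost, the small rate converges slowly, the moderate rate converges in far fewer steps, and the largest rate overshoots further on every step and diverges. All converging runs end at the same minimum.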
How do we force different initial weights? Aren’t they randomly assigned every time we train?
My suggestion would be to set the random seed before setting up your neural network. This will set the seed for the random generator that is used to initialize the weights. Setting it to the same seed value makes sure the weights are initialized to the same set of values. If you set it to a different seed, the weights will be initialized to a different set of values.
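As a rough illustration of the seeding behaviour, here is a sketch using only Python's standard `random` module. The `init_weights` helper and the seed values are hypothetical, and this is not the call from any particular framework. In a real framework you would use its own seeding function instead (for example `tf.random.set_seed` in TensorFlow or `torch.manual_seed` in PyTorch).

```python
import random

def init_weights(n, seed):
    """Hypothetical helper: draw n small random initial weights from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.01) for _ in range(n)]

same_a = init_weights(3, seed=42)
same_b = init_weights(3, seed=42)
different = init_weights(3, seed=7)

print(same_a == same_b)     # same seed, same initial weights
print(same_a == different)  # different seed, different initial weights
```

So to force different initial weights across runs, change the seed for each run; to reproduce a run, reuse its seed.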