Hi @Tannaz_Monajemi ,
Welcome to the community! I see this is your first post.
Regarding your question, I’d like to share a couple of links that discuss this very same topic:
Hi @KiraDiShira ,
Let me attempt to answer the quoted question starting from the more general concept of Gradient Descent:
The goal is to optimize the model so that its predictions come very close to the ground truth. For this optimization we use gradient descent, which assumes that the model can converge.
In simple systems with very few dimensions, you would use full gradient descent, a simpler formulation of the optimization. In these simple models with 1-2 dimensions, you can get to local minima that a…
and
It is an important question, but the answer has lots of layers to it.
For the simple case of Logistic Regression, the cost function is actually convex, so it has a single global minimum and no local minima. Once we graduate to real Neural Networks, though, that is no longer true. The cost surfaces are not convex and there can be lots of local optima.
One high-level point to make is that convergence (even to a local minimum) is never guaranteed: if you pick a learning rate that is too high, you…
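To make the local-minima point from the first excerpt concrete, here is a minimal sketch in plain Python (the toy cost f(w) = w^4 - 3w^2 + w is my own choice for illustration, not something taken from the linked threads): plain gradient descent simply settles into whichever minimum's basin it starts in, which need not be the global one.

```python
# Toy non-convex cost with two minima: f(w) = w^4 - 3w^2 + w
def cost(w):
    return w**4 - 3 * w**2 + w

def grad(w):
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w0, lr=0.01, steps=500):
    """Plain (full-batch) gradient descent starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Different starting points land in different minima: from -2 we reach the
# global minimum near w = -1.30, from +2 only the local one near w = +1.13.
for w0 in (-2.0, 2.0):
    w_final = gradient_descent(w0)
    print(f"start={w0:+.1f} -> w={w_final:+.4f}, cost={cost(w_final):+.4f}")
```

With many dimensions and a non-convex cost surface, as in the neural-network case from the second excerpt, the same thing happens, just in a space we can no longer visualize.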
I encourage you to read these links carefully, as they develop the topic clearly. In particular, the second link also discusses a paper by an important group of researchers.
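And since the second excerpt gets cut off right at the learning-rate warning, here is an equally small sketch of that point (again just an illustration I put together, not code from the linked post): even on a convex bowl f(w) = w^2 with a single global minimum, gradient descent overshoots and diverges once the learning rate is too large.

```python
# Convex bowl f(w) = w^2, minimized at w = 0; its gradient is 2w.
def run_gd(lr, w0=1.0, steps=20):
    w = w0
    history = [w]
    for _ in range(steps):
        w -= lr * 2 * w           # gradient descent update
        history.append(w)
    return history

# lr = 0.1 shrinks w toward 0 each step; lr = 1.5 flips the sign and
# doubles the magnitude each step, so the iterates blow up instead.
for lr in (0.1, 1.5):
    ws = run_gd(lr)
    first = [round(w, 3) for w in ws[:5]]
    print(f"lr={lr}: first iterates {first} ... final w = {ws[-1]:.3g}")
```

The same failure mode shows up with real networks, which is why the learning rate is usually the first hyperparameter to revisit when training does not converge.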
Please share your thoughts.
Juan