So gradient descent is a function of the derivative of the cost function, with constant w and alpha. If we take a point at a local maximum of the graph shown in the figure, then the slope would be zero, so the value of w would not change. But we are at a local maximum of the cost function, so in this case the algorithm would not be able to guide us to a local minimum.
Can anyone please explain this to me?
I will try to explain one thing at a time:
- Gradient descent is the algorithm that tries to optimize the weights of the network, so these weights are not constant: they are updated at each step. Even the alpha value can change during training. Check the learning rate schedules in the TF documentation.
- In the particular case you proposed: yes, you are right. Gradient descent would not change the weights because the derivative would be zero. Nonetheless, it is extremely unlikely that this would happen, since the weights would have to land exactly on the maximum. The update can be very small, but it is usually not exactly zero.
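To make the second point concrete, here is a minimal sketch in plain Python. The cost J(w) = w^4 - 2w^2 is a hypothetical example chosen for illustration (it is not the cost from the figure): it has a local maximum at w = 0 and minima at w = -1 and w = 1. The names `grad_descent` and `dJ` are made up for this sketch.

```python
def grad_descent(w, grad, alpha=0.1, steps=200):
    """Plain gradient descent: repeat w := w - alpha * dJ/dw."""
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w


def dJ(w):
    # Derivative of the hypothetical cost J(w) = w**4 - 2*w**2
    return 4 * w**3 - 4 * w


# Starting exactly on the local maximum: the gradient is zero, so w never moves.
stuck = grad_descent(0.0, dJ)

# Starting a tiny distance away: the small gradient is enough to roll downhill.
escaped = grad_descent(1e-6, dJ)

print(stuck, escaped)
```

With this cost, the run that starts exactly at 0 stays there, while the run that starts at 1e-6 ends up at the minimum near w = 1. Since weights are normally initialized randomly, landing exactly on a maximum is not something you would expect to see in practice.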
I got the point.
So what you want to say is that we can consider it as a limitation of the Gradient Descent algorithm.
Kind of. But in reality it is practically negligible.
Ok got it Alvaro.
Thanks for replying.
My understanding is that the cost function J(w) is designed to be convex, with only one minimum and no local maxima, so this scenario does not arise.
That is not really the case, @sangdinh.
In fact, the cost function is designed to be differentiable, but no one can assure you that it won’t have local minima / maxima.
Ah, my bad. Correction: it’s convex for logistic regression (a NN with one node) with the log loss L = -y log(a) - (1 - y) log(1 - a).
In general, the cost function of a neural network with hidden layer(s) isn’t convex.
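A quick numeric check of both statements, as a rough sketch rather than a proof. It assumes a single training example (x = 1, y = 1) and a hypothetical tiny network with one tanh hidden unit; all function names here are made up for illustration. A convex function must satisfy J((u + v) / 2) <= (J(u) + J(v)) / 2 for every pair of points u, v.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(a, y):
    # L = -y log(a) - (1 - y) log(1 - a)
    return -y * math.log(a) - (1 - y) * math.log(1 - a)


x, y = 1.0, 1.0  # one assumed training example


def loss_logistic(w):
    # Logistic regression: a = sigmoid(w * x)
    return log_loss(sigmoid(w * x), y)


def loss_hidden(w1, w2):
    # Hypothetical net with one tanh hidden unit: a = sigmoid(w2 * tanh(w1 * x))
    return log_loss(sigmoid(w2 * math.tanh(w1 * x)), y)


# Midpoint test for logistic regression on a grid of w values in [-4, 4].
grid = [i / 2 for i in range(-8, 9)]
logistic_ok = all(
    loss_logistic((u + v) / 2) <= (loss_logistic(u) + loss_logistic(v)) / 2 + 1e-12
    for u in grid
    for v in grid
)

# Flipping the sign of both weights gives the same output, so (2, 2) and
# (-2, -2) have equal loss, but their midpoint (0, 0) has a higher loss.
end_a = loss_hidden(2.0, 2.0)
end_b = loss_hidden(-2.0, -2.0)
mid = loss_hidden(0.0, 0.0)
hidden_violates = mid > (end_a + end_b) / 2

print(logistic_ok, hidden_violates)
```

The sign-flip symmetry is the reason the hidden-layer case fails the test: two different weight settings give the same loss, and the point halfway between them is worse than both, which cannot happen for a convex function.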