Could you explain how the slope is positive/negative in gradient descent?
Hi @naina_dwivedi, here’s a thread you can look at. Please let me know if you get your answer through this link; otherwise, we can always take a deeper look at what you have asked. Thanks!
Thanks for the explanation. However, I want to know how the slope is decided to be positive or negative. For example, in the gradient descent lecture, the graph shows that when the random value of w is small, the derivative is negative and hence the updated w value is increased, so the descent moves in the positive direction. Could you explain how the slope value is positive?
Hi @naina_dwivedi, okay, let me explain it for you. As Prof. Andrew mentions in week 2 of Course 1, gradient descent (GD) is an iterative procedure that computes the slope of the cost function in all directions at any given point and adjusts the parameters of the equation to move in the direction of the negative slope (toward the minimum point).
Logistic Regression is a special case where the cost function is actually convex, but even then convergence is not always guaranteed. If we choose too high a learning rate, we can get divergence or overshoot rather than convergence: the objective function increases at each step rather than decreasing. To make gradient descent work, the learning rate should be small enough that each step moves toward the minimum, even if the result is only an approximate one. Hope this explanation works for you.
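To see this concretely, here is a tiny sketch of my own (a toy example, not from the course) that runs gradient descent on the simple cost J(w) = w², whose derivative is 2w. With a small learning rate the updates shrink toward the minimum at w = 0; with too large a rate they overshoot and diverge:

```python
# Toy example (not from the course): gradient descent on J(w) = w^2,
# whose derivative is dJ/dw = 2w, to show the effect of the learning rate.

def run_gd(learning_rate, steps=5, w=3.0):
    history = [w]
    for _ in range(steps):
        grad = 2 * w                      # dJ/dw at the current w
        w = w - learning_rate * grad      # step in the direction of the negative slope
        history.append(w)
    return history

print(run_gd(0.1))   # small rate: w steadily shrinks toward the minimum at 0
print(run_gd(1.1))   # too-large rate: w overshoots and diverges, cost keeps growing
```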
Thank you for your explanation.
The gradients are the partial derivatives of the cost function with respect to the various w and b parameters. The derivative of a function can be either positive or negative at a given point, right? If it’s positive, it says that increasing the w_i value will increase the cost. If it’s negative, it says that increasing the w_i value will decrease the cost.
Of course what Gradient Descent does is to move in the opposite direction of the maximum gradient. That’s why the gradients are multiplied by -1. The maximum gradient points in the direction of the fastest increase of the cost and what we want is the fastest decrease of the cost, right?
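To make the sign question concrete, here is a tiny numeric sketch (the numbers are made up purely for illustration) of the update rule w := w - learning_rate * gradient, showing how the sign of the gradient decides the direction of the update:

```python
# Hypothetical numbers, just to show how the sign of the gradient
# decides the direction of the update: w_new = w - learning_rate * gradient.

learning_rate = 0.1

w, grad = 2.0, -4.0                # negative gradient: increasing w would lower the cost
print(w - learning_rate * grad)    # 2.4 -> w is pushed UP

w, grad = 2.0, 4.0                 # positive gradient: increasing w would raise the cost
print(w - learning_rate * grad)    # 1.6 -> w is pushed DOWN
```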
I think Prof Ng uses the “ski slope” metaphor in the lectures to explain how all this works. Imagine you are a skier and you’re standing at a particular point on the ski slope. The maximum gradient at that point will be a vector that points in the direction of most steepness going up the hill from that point. If you take the exact opposite direction by multiplying that vector by -1, then it points in the direction of the steepest descent from that point. If you were carrying a soccer ball and you gently placed it at your feet, the direction of steepest descent is the direction the ball will roll when you release it. That is what skiers call “the fall line”. That is mathematically how Gradient Descent works: it starts at a given point and takes a small step in the direction of the steepest decrease in cost. How big the step is will be determined by the “learning rate”. Then it recomputes at the new point which direction is steepest downward and takes another small step in that direction. Rinse and repeat.
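Here is a rough sketch of that loop for Logistic Regression in NumPy-style Python. The shapes and variable names are my own assumptions, not taken from the course notebook, but the structure is exactly “compute the gradients at the current point, step opposite them, repeat”:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent(X, y, learning_rate=0.01, num_steps=1000):
    """Sketch of the loop described above.
    Assumes X has shape (n_features, m_examples) and y has shape (1, m_examples)."""
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    for _ in range(num_steps):
        a = sigmoid(np.dot(w.T, X) + b)    # predictions at the current (w, b)
        dw = np.dot(X, (a - y).T) / m      # partial derivatives of the cost w.r.t. w
        db = np.sum(a - y) / m             # partial derivative of the cost w.r.t. b
        w = w - learning_rate * dw         # small step opposite the gradient ("downhill")
        b = b - learning_rate * db
    return w, b
```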
If you are lucky and choose your learning rate well, you will eventually converge to a good minimum value of the cost, which represents the best you can do on a given Logistic Regression problem. Note that there is never any guarantee that the final cost will be zero. Logistic Regression can only do linear separation between the “yes” and “no” answers and your data may not have a clean linear separation.