Hey all,
I’ve been coding up the gradient descent stuff in Python, and I was getting annoyed with guessing a learning rate by trial and error, so I decided to quickly code up a dynamic learning-rate-adjustment algorithm.
I’m sure I’m not the first one to do this - there’s probably a lot out on the internet about it, and maybe we’ll even cover it later in the course? But I thought I’d share what I did in case it helps anyone.
Basically, in my gradient descent algorithm, I just monitor the cost J to see whether it goes up or down on each iteration of the loop. If the cost J increases, I cut the learning rate in half for the next iteration; if it decreases, I increase the learning rate by 1% for the next iteration. If the previous iteration’s learning rate produced a decrease in cost but the current iteration’s produced an increase, then the previous learning rate is probably about the largest learning rate that still decreases the cost, so I just fall back to it and stop adjusting from then on.
I also test for convergence automatically rather than manually guessing how many iterations I need. The test is basically just to check whether the cost dropped by less than 1% during the most recent iteration of the loop.
All of these parameters are adjustable, of course, but I felt that a less-than-1% change is probably good enough for most purposes.
Here is the section of my loop that deals with all of this logic:
# Determine whether or not to change the learning rate
if J > J_history[-1]:
    # Cost went up -- check whether the previous learning rate worked...
    if did_previous_learning_rate_work:
        # If so, fall back to the previous learning rate and stop adjusting
        learning_rate = previous_learning_rate
        optimal_learning_rate_achieved = True
    else:
        # Otherwise, cut the learning rate in half for the next iteration
        previous_learning_rate = learning_rate
        did_previous_learning_rate_work = False
        optimal_learning_rate_achieved = False
        learning_rate /= 2
else:
    # Cost went down -- first, check whether we have converged
    minuscule_change = J * 0.01
    if abs(J_history[-1] - J) <= minuscule_change or J == 0:
        done = True
    # Then, if we haven't locked in a learning rate yet, increase it by 1%
    if not optimal_learning_rate_achieved:
        previous_learning_rate = learning_rate
        learning_rate = learning_rate * 1.01
        did_previous_learning_rate_work = True
When the “done” variable is set to True, then my loop knows to break.
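In case it helps to see how everything fits together, here’s a condensed, self-contained sketch of the whole loop. This is not my exact code - the function and helper names (gradient_descent_adaptive, compute_cost, compute_gradient) are just my own placeholders, and the cost/gradient helpers are the standard linear-regression versions:

import numpy

def compute_cost(X, y, w, b):
    # Standard 1/(2m) mean-squared-error cost for linear regression
    errors = X @ w + b - y
    return (errors @ errors) / (2 * X.shape[0])

def compute_gradient(X, y, w, b):
    # Gradients of the cost with respect to w and b
    m = X.shape[0]
    errors = X @ w + b - y
    return (X.T @ errors) / m, errors.sum() / m

def gradient_descent_adaptive(X, y, learning_rate=0.1, max_iters=10000):
    w = numpy.zeros(X.shape[1])
    b = 0.0
    J_history = [compute_cost(X, y, w, b)]
    previous_learning_rate = learning_rate
    did_previous_learning_rate_work = False
    optimal_learning_rate_achieved = False
    done = False

    for _ in range(max_iters):
        # Take one gradient step with the current learning rate
        dj_dw, dj_db = compute_gradient(X, y, w, b)
        w = w - learning_rate * dj_dw
        b = b - learning_rate * dj_db
        J = compute_cost(X, y, w, b)

        # --- the learning-rate / convergence logic from the snippet above ---
        if J > J_history[-1]:
            if did_previous_learning_rate_work:
                learning_rate = previous_learning_rate
                optimal_learning_rate_achieved = True
            else:
                previous_learning_rate = learning_rate
                did_previous_learning_rate_work = False
                optimal_learning_rate_achieved = False
                learning_rate /= 2
        else:
            if abs(J_history[-1] - J) <= J * 0.01 or J == 0:
                done = True
            if not optimal_learning_rate_achieved:
                previous_learning_rate = learning_rate
                learning_rate = learning_rate * 1.01
                did_previous_learning_rate_work = True

        J_history.append(J)
        if done:
            break

    return w, b, J_history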
To illustrate the results, I used the extremely small/simple dataset from the course:
X_train = numpy.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = numpy.array([460, 232, 178])
After normalizing the features with z-score normalization, I passed the data into the gradient descent algorithm.
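For anyone who hasn’t gotten to that part yet, z-score normalization just rescales each feature column to zero mean and unit standard deviation. A minimal sketch (the function name here is just my own):

import numpy

def zscore_normalize(X):
    # Rescale each feature column to zero mean and unit standard deviation
    mu = numpy.mean(X, axis=0)
    sigma = numpy.std(X, axis=0)
    return (X - mu) / sigma, mu, sigma

X_norm, mu, sigma = zscore_normalize(X_train)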
I then ran the algorithm 3 different ways:
- The normal gradient descent algorithm with no dynamic adjustment of the learning rate. I set the default learning rate to 0.1. This algorithm finished/converged after 329 iterations. The final cost J = 2.8x10^-26.
- Gradient descent with dynamic learning rate adjustment. I set the initial learning rate to 0.1. This algorithm finished/converged after 143 iterations. The final cost J = 1.6x10^-27.
- Gradient descent with dynamic learning rate adjustment. I set the initial learning rate to 2 (purposefully very large). The algorithm finished/converged after (surprisingly) 32 iterations, with a final cost J = 1.7x10^-3.
Finally, I tried predicting the price of a house with each of these models. The input example I used was also from the course:
array([1200, 3, 1, 40])
- Model 1 (no dynamic adjustment, a = 0.1): predicted house price = $281,683
- Model 2 (dynamic adjustment, initial a = 0.1): predicted house price = $281,683
- Model 3 (dynamic adjustment, initial a = 2.0): predicted house price = $281,696
All seem pretty comparable.
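In case it’s useful, here’s roughly how I compute a prediction once training is done. The key detail is that the new example has to be normalized with the same mu and sigma computed from the training set; w and b are the parameters returned by the training loop, and since y_train is in 1000s of dollars I scale the output for display:

# Predict with one trained model: w, b come from the training loop,
# mu and sigma come from normalizing the training set
x_house = numpy.array([1200, 3, 1, 40])
x_house_norm = (x_house - mu) / sigma              # normalize with the TRAINING mu/sigma
predicted_price = numpy.dot(x_house_norm, w) + b   # same units as y_train (1000s of dollars)
print(f"Predicted price: ${predicted_price * 1000:,.0f}")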
Anyway, I was curious - especially to hear from those who have done a bit more machine learning - is this a fairly reasonable way to do dynamic learning rate adjustment? What other methods have been tried?