Choosing appropriate learning rate for gradient descent

During the course we are presented with a way of choosing an appropriate learning rate alpha for gradient descent. It's how I have been doing it in my personal exercises: I set a reference value, say 0.1, run for N iterations, and after all iterations are done I visualize the cost-vs-iteration graph and decide whether I should increase or decrease the value.

However, this can be time-consuming before you reach a sensible value. For instance, I always compare the slope and intercept determined by N attempts of gradient descent to those of scikit-learn's LinearRegression/LogisticRegression models, as well as comparing predictions from my own method against scikit-learn's predict method.

Somehow scikit-learn always gets better values for the slope and intercept (not by a large margin). Looking at its code, it uses a separate optimization library.

Looking at StatQuest we can find values for slope and intercept using the following formula:

        x.mean * y.mean - (x * y).mean
slope = ------------------------------
          (x.mean)^2 - (x^2).mean

y_intercept = y.mean - slope * x.mean

This gives nearly identical slope and intercept values to what scikit-learn returns, and the predictions match almost exactly.
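To see the comparison concretely, here is a minimal sketch of the closed-form formula quoted above, checked against scikit-learn's LinearRegression. The data points are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares fit (the formula quoted above)
slope = (x.mean() * y.mean() - (x * y).mean()) / (x.mean() ** 2 - (x ** 2).mean())
intercept = y.mean() - slope * x.mean()

# scikit-learn's solution for comparison
model = LinearRegression().fit(x.reshape(-1, 1), y)

print(slope, model.coef_[0])        # nearly identical
print(intercept, model.intercept_)  # nearly identical
```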

What I wanted to know is how well this method applies in different scenarios compared to running gradient descent. Or is there a better way to determine the slope and intercept values than manually re-running gradient descent?

One thing I have been doing so that I don't have to sit at the PC is to run gradient descent and calculate R^2 from the result. Based on R^2, I do the next runs with an increased or decreased learning rate and number of iterations. I repeat this N times, then come back to see where I got, and it's usually somewhat close to scikit-learn.
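That loop can be automated as a rough sketch: try a small grid of learning rates, run gradient descent for each, and keep the fit with the best R^2. The helper names, learning rates, and data here are illustrative, not from the course material.

```python
import numpy as np

def gradient_descent(x, y, alpha, iters):
    """Plain batch gradient descent for y = m*x + b, minimizing MSE."""
    m = b = 0.0
    n = len(x)
    for _ in range(iters):
        pred = m * x + b
        # Gradients of mean squared error w.r.t. slope and intercept
        dm = (2 / n) * np.dot(pred - y, x)
        db = (2 / n) * np.sum(pred - y)
        m -= alpha * dm
        b -= alpha * db
    return m, b

def r_squared(y, pred):
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Run each candidate learning rate, keep the (alpha, slope, intercept)
# triple whose fit scores the highest R^2
best = max(
    ((alpha, *gradient_descent(x, y, alpha, 1000)) for alpha in [0.001, 0.005, 0.01, 0.05]),
    key=lambda t: r_squared(y, t[1] * x + t[2]),
)
print(best)  # (alpha, slope, intercept) of the best run
```

Note that a learning rate that is too large for the data's scale will diverge, so the candidate grid still needs a sensible upper bound.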

Hi @Kristjan_Antunovic,

Linear regression can be solved analytically, meaning that you can literally solve it with some simple algebra, which is basically the formula you have quoted. You may google the "Normal Equation" for a more general form of the formula. Some packages use SVD-based methods to solve the linear regression problem instead of the Normal Equation, and this is why you sometimes find different packages giving very similar but not identical solutions, even though none of them use gradient descent.
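For reference, the general form mentioned here is theta = (X^T X)^(-1) X^T y. A quick NumPy sketch of both the Normal Equation and the SVD-based route (which is what `np.linalg.lstsq` uses internally), on illustrative data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with a column of ones for the intercept term
X = np.column_stack([np.ones_like(x), x])

# Normal Equation: solve (X^T X) theta = X^T y
# (np.linalg.solve is preferred over forming an explicit inverse)
theta = np.linalg.solve(X.T @ X, X.T @ y)

# SVD-based least squares, the route many packages take internally
theta_svd, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta)      # [intercept, slope]
print(theta_svd)  # nearly identical
```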

Because you can solve it with the Normal Equation, gradient descent is not always necessary here, unless your dataset is just too big to fit into your computer's memory. In that case you will need gradient descent, which can be configured to handle only a small subset of your data at a time.
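The "small subset at a time" idea is mini-batch gradient descent. A minimal sketch (batch size, learning rate, and epoch count here are illustrative choices, not prescriptions):

```python
import numpy as np

def minibatch_gd(x, y, alpha=0.01, epochs=1000, batch_size=2, seed=0):
    """Mini-batch gradient descent for y = m*x + b: each update only
    touches batch_size rows, so the full dataset never needs to be
    processed at once."""
    rng = np.random.default_rng(seed)
    m = b = 0.0
    n = len(x)
    for _ in range(epochs):
        order = rng.permutation(n)  # shuffle each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            pred = m * xb + b
            m -= alpha * (2 / len(xb)) * np.dot(pred - yb, xb)
            b -= alpha * (2 / len(xb)) * np.sum(pred - yb)
    return m, b

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(minibatch_gd(x, y))  # hovers near the analytical slope/intercept
```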

However, if it is not linear regression, there is no guarantee that an analytical solution exists; in fact, in most cases one does not. There, gradient descent is always a valid option.



Thank you so much. @rmwkwok you are awesome.

You are welcome Kristjan. Thanks for sharing your findings with us too 🙂