Plotting the progress is good practice and a way to check convergence. Since SGDRegressor only prints the training loss (when its verbose parameter is set to a value > 0), we can redirect stdout to a text file and then extract the epoch numbers and training losses from it for the plot. This gives us the training loss curve.
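In case it helps, here is a rough sketch of that idea. It assumes the verbose messages are written through Python's stdout and that each epoch prints a line containing "Avg. loss:"; both assumptions may need checking against your scikit-learn version, and make_regression is only toy data for illustration:

    import io
    import re
    from contextlib import redirect_stdout
    from sklearn.datasets import make_regression
    from sklearn.linear_model import SGDRegressor

    X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)

    # Capture everything SGDRegressor prints while fitting.
    buf = io.StringIO()
    with redirect_stdout(buf):
        SGDRegressor(max_iter=100, verbose=1).fit(X, y)

    # Pull the per-epoch training losses out of the captured text.
    losses = [float(m.group(1))
              for m in re.finditer(r"Avg\. loss: ([0-9eE+.\-]+)", buf.getvalue())]
    print(losses[:5])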
However, unless we modify SGDRegressor to, for example, expose the weights at the end of each epoch, we do not get a validation loss curve.
One more point on convergence: SGDRegressor treats the fit as converged when either the training loss or the validation score (which of the two depends on the early_stopping parameter) stops improving by more than tol for n_iter_no_change consecutive epochs. Once that happens, it stops fitting and records the actual number of epochs in the n_iter_ attribute, so if n_iter_ is smaller than max_iter, that is evidence that convergence was reached. Alternatively, with verbose > 0, it also prints a message when it converges. For more, see here and check out the parameters list.
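For example, a quick check along these lines (with make_regression as stand-in data) tells us whether the stopping rule fired before max_iter:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import SGDRegressor

    X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)

    # tol and n_iter_no_change control the stopping rule; early_stopping=False
    # (the default) monitors the training loss, early_stopping=True a held-out
    # validation score.
    reg = SGDRegressor(max_iter=1000, tol=1e-3, n_iter_no_change=5).fit(X, y)

    if reg.n_iter_ < reg.max_iter:
        print(f"Stopped after {reg.n_iter_} of {reg.max_iter} epochs -- converged.")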
Thank you for your reply! I read the documentation and conducted experiments. Here’s what I found while working with SGDRegressor:
In addition to using stdout, there is one more way to draw the training loss curve: we can use partial_fit, which performs a single epoch (one pass over the data) per call. This lets us observe the cost and coefficients between epochs. I decided to try it, and here's what I got:
This example function runs SGDRegressor and returns a history of the cost and coefficients. I plotted it as we did in the labs.
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

def run_sgd(X, y, iterations, alpha):
    reg = SGDRegressor(penalty=None, learning_rate='constant', eta0=alpha, verbose=0)
    hist = {"iter": [], "cost": [], "coef": [], "intercept": []}
    for i in range(iterations):
        reg.partial_fit(X, y)  # one epoch: a single pass over the data
        hist["iter"].append(i)
        # cost = MSE / 2, matching the convention from the labs
        hist["cost"].append(mean_squared_error(y, reg.predict(X)) / 2)
        hist["coef"].append(reg.coef_.copy())
        hist["intercept"].append(reg.intercept_.item())
    return hist
SGDRegressor adjusts coefficients after each sample, so we can split the initial dataset into smaller portions and iterate over these portions on each epoch using partial_fit. This way, we can get even more intermediate values of the coefficients.
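If it helps, a rough mini-batch variant of the function above could look like the sketch below; batch_size and np.array_split are just one illustrative way to do the chunking:

    import numpy as np
    from sklearn.linear_model import SGDRegressor
    from sklearn.metrics import mean_squared_error

    def run_sgd_minibatch(X, y, epochs, alpha, batch_size=32):
        reg = SGDRegressor(penalty=None, learning_rate='constant', eta0=alpha)
        hist = {"cost": [], "coef": []}
        n_batches = int(np.ceil(len(X) / batch_size))
        for _ in range(epochs):
            # One epoch = one pass over all mini-batches.
            for Xb, yb in zip(np.array_split(X, n_batches), np.array_split(y, n_batches)):
                reg.partial_fit(Xb, yb)  # update the coefficients on this chunk only
                # Record the full-dataset cost after every chunk, not just once per epoch.
                hist["cost"].append(mean_squared_error(y, reg.predict(X)) / 2)
                hist["coef"].append(reg.coef_.copy())
        return hist

Shuffling the rows between epochs would bring this closer to true stochastic behaviour, but I left that out to keep the sketch short.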
There’s one more case where SGDRegressor stops early: when it diverges instead of converging.
SGDRegressor does not indicate divergence directly, so I found that checking .score(X_test, y_test) (or at least .score(X_train, y_train)) is very useful after fitting the model. The score is the R² coefficient of determination: if it is negative, the model predicts worse than a constant equal to the mean of y, so it has definitely not fitted well; if it is zero, the predictions are no better than that constant mean.
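As a quick illustration of that check (again with make_regression as stand-in data):

    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import SGDRegressor

    X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    reg = SGDRegressor(max_iter=1000).fit(X_train, y_train)

    # score() returns R^2: negative means worse than always predicting the mean
    # of y_test, zero means no better than that constant prediction.
    r2 = reg.score(X_test, y_test)
    if r2 <= 0:
        print(f"Suspicious fit (R^2 = {r2:.3f}) -- check eta0/alpha or feature scaling.")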
I was not aware of partial_fit! In fact, I did not even think of setting max_iter=1 (together with warm_start=True) to mimic partial_fit with fit. Now I prefer your approach to what I suggested.
The plots look good, too. Your well-chosen alpha converges within 20 iterations, which is good! One thing, if I may suggest, is to add the test curve to the plots. We can discuss the reason here, but it is also explained in MLS Course 2 Week 2.
It is good to see your findings. Thanks for sharing them, @kagudimov.
Thank you for suggesting adding the test curve. With minimal changes to the code, using train_test_split, I could plot it as well. Looking forward to the next course.
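For anyone reading later, the change was roughly along these lines (the 20% split is just an illustrative choice, not necessarily what I used):

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import SGDRegressor
    from sklearn.metrics import mean_squared_error

    def run_sgd_with_test(X, y, iterations, alpha):
        # Hold out a test set so both curves can be tracked epoch by epoch.
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
        reg = SGDRegressor(penalty=None, learning_rate='constant', eta0=alpha)
        hist = {"iter": [], "train_cost": [], "test_cost": []}
        for i in range(iterations):
            reg.partial_fit(X_train, y_train)
            hist["iter"].append(i)
            hist["train_cost"].append(mean_squared_error(y_train, reg.predict(X_train)) / 2)
            hist["test_cost"].append(mean_squared_error(y_test, reg.predict(X_test)) / 2)
        return hist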