Plotting the progress is good practice and a way to check convergence. Since SGDRegressor only prints the training loss (when its verbose parameter is set to a value > 0), we can redirect stdout to a text file and then extract the epoch numbers and training losses from it for the plot. This gives us the training loss curve.
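In case it helps, here is a rough sketch of that idea. It assumes the verbose messages are written through Python's stdout and that each epoch prints a line containing "Avg. loss:"; both assumptions may need checking against your scikit-learn version, and make_regression is only toy data for illustration:

    import io
    import re
    from contextlib import redirect_stdout
    from sklearn.datasets import make_regression
    from sklearn.linear_model import SGDRegressor

    X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)

    # Capture everything SGDRegressor prints while fitting.
    buf = io.StringIO()
    with redirect_stdout(buf):
        SGDRegressor(max_iter=100, verbose=1).fit(X, y)

    # Pull the per-epoch training losses out of the captured text.
    losses = [float(m.group(1))
              for m in re.finditer(r"Avg\. loss: ([0-9eE+.\-]+)", buf.getvalue())]
    print(losses[:5])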
However, unless we modify SGDRegressor to, for example, expose the weights at the end of each epoch, we do not get a validation loss curve.
One more point on convergence: SGDRegressor treats the fit as converged when either the training loss or the validation score (which of the two depends on the early_stopping parameter) stops improving by more than tol for n_iter_no_change consecutive epochs. Once that happens, it stops fitting and records the actual number of epochs in the n_iter_ attribute, so if n_iter_ is smaller than max_iter, that is evidence that convergence was reached. Alternatively, with verbose > 0, it also prints a message when it converges. For more, see here and check out the parameters list.
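For example, a quick check along these lines (with make_regression as stand-in data) tells us whether the stopping rule fired before max_iter:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import SGDRegressor

    X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)

    # tol and n_iter_no_change control the stopping rule; early_stopping=False
    # (the default) monitors the training loss, early_stopping=True a held-out
    # validation score.
    reg = SGDRegressor(max_iter=1000, tol=1e-3, n_iter_no_change=5).fit(X, y)

    if reg.n_iter_ < reg.max_iter:
        print(f"Stopped after {reg.n_iter_} of {reg.max_iter} epochs -- converged.")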
Thank you for your reply! I read the documentation and conducted experiments. Here’s what I found while working with SGDRegressor:
In addition to using stdout, there is one more way to draw the training loss curve: we can use partial_fit, which performs a single epoch (one pass over the data) per call. This lets us observe the cost and coefficients between epochs. I decided to try it, and here's what I got:
This example function runs SGDRegressor and returns a history of the cost and coefficients. I plotted it as we did in the labs.
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

def run_sgd(X, y, iterations, alpha):
    reg = SGDRegressor(penalty=None, learning_rate='constant', eta0=alpha, verbose=0)
    hist = {"iter": [], "cost": [], "coef": [], "intercept": []}
    for i in range(iterations):
        reg.partial_fit(X, y)  # one epoch: a single pass over the data
        hist["iter"].append(i)
        # cost = MSE / 2, matching the convention from the labs
        hist["cost"].append(mean_squared_error(y, reg.predict(X)) / 2)
        hist["coef"].append(reg.coef_.copy())
        hist["intercept"].append(reg.intercept_.item())
    return hist
SGDRegressor adjusts coefficients after each sample, so we can split the initial dataset into smaller portions and iterate over these portions on each epoch using partial_fit. This way, we can get even more intermediate values of the coefficients.
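If it helps, a rough mini-batch variant of the function above could look like the sketch below; batch_size and np.array_split are just one illustrative way to do the chunking:

    import numpy as np
    from sklearn.linear_model import SGDRegressor
    from sklearn.metrics import mean_squared_error

    def run_sgd_minibatch(X, y, epochs, alpha, batch_size=32):
        reg = SGDRegressor(penalty=None, learning_rate='constant', eta0=alpha)
        hist = {"cost": [], "coef": []}
        n_batches = int(np.ceil(len(X) / batch_size))
        for _ in range(epochs):
            # One epoch = one pass over all mini-batches.
            for Xb, yb in zip(np.array_split(X, n_batches), np.array_split(y, n_batches)):
                reg.partial_fit(Xb, yb)  # update the coefficients on this chunk only
                # Record the full-dataset cost after every chunk, not just once per epoch.
                hist["cost"].append(mean_squared_error(y, reg.predict(X)) / 2)
                hist["coef"].append(reg.coef_.copy())
        return hist

Shuffling the rows between epochs would bring this closer to true stochastic behaviour, but I left that out to keep the sketch short.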
There’s one more case where SGDRegressor stops early: when it diverges instead of converging.
SGDRegressor does not indicate divergence directly, so I found that checking .score(X_test, y_test) (or at least .score(X_train, y_train)) is very useful after fitting the model. The score is the R² coefficient of determination: if it is negative, the model predicts worse than a constant equal to the mean of y, so it has definitely not fitted well; if it is zero, the predictions are no better than that constant mean.
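As a quick illustration of that check (again with make_regression as stand-in data):

    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import SGDRegressor

    X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    reg = SGDRegressor(max_iter=1000).fit(X_train, y_train)

    # score() returns R^2: negative means worse than always predicting the mean
    # of y_test, zero means no better than that constant prediction.
    r2 = reg.score(X_test, y_test)
    if r2 <= 0:
        print(f"Suspicious fit (R^2 = {r2:.3f}) -- check eta0/alpha or feature scaling.")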
I was not aware of partial_fit! In fact, I did not even think of setting max_iter=1 (together with warm_start=True) to mimic partial_fit with fit. Now I prefer your approach to what I suggested.
The plots look good, too. Your well-chosen alpha converges within 20 iterations, which is good! One thing, if I may suggest, is to add the test curve to the plots. We can discuss the reason here, but it is also explained in MLS Course 2 Week 2.
It is good to see your findings. Thanks for sharing them, @kagudimov.
Thank you for suggesting adding the test curve. With minimal changes to the code, using train_test_split, I could plot it as well. Looking forward to the next course.
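For anyone reading later, the change was roughly along these lines (the 20% split is just an illustrative choice, not necessarily what I used):

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import SGDRegressor
    from sklearn.metrics import mean_squared_error

    def run_sgd_with_test(X, y, iterations, alpha):
        # Hold out a test set so both curves can be tracked epoch by epoch.
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
        reg = SGDRegressor(penalty=None, learning_rate='constant', eta0=alpha)
        hist = {"iter": [], "train_cost": [], "test_cost": []}
        for i in range(iterations):
            reg.partial_fit(X_train, y_train)
            hist["iter"].append(i)
            hist["train_cost"].append(mean_squared_error(y_train, reg.predict(X_train)) / 2)
            hist["test_cost"].append(mean_squared_error(y_test, reg.predict(X_test)) / 2)
        return hist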