Solving linear regression with sklearn gives me wrong result

I’m testing different examples of solving linear regression with sklearn, but it gives me the wrong model.

My input feature is x = [1 2 3 4 5 6 7] and the output is y = [14 39 74 119 174 239 314]. The correct model should be f(x) = 5x^2 + 10x - 1, so I add x^2 as a second feature. If I don’t use sklearn and instead write the linear regression algorithm manually, it finds the correct solution w = [5, 10] and b = -1.

If I use sklearn, it always gives me wrong values of w and b, and they are different on each run. What’s wrong with my code?

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor

x = np.array([1, 2, 3, 4, 5, 6, 7])
X = np.array([
    [1, 1],
    [4, 2],
    [9, 3],
    [16, 4],
    [25, 5],
    [36, 6],
    [49, 7],
])
y = np.array([14,  39,  74, 119, 174, 239, 314])

sgdRegressor = SGDRegressor(max_iter=50000, alpha=0.002)
, y)

print(f'\nNumber of iterations: {sgdRegressor.n_iter_}')

w = sgdRegressor.coef_
b = sgdRegressor.intercept_
y_predict = + b

print(f'Calculated model parameters w: {w}, b: {b}')
print(f'\nModel\'s predictions for input x: {y_predict}')

plt.scatter(x, y, marker='o', c='r', label='Actual output')
plt.scatter(x, y_predict, marker='x', c='g', label='Predicted output')

Hi @matrix, it’s great that you are trying out sklearn!

And please also try this:

SGDRegressor(max_iter=500000, alpha=0, learning_rate='constant', eta0=0.0001, tol=1e-10, n_iter_no_change=2*len(X))

and I get

Calculated model parameters w: [5.00017774 9.99841084], b: [-0.99708895]

My config is not the only one that can give you the desired result, but it’s still worth examining to see whether you can understand why it makes a difference. If you’d like, we can discuss your understanding, or any questions you have about the settings.


Thanks a lot, it works. I guess I need to read more about SGDRegressor’s parameters and the criteria for choosing appropriate ones. Thanks for the help.


You are welcome @matrix. I have two hints.

This can help justify my eta0.

n_iter_no_change has to do with the fact that you are using SGD, so each parameter update considers only one sample of your dataset.
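To make the eta0 point concrete (my own back-of-envelope sketch, not from the thread): with the unscaled x^2 feature reaching 49 and y reaching 314, the very first SGD step can overshoot the target weight of 5 by an order of magnitude unless eta0 is tiny.

```python
# First SGD update on the largest sample (x = 7 -> features [49, 7], y = 314),
# starting from w = 0, b = 0; for squared-error loss the step on a weight is
# roughly eta * error * feature.
feature_x2 = 49.0
y_true = 314.0
error = y_true - 0.0                  # prediction is 0 before any training
for eta in (0.01, 0.0001):            # a "typical" eta0 vs the suggested one
    step = eta * error * feature_x2
    print(f'eta0={eta}: first update to the x^2 weight is {step:.2f} (target is 5)')
# eta0=0.01 jumps by ~153.86, far past the target; eta0=0.0001 moves by ~1.54.
```

This is also why rescaling the features is the other common fix: it shrinks the gradients so that larger learning rates stay stable.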