# Solving linear regression with sklearn gives me wrong result

I’m testing different examples of solving linear regression using sklearn, but it gives me wrong model.

I have the input feature x = `[1 2 3 4 5 6 7]` and the output y = `[14 39 74 119 174 239 314]`. The correct model should be `f(x) = 5x^2 + 10x - 1`, so I add a second feature, x^2. If I don’t use sklearn and implement the linear regression algorithm manually, it finds the correct solution w = [5, 10] and b = -1.
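For reference, a quick closed-form check with NumPy (ordinary least squares via `np.linalg.lstsq` — this is just a sketch of the check, not my manual implementation) recovers the same coefficients:

```python
import numpy as np

# Features are [x^2, x], matching the column order used below.
X = np.array([[1, 1], [4, 2], [9, 3], [16, 4],
              [25, 5], [36, 6], [49, 7]], dtype=float)
y = np.array([14, 39, 74, 119, 174, 239, 314], dtype=float)

# Append a column of ones so the intercept is fitted alongside w.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:2], coef[2]
print(w, b)  # recovers w ≈ [5, 10], b ≈ -1
```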

If I use sklearn, it always gives me wrong values of w and b, and they are different on every run. What’s wrong with my code?

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor

x = np.array([1, 2, 3, 4, 5, 6, 7])
X = np.array([
[1, 1],
[4, 2],
[9, 3],
[16, 4],
[25, 5],
[36, 6],
[49, 7],
])
y = np.array([14,  39,  74, 119, 174, 239, 314])

sgdRegressor = SGDRegressor(max_iter=50000, alpha=0.002)
sgdRegressor.fit(X, y)

print(f'\nNumber of iterations: {sgdRegressor.n_iter_}')

w = sgdRegressor.coef_
b = sgdRegressor.intercept_
y_predict = X.dot(w) + b

print(f'Calculated model parameters w: {w}, b: {b}')
print(f'\nModel\'s predictions for input x: {y_predict}')

plt.scatter(x, y, marker='o', c='r', label='Actual output')
plt.scatter(x, y_predict, marker='x', c='g', label='Predicted output')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
```

Hi @matrix, it is great that you are trying out sklearn!

I tweaked your `SGDRegressor` call to

```python
SGDRegressor(max_iter=500000, alpha=0, learning_rate='constant', eta0=0.0001, tol=1e-10, n_iter_no_change=2*len(X))
```

and I get

```
Calculated model parameters w: [5.00017774 9.99841084], b: [-0.99708895]
```

My config is not the only one that can give you your desired result, but it is still worth examining it to see whether you can understand why it makes a difference. If you’d like, we can discuss your understanding, or any questions you have about the individual settings.
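As an aside, and just my own suggestion rather than the only fix: with features of such different magnitudes (the x^2 column goes up to 49), another common remedy is to standardize the inputs first, after which even a near-default `SGDRegressor` converges. A sketch:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 1], [4, 2], [9, 3], [16, 4],
              [25, 5], [36, 6], [49, 7]], dtype=float)
y = np.array([14, 39, 74, 119, 174, 239, 314], dtype=float)

# Standardizing x^2 and x keeps the gradient magnitudes comparable,
# so the default learning-rate schedule no longer overshoots.
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=50000, tol=None, alpha=0, random_state=0),
)
model.fit(X, y)
print(model.score(X, y))  # R^2 very close to 1
```

The fitted coefficients live in the scaled feature space, so they won’t read as [5, 10] directly, but the pipeline’s predictions match the targets.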

Cheers,
Raymond

This can help justify my `eta0`.
`n_iter_no_change` has to do with the fact that you are using SGD, so each iteration considers only one sample of your dataset.