I’m testing different examples of solving linear regression using sklearn, but it gives me the wrong model.
I have an input feature x = [1 2 3 4 5 6 7]. The output y is [14 39 74 119 174 239 314]. The correct model should be f(x) = 5x^2 + 10x - 1. So I add a second feature, x^2. If I don’t use sklearn and write the linear regression algorithm manually, it gives the correct solution: w = [5, 10] and b = -1.
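As a sanity check on the expected parameters, here is a minimal sketch that fits the same data with ordinary least squares. It assumes NumPy's `np.linalg.lstsq` as a stand-in for the manual implementation, which is not shown in this post, and it orders the features as [x^2, x] to match the matrix X used further down.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([14, 39, 74, 119, 174, 239, 314], dtype=float)

# Design matrix with columns [x^2, x, 1]; the column of ones models the intercept b.
A = np.column_stack([x**2, x, np.ones_like(x)])

# Closed-form least-squares solve (an assumption for illustration,
# not the gradient-based method that SGDRegressor uses).
params, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
w, b = params[:2], params[2]

print(f"w: {w}, b: {b}")
```

Because the data lie exactly on f(x) = 5x^2 + 10x - 1, the solve should recover w close to [5, 10] and b close to -1, up to floating-point error.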
If I use sklearn, it always gives me wrong values of w and b, and they are different on each run. I don’t understand what’s wrong with my code.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
x = np.array([1, 2, 3, 4, 5, 6, 7])
X = np.array([
[1, 1],
[4, 2],
[9, 3],
[16, 4],
[25, 5],
[36, 6],
[49, 7],
])
y = np.array([14, 39, 74, 119, 174, 239, 314])
sgdRegressor = SGDRegressor(max_iter=50000, alpha=0.002)
sgdRegressor.fit(X, y)
print(f'\nNumber of iterations: {sgdRegressor.n_iter_}')
w = sgdRegressor.coef_
b = sgdRegressor.intercept_
y_predict = X.dot(w) + b
print(f'Calculated model parameters w: {w}, b: {b}')
print(f'\nModel\'s predictions for input x: {y_predict}')
plt.scatter(x, y, marker='o', c='r', label='Actual output')
plt.scatter(x, y_predict, marker='x', c='g', label='Predicted output')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()