Doubt regarding sklearn C1_W2

C1_W2_Lab05_Sklearn_GD_Soln

I just have a doubt: why are we using the same normalized values (X_norm) for both training and predicting? Won't the plots be the same?

from sklearn.linear_model import SGDRegressor

sgdr = SGDRegressor(max_iter=1000)
sgdr.fit(X_norm, y_train)

y_pred_sgd = sgdr.predict(X_norm)

hi @Sky4me

Please note the difference:

  1. sgdr.fit(X_norm, y_train) (Training/Learning):
  • Purpose: The model looks at the input features (X_norm) and the true target values (y_train) to learn the best slope (coefficients) and intercept.
  • Action: It updates the model parameters to minimize the error (loss function).
  • Output: Returns the fitted model object itself, with updated internal weights.

whereas

  2. y_pred_sgd = sgdr.predict(X_norm) (Inference/Prediction):
  • Purpose: The model uses the internal weights it learned during the fit() step to generate predictions for a given set of input features.

  • Action: It calculates a new value for each row in X_norm using
    y_pred = X_norm @ coef_ + intercept_

  • Output: Returns an array of predicted values (y_pred_sgd).
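As a sketch, predict() on a fitted SGDRegressor is equivalent to the linear formula above (the dataset below is made up purely for illustration):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Hypothetical tiny dataset, already normalized
rng = np.random.default_rng(0)
X_norm = rng.standard_normal((50, 3))
y_train = X_norm @ np.array([2.0, -1.0, 0.5]) + 3.0

sgdr = SGDRegressor(max_iter=1000, random_state=0)
sgdr.fit(X_norm, y_train)

# predict() applies the learned linear formula to each row
y_pred = sgdr.predict(X_norm)
manual = X_norm @ sgdr.coef_ + sgdr.intercept_
print(np.allclose(y_pred, manual))  # True
```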

The reason we use the same X_norm for both is to evaluate how well the model learned the training set. This is sometimes called the "training error" or "training accuracy".

This helps us see whether the model has enough capacity to learn the data, or whether it is underfitting.

It is not mandatory to get the same plot, because:
y_train contains the actual, true target values (the ground truth).
y_pred_sgd contains the predicted values generated by the model's best guess.

If your model were perfect, y_pred_sgd would equal y_train, which is highly unlikely with real-world data.

Plotting y_train vs X_norm shows the true data points.
Plotting y_pred_sgd vs X_norm shows the line (or hyperplane) the model fit to those points.

fit() is for learning; predict() is for applying what was learned. Using the same data to predict simply shows how well the model learned that data. The predicted output (y_pred_sgd) is almost always slightly different from the true output (y_train), since a perfect model is essentially impossible on real data. And even a model that scores 99.999% during training can perform noticeably worse on unseen real-world data.
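That last point can be demonstrated by holding out a test set; the gap between training and test error reflects how well the model generalizes (a sketch on assumed, noisy synthetic data):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical noisy data: the model cannot be perfect even on training data
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.5, -0.5, 2.0]) + rng.normal(scale=0.3, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)
sgdr = SGDRegressor(max_iter=1000, random_state=2)
sgdr.fit(X_tr, y_tr)

# Error on data the model saw vs. data it never saw
print(f"train MSE: {mean_squared_error(y_tr, sgdr.predict(X_tr)):.3f}")
print(f"test  MSE: {mean_squared_error(y_te, sgdr.predict(X_te)):.3f}")
```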

Regards
DP

Thank you so much, I understand it now.
