# Optional Lab: Linear Regression using Scikit-Learn

mehmet_baki_deniz · December 7, 2022, 2:04pm

Hi all,

Here is a code segment from the lab that I have trouble understanding.
How does scikit-learn know which columns to assign to X_train and y_train?
Why is X_features defined in the code? I do not see anywhere it is used later.
I also ran the code without the X_features variable, and it worked and produced the same results.

X_train, y_train = load_house_data()

X_features = ['size(sqft)','bedrooms','floors','age']

scaler = StandardScaler()

X_norm = scaler.fit_transform(X_train)

sgdr = SGDRegressor(max_iter=1000)

sgdr.fit(X_norm, y_train)

b_norm = sgdr.intercept_

w_norm = sgdr.coef_

y_pred_sgd = sgdr.predict(X_norm)

# make a prediction using w,b.

y_pred = np.dot(X_norm, w_norm) + b_norm

Kic · December 7, 2022, 3:22pm

Hi @mehmet_baki_deniz

scikit-learn does not decide this. The utility function load_house_data() returns two outputs: the first is assigned to X_train and the second to y_train. You can look at the source code of load_house_data() by clicking File -> Open -> lab_utils_multi.py. The function is located near the bottom of the file.
You can see that lab_utils_multi.py is imported at the top of the notebook, in the import statements.
X_features is a variable used for the visual display in the Plot results section; it is not used for training the model. You can see how it is used at the end of the notebook, where the predictions and targets are plotted against the original features.
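
To illustrate the point about the two outputs, here is a minimal sketch of tuple unpacking. The function `load_toy_house_data` and its values are hypothetical stand-ins, not the lab's actual `load_house_data()`; the assumption is only that the real function also returns the features first and the targets second.

```python
import numpy as np

def load_toy_house_data():
    """Hypothetical stand-in for load_house_data(); the values are made up."""
    data = np.array([
        [952.0, 2.0, 1.0, 65.0, 271.5],
        [1244.0, 3.0, 1.0, 64.0, 300.0],
        [1947.0, 3.0, 2.0, 17.0, 509.8],
    ])
    X = data[:, :4]  # first four columns: the features
    y = data[:, 4]   # last column: the target
    return X, y      # a tuple; the order decides which name gets which array

# Python unpacks the returned tuple by position, not by column content.
X_train, y_train = load_toy_house_data()
print(X_train.shape, y_train.shape)  # (3, 4) (3,)
```

The split into features and target happens inside the loader, before scikit-learn sees any data.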

mehmet_baki_deniz · December 7, 2022, 4:19pm

thank you very much for the response

juansoliscas · December 8, 2022, 4:15pm

In the code segment you provided, load_house_data() is a function that loads the training data for the house price prediction problem. The function returns a tuple containing two arrays, X_train and y_train, where X_train is a 2D array of features and y_train is a 1D array of labels.

The X_features variable is defined as a list of strings, but it is not used in this code segment. It is a list of feature names that the lab uses later for labeling plots, so it is not needed for training or prediction. You can remove the X_features variable and this segment will still run and produce the same results.
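
As a rough sketch of what a list of feature names is good for, the snippet below pairs each name with its column of the data. The rows are made up for illustration, and printing is used here in place of the lab's plots.

```python
import numpy as np

X_features = ['size(sqft)', 'bedrooms', 'floors', 'age']

# Made-up rows for illustration; one column per feature name.
X_train = np.array([
    [952.0, 2.0, 1.0, 65.0],
    [1244.0, 3.0, 1.0, 64.0],
])

# The names are only labels: they describe columns, they do not affect training.
column_by_name = {name: X_train[:, i] for i, name in enumerate(X_features)}
for name, column in column_by_name.items():
    print(f"{name}: {column}")
```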

The StandardScaler class from scikit-learn is used to normalize the training data by subtracting the mean and dividing by the standard deviation of each feature. This does not change the shape of each feature's distribution; it puts all features on a comparable scale, which helps gradient descent converge faster and more reliably. The fit_transform() method fits the scaler to the training data and transforms it, so that each feature has a mean of 0 and a standard deviation of 1. The transformed data is stored in the X_norm variable.
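
A small check of this, assuming made-up data: the output of `fit_transform()` should match a manual z-score computed with NumPy. Note that `np.std` defaults to the population standard deviation, which is what StandardScaler uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature rows: size(sqft), bedrooms, floors, age.
X_train = np.array([
    [952.0, 2.0, 1.0, 65.0],
    [1244.0, 3.0, 1.0, 64.0],
    [1947.0, 3.0, 2.0, 17.0],
    [1725.0, 4.0, 2.0, 42.0],
])

scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)

# Manual z-score per column: subtract the mean, divide by the standard deviation.
X_manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)

print(np.allclose(X_norm, X_manual))          # True
print(np.allclose(X_norm.mean(axis=0), 0.0))  # True
print(np.allclose(X_norm.std(axis=0), 1.0))   # True
```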

The SGDRegressor class from scikit-learn is used to train a linear regression model with stochastic gradient descent (SGD) on the normalized training data. The max_iter parameter sets the maximum number of passes over the training data (epochs); training can stop earlier if the stopping criterion is met. The fit() method trains the model, and the intercept_ and coef_ attributes hold the model's intercept and coefficients, respectively. The predict() method is used to make predictions on the training data, and the predictions are stored in the y_pred_sgd variable.
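
The sketch below runs the same steps on synthetic data, since the lab's data file is not available here. The data, `true_w`, and the `random_state` values are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption): 200 examples, 4 features, linear target plus noise.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 100, size=(200, 4))
true_w = np.array([3.0, -2.0, 0.5, 1.0])
y_train = X_train @ true_w + 10.0 + rng.normal(0, 0.1, size=200)

X_norm = StandardScaler().fit_transform(X_train)

sgdr = SGDRegressor(max_iter=1000, random_state=0)
sgdr.fit(X_norm, y_train)

print(sgdr.n_iter_)           # epochs actually run; never more than max_iter
print(sgdr.coef_.shape)       # (4,): one weight per feature
print(sgdr.intercept_.shape)  # (1,): a single bias term
```

Because the model was trained on normalized features, `coef_` and `intercept_` are the parameters for `X_norm`, not for the raw `X_train`.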

Finally, the dot product of the normalized training data and the model coefficients is computed, and the intercept is added to the result to make a prediction using the model’s parameters. This prediction is stored in the y_pred variable.
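
A quick way to confirm that the manual computation and predict() agree, again on synthetic data chosen only for illustration:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption), standing in for the lab's house data.
rng = np.random.default_rng(1)
X_train = rng.uniform(0, 50, size=(100, 4))
y_train = X_train @ np.array([1.0, 2.0, -1.0, 0.5]) + 5.0

X_norm = StandardScaler().fit_transform(X_train)
sgdr = SGDRegressor(max_iter=1000, random_state=1).fit(X_norm, y_train)

b_norm = sgdr.intercept_
w_norm = sgdr.coef_

# Prediction through the scikit-learn API.
y_pred_sgd = sgdr.predict(X_norm)
# The same prediction written out as w . x + b for every row.
y_pred = np.dot(X_norm, w_norm) + b_norm

print(np.allclose(y_pred, y_pred_sgd))  # True
```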

mehmet_baki_deniz · December 15, 2022, 1:59pm

thank you very much for your detailed response
