About the training and testing datasets for Logistic Regression using Scikit-Learn

It seems the training and testing datasets are totally the same…
In practice, there are four steps: preparing the dataset, creating the logistic regression model, training (fitting) the model, and then testing (predicting) the result. The prediction accuracy is 100 percent, which seems quite good.
However, the model is trained and tested on exactly the same data, which should instead be split into separate training and testing parts.
Hope what I’ve said is clear, thanks for the feedback.
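
For readers following along, the four steps described above can be sketched roughly as below. This is a minimal sketch, not the lab's actual notebook: the toy values in `X` and `y` are hypothetical, and only the name `lr_model` comes from the code quoted later in this thread. Because the model is scored on the very data it was fitted to, the reported accuracy says little about how it would do on unseen samples.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Step 1: prepare a dataset (hypothetical toy values, not taken from the lab)
X = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5],
              [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# Step 2: create the logistic regression model
lr_model = LogisticRegression()

# Step 3: train (fit) the model
lr_model.fit(X, y)

# Step 4: predict, here on the SAME data the model was trained on
y_pred = lr_model.predict(X)
train_accuracy = lr_model.score(X, y)

print("Prediction on training set:", y_pred)
print("Accuracy on training set:", train_accuracy)
```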

Hi @jing225,

You said the two datasets are totally the same. Did you mean that they contain the same set of samples? If so, it is our job to make sure they don’t, by splitting the whole dataset into a training set and a testing set so that no sample appears in both.
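
As a rough illustration of "no sample appears in both", one option is scikit-learn's `train_test_split`. The sketch below splits a hypothetical list of ten sample IDs (the IDs, `test_size`, and `random_state` are all assumptions chosen for the example) and checks that the two parts do not overlap.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical sample IDs standing in for the rows of a dataset
indices = np.arange(10)

# Hold out 30% of the samples for testing; random_state makes the split repeatable
train_idx, test_idx = train_test_split(indices, test_size=0.3, random_state=0)

# A sample should never appear in both parts
overlap = set(train_idx.tolist()) & set(test_idx.tolist())

print("Training samples:", sorted(train_idx.tolist()))
print("Testing samples:", sorted(test_idx.tolist()))
print("Samples in both sets:", overlap)
```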

In case you are referring to something else, please let us know what you mean by “the training and testing datasets are totally the same”. For example, what kind of quality is the same between them?

Hi @rmwkwok,

Thanks for your feedback!

Yes, exactly. Normally it is necessary to split the data; otherwise, the prediction result seems meaningless.
You can refer to the code of “Optional lab: Logistic regression with scikit-learn”. I’ve just double-checked it, and the code is as follows:

y_pred = lr_model.predict(X)

print("Prediction on training set:", y_pred)

It says “Prediction on training set”. Since it’s sample code, maybe it would be better to split the data and add another cell that shows “prediction on testing dataset”.
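
A possible shape for such an extra cell is sketched below. It is only an illustration, not the lab's code: the data comes from `make_classification` with made-up settings, and every name other than `lr_model` is hypothetical. The model is fitted on the training part only and then evaluated on the held-out testing part.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data; the lab uses its own small dataset instead
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           random_state=0)

# Keep 25% of the samples aside for testing, preserving the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Fit on the training part only
lr_model = LogisticRegression()
lr_model.fit(X_train, y_train)

# Predict on samples the model has not seen during fitting
y_test_pred = lr_model.predict(X_test)
train_accuracy = lr_model.score(X_train, y_train)
test_accuracy = lr_model.score(X_test, y_test)

print("Prediction on testing set:", y_test_pred)
print("Accuracy on training set:", train_accuracy)
print("Accuracy on testing set:", test_accuracy)
```

Comparing the two accuracies gives a first impression of how well the model generalizes beyond the samples it was fitted on.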

Just my suggestion, thanks!

Thanks @jing225. I just read the notebook, and I guess it wasn’t meant to cover too many topics, only to demo the use of scikit-learn. Thank you for your suggestion, though.

This is covered later in the course. At this stage of an introductory course, the goal is just to learn how to make predictions.

OK, it’s clear, thanks! @TMosh

Thanks @rmwkwok, I will keep on learning and having fun.