CV error for a model with all the features vs a few features

hi all,

I have trained a logistic regression model after choosing the 5 most important features out of 12.

Then I tested the model with various polynomial degrees and recorded both CV and training accuracy.

I observe that the training-set accuracy with all 12 features is higher than for the 5-feature model at polynomial degree 2.

That is normal, right? After all, we select the most relevant features to avoid overfitting, but a more complex model with all the features will generally achieve higher training accuracy at the expense of increased overfitting.

In my case, I found that the 12-feature model doesn't have any overfitting issue either.
This makes me think it is always good practice to start with all the features at hand, and only consider feature selection if there is an overfitting issue even for the degree-1 model, because a more complex model will bring higher accuracy provided there is no overfitting.
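The comparison described above can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the actual dataset or code from the course; the sample sizes, hyperparameters, and the use of `make_classification` are all assumptions:

```python
# Hedged sketch: compare training vs. cross-validation accuracy for
# logistic regression over several polynomial degrees.
# Synthetic data stands in for the poster's real 12-feature dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# 12 features total, of which 5 are actually informative.
X, y = make_classification(n_samples=400, n_features=12,
                           n_informative=5, random_state=0)

results = {}
for degree in (1, 2):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        LogisticRegression(max_iter=5000),
    )
    model.fit(X, y)
    train_acc = model.score(X, y)               # accuracy on training data
    cv_acc = cross_val_score(model, X, y, cv=5).mean()  # 5-fold CV accuracy
    results[degree] = (train_acc, cv_acc)
    print(f"degree={degree}  train={train_acc:.3f}  cv={cv_acc:.3f}")
```

A large gap between the train and CV columns is the overfitting signal you describe: training accuracy alone can always be pushed up by a more complex model, so the decision should rest on the CV number.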

I wonder what your reflections are on my reasoning.

warmly, mehmet

Hi @mehmet_baki_deniz

Feature selection isn't really the way to go here, especially when you have only 12 features, since that is not a large number. Also, a feature can genuinely affect the output even when the relationship doesn't show up in the mathematical measures (feature-selection scores or correlations), so you might wrongly choose to drop it. (Personally, I once saw a dataset where the output was a disease and one feature was blood pressure: the correlation looked weak, but in fact blood pressure does affect that disease.) A couple of thoughts:

  • I think you can use the correlation between features and look at the most highly correlated pairs: if, for example, two columns have a correlation of 0.99, they are essentially duplicates, so drop one of them.
  • Personally, if I had to drop many columns, I would prefer PCA to reduce the number of features while still preserving information from all of them.
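Both ideas above can be sketched as follows; the toy columns, the 0.95 threshold, and the choice of 2 PCA components are illustrative assumptions, not recommendations from the course:

```python
# Hedged sketch: (1) drop one column of a near-duplicate pair found via
# correlation, (2) alternatively compress all features with PCA.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=200)})
df["b"] = df["a"] * 1.01 + rng.normal(scale=0.01, size=200)  # near-duplicate of "a"
df["c"] = rng.normal(size=200)                               # independent feature

# 1) For every pair with |correlation| above the threshold, drop one column.
corr = df.corr().abs()
# Keep only the upper triangle so each pair is considered once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
reduced = df.drop(columns=to_drop)
print("dropped:", to_drop)

# 2) Alternative: PCA keeps a compressed mix of ALL features instead
#    of discarding any single one outright.
components = PCA(n_components=2).fit_transform(df.values)
print(components.shape)  # (200, 2)
```

The design difference matters: correlation-based dropping removes a raw column entirely, while PCA rotates the feature space so every original feature still contributes something to the retained components.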

