C2W2_LAB3: Does univariate selection keep strongly correlated features?

The 20 features selected by univariate selection keep all of the highly correlated features:
‘radius_worst’, ‘perimeter_worst’, ‘area_worst’, and ‘perimeter_mean’. Is this an effective feature selection
algorithm?

It turns out I could further eliminate ‘radius_worst’, ‘perimeter_worst’, and ‘area_worst’, use only 17 features, and still get a high F1 score:
F-test 0.973684 0.974206 0.953488 0.976190 0.964706
Interestingly, I could not eliminate the fourth correlated feature without the F1 score decreasing to 0.95.

Selecting the best predictors with respect to the target variable is a good place to start. Univariate feature selection scores each feature in isolation, i.e. it does not account for any relationship between input features, so it can keep several features that are highly correlated with one another. As your experiment shows, it’s good to account for correlated independent variables (i.e. model inputs) prior to building a model.
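
As a rough illustration of combining the two ideas, here is a minimal sketch that runs an F-test selection and then applies a greedy correlation filter on top of it. It makes several assumptions that are not from the lab: it uses scikit-learn's bundled copy of the breast cancer data (whose column names look like 'worst radius' rather than ‘radius_worst’), the 0.95 cutoff is a hypothetical choice, and the greedy filter is just one simple way to handle correlated inputs, not necessarily what the notebook does.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

# Assumption: scikit-learn's bundled dataset stands in for the lab's CSV file.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# Step 1: univariate selection. Each feature is scored against the target
# on its own, so correlated features can all end up in the top 20.
selector = SelectKBest(score_func=f_classif, k=20).fit(X, y)
selected = X.columns[selector.get_support()]
scores = pd.Series(selector.scores_, index=X.columns)[selected]

# Step 2: greedy correlation filter (hypothetical cutoff of 0.95).
# Walk the features from highest to lowest F-score and keep one only if it
# is not too strongly correlated with any feature that was already kept.
threshold = 0.95
corr = X[selected].corr().abs()
kept = []
for name in scores.sort_values(ascending=False).index:
    if all(corr.loc[name, other] < threshold for other in kept):
        kept.append(name)

print(f"{len(selected)} features after the F-test, {len(kept)} after the correlation filter")
print(kept)
```

Whether the smaller set is actually good enough still has to be checked by retraining and comparing metrics, as you did with the F1 score.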

Do see the next section in the notebook, which shows other feature selection methods.