Query on Feature Selection

Just curious: why is the feature selection part missing from the programming assignment? I only saw it cover the feature engineering/transformation part.

I'm also a little confused about the filter, wrapper, and embedded methods mentioned in the course. It seems some of the algorithms mentioned under one method can also be applied in another, and I can't concretely separate the wrapper and embedded methods introduced in the course and the ungraded lab. What's the difference between RFE and SelectFromModel when applied to a random forest model? The only difference I can see is that RFE keeps the top 20 features while SelectFromModel uses feature importance as a threshold, but I can't pin down the essential difference.

Please be specific about the notebook you’re referring to.

Hey @balaji.ambresh, there is only one graded lab in week 2:

And here is the feature selection lab I mentioned:

The graded lab focuses on feature engineering using TFX. Given that you know about feature selection via the ungraded lab, you can apply the same principles of feature selection along with feature engineering for your tasks.

As you’ve mentioned, SelectFromModel uses a threshold for selecting important features. This approach is applicable only when the underlying estimator exposes attributes like coef_ or feature_importances_. The estimator is fit on the data if required, and then the filtering is done. Use this method when you want control over the strength of the features that get pruned but have no restrictions on the number of selected features.
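The behavior described above can be sketched like this. The dataset is synthetic and the `threshold="median"` choice is purely illustrative; the key point is that the number of surviving features is not fixed in advance:

```python
# Sketch: SelectFromModel with a random forest on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# Keeps every feature whose importance passes the threshold; how many
# survive depends on the importances, not on a preset count.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",  # illustrative: keep features at/above the median importance
)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape[1])  # number of kept features
```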

In the case of RFE, the user knows the number of features to keep, i.e. n_features_to_select. The model is cloned (internally) and fit on subsets of features to recursively eliminate unwanted features and return the relevant ones. This method is a good starting point if you are willing to spend considerable time on feature selection (it is a greedy approach) and when there are few features.
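Here is the same idea with RFE, again on a synthetic dataset. Unlike the threshold approach, the number of features to keep is fixed up front:

```python
# Sketch: RFE with a random forest on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# RFE refits the estimator repeatedly, dropping the least important
# feature(s) each round until exactly n_features_to_select remain.
selector = RFE(
    RandomForestClassifier(n_estimators=100, random_state=0),
    n_features_to_select=5,
    step=1,  # remove one feature per iteration
)
selector.fit(X, y)
print(selector.n_features_)  # -> 5
```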

Thanks for the reply. Can I understand it like this: the embedded method only takes advantage of the model itself, reading a model property like coef_ or feature_importances_, and per the user's request draws a hard line with a threshold, so a feature is kept or removed depending on whether its property passes the threshold?

And the wrapper method is more complicated. Rather than reading the property directly, it uses it only as a reference. Since it's a recursive elimination process, can I say it's actually a search problem, like DFS or BFS, where at each step it tries to eliminate the feature with the lowest property first, but the final decision is always made by a metric (e.g. accuracy, recall, or precision)? So it's possible that a feature with a lower property (e.g. coef_) in the original model is kept while one with a higher value is removed, because this results in better evaluation performance?

Your understanding of SelectFromModel is correct.

As far as the wrapper method is concerned, the approach is greedy. Here's the explanation of RFE:

@balaji.ambresh That's the document I read that raised this confusion. Perhaps I misunderstood this:

“Then, the least important features are pruned from current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.”

If the pruning procedure is to eliminate the least important features, won't it always generate the same set of features once the requested number of features has been decided? And what's the point of repeating the procedure recursively on the pruned set until the desired number of features is reached? Does that mean each recursive call removes the least important feature from the current feature set (the set left after the previous pruning) and trains a model? But if the ultimate goal hasn't been reached yet, what is this intermediate model used for?

Depending on the subset of input features, their predictive power for the target variable can vary. Let's consider coefficients to represent feature importances.

Here's an example where 2 models are used to predict the price of a house:

  1. Model1(inputs=(square_feet, num_bedrooms), output=price)
  2. Model2(inputs=(square_feet), output=price)

As you'd expect, unless num_bedrooms has very low predictive power, square_feet is likely to have different coefficient values across the two models.
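A minimal sketch of that point, using synthetic, purely illustrative data: when the correlated feature num_bedrooms is dropped, the coefficient of square_feet shifts because the remaining feature must absorb the signal the dropped one carried.

```python
# Sketch: the same feature's coefficient changes across feature subsets.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
square_feet = rng.uniform(500, 3000, size=200)
num_bedrooms = square_feet / 700 + rng.normal(0, 0.5, size=200)  # correlated
price = 150 * square_feet + 20_000 * num_bedrooms + rng.normal(0, 10_000, size=200)

# Model1: both features; Model2: square_feet only.
model1 = LinearRegression().fit(np.column_stack([square_feet, num_bedrooms]), price)
model2 = LinearRegression().fit(square_feet.reshape(-1, 1), price)

# The square_feet coefficient differs between the two fits.
print(model1.coef_[0], model2.coef_[0])
```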

This elimination process starts with all features and ends when n_features_to_select is reached. We remove 1 feature at a time after fitting the model with the remaining subset of features.

Oh I see: after the least important feature is removed and the model retrained, feature_importances_ is re-evaluated, so the elimination order is affected. Great explanation! Thanks.