Feature Engineering - Relationship between Parameter Size and Importance

Alexander_Leon · July 2, 2022, 7:40am

For the feature engineering and polynomial regression lab, why do larger parameters signify higher importance? Maybe this only applies once we’ve rescaled all the features?

rmwkwok · July 2, 2022, 8:37am

Hello @Alexander_Leon, I think you are talking about the following:

I don't think that description is meant to discuss feature importance.

In the example, the label is x squared (because of the line y = x**2). Among the provided features (X = np.c_[x, x**2, x**3]), only the second one, which is also x squared, is needed; the other two could have been left out.

Even though all three were provided, the training algorithm was still able to suppress the x and x**3 terms, relative to x**2, by making their weights close to zero. However, that is not good enough, because we already know both weights should be exactly 0. By scaling the features, the algorithm can push them even closer to zero, and that is the improvement.

I think the point here is not feature importance but how scaling improves the gradient descent result.
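To make the comparison above concrete, here is a minimal sketch of the same setup, run once on the raw features and once on z-score normalized features. The `gradient_descent` helper, the learning rates, and the iteration counts are assumptions chosen for illustration, not necessarily what the lab uses.

```python
import numpy as np

def gradient_descent(X, y, alpha, iters):
    """Plain batch gradient descent for linear regression (illustrative helper)."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(iters):
        err = X @ w + b - y               # prediction error per example
        w -= alpha * (X.T @ err) / m      # gradient of the squared-error cost w.r.t. w
        b -= alpha * err.mean()           # gradient w.r.t. b
    return w, b

x = np.arange(0, 20, 1)
y = (x**2).astype(float)                  # the label is exactly x squared
X = np.c_[x, x**2, x**3].astype(float)    # features: x, x**2, x**3

# Raw features: a tiny learning rate is needed because x**3 is so large.
w_raw, b_raw = gradient_descent(X, y, alpha=1e-7, iters=10_000)

# Z-score normalized features: a much larger learning rate is usable.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
w_norm, b_norm = gradient_descent(X_norm, y, alpha=1e-1, iters=100_000)

# Size of the x weight relative to the x**2 weight in each run.
raw_ratio = abs(w_raw[0]) / abs(w_raw[1])
norm_ratio = abs(w_norm[0]) / abs(w_norm[1])

print("raw    w:", w_raw, "b:", b_raw)
print("scaled w:", w_norm, "b:", b_norm)
print("x weight / x**2 weight, raw vs scaled:", raw_ratio, norm_ratio)
```

With these settings, the weights on x and x**3 in the scaled run should come out far smaller relative to the x**2 weight than in the raw run, which is the improvement described above. Note that the two sets of weights are in different units, so only the within-run ratios are meaningful to compare.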

rmwkwok · July 2, 2022, 9:05am

However, I do have some personal opinions about measuring feature importance by the size of linear regression parameters/weights.

Firstly, yes, I agree that scaling the features is the least we should do to make the parameters comparable.

However, the ideal scenario is when the features are uncorrelated with each other.

For example, if you have three extremely highly correlated features that are all very good predictors of the label, they compete with each other for weight. Compared to using only one of the three in the feature set, using all three can leave each of them with about 1/3 of the weight, possibly even less than the weight of another feature that is actually less relevant, which would be a misleading measurement.

As a result, parameter size alone does not reflect only feature importance; it is also coupled with the degree of correlation with the other features.
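The weight-splitting effect can be sketched with an extreme case: three exact copies of one strong feature next to a weaker, independent feature. The data, the coefficients 3 and 2, and the use of `np.linalg.lstsq` (which returns the minimum-norm solution when columns are duplicated) are all assumptions made for illustration. Real correlated features are not exact copies, and a different solver may split the weight differently.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200

def standardize(v):
    """Z-score a 1-D array so weights are on a comparable scale."""
    return (v - v.mean()) / v.std()

s = standardize(rng.normal(size=m))   # strong predictor (hypothetical)
t = standardize(rng.normal(size=m))   # weaker predictor (hypothetical)
y = 3.0 * s + 2.0 * t                 # noise-free label; s matters more than t

# Case 1: one copy of the strong feature plus the weaker feature.
X_single = np.c_[s, t]
w_single, *_ = np.linalg.lstsq(X_single, y, rcond=None)

# Case 2: three identical copies of the strong feature plus the weaker one.
X_triple = np.c_[s, s, s, t]
w_triple, *_ = np.linalg.lstsq(X_triple, y, rcond=None)

print("one copy    :", w_single)      # weight on s exceeds weight on t
print("three copies:", w_triple)      # each copy of s now weighs less than t
```

In the second fit, each copy of the strong feature ends up with a smaller weight than the weaker feature, even though the combined effect of the three copies is unchanged. Ranking features by weight size alone would rank the weaker feature first.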

TMosh · July 2, 2022, 7:29pm

The simplest concept is that the magnitude of the weight values tells you how much impact that feature has on the hypothesis.

The larger the magnitude of the weight value (either positive or negative), the more it influences the hypothesis.

A weight value of zero means that feature has no useful impact on the hypothesis, because the weight is multiplied by the feature value.
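As a small illustration of "the weight is multiplied by the feature value", the sketch below breaks one prediction into per-feature contributions. All numbers are made up. It also shows why the earlier point about scaling matters: changing a feature's units changes its weight without changing its contribution.

```python
import numpy as np

# Hypothetical trained parameters and one hypothetical example.
w = np.array([2.0, 0.0, -0.5])
b = 1.0
x = np.array([3.0, 100.0, 4.0])

contributions = w * x                    # per-feature terms w_j * x_j
prediction = contributions.sum() + b

# A zero weight: changing that feature's value does not move the prediction.
x_changed = x.copy()
x_changed[1] = -999.0
prediction_changed = w @ x_changed + b

# Re-expressing feature 0 in units 1000 times smaller (e.g. m to mm):
# the feature value grows by 1000, the matching weight shrinks by 1000,
# and the prediction stays the same.
scale = 1000.0
x_mm = x.copy()
x_mm[0] *= scale
w_mm = w.copy()
w_mm[0] /= scale
prediction_mm = w_mm @ x_mm + b

print(contributions, prediction, prediction_changed, prediction_mm)
```

After the unit change, feature 0 has the smallest non-zero weight even though its contribution to the prediction is still the largest, so weight magnitudes are only comparable when the features are on a common scale.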
