I thought the purpose of scaling/normalizing was to speed up and improve convergence of the optimization (e.g. gradient descent) during a learning algorithm’s training…
but the following paragraph in the lab (cell #25) makes me wonder if I’ve missed something else more profound about its purpose/effect… but I’m really not sure:
Since you are dealing exclusively with geospatial data you will create some transformations that are aware of this geospatial nature. This helps the model make a better representation of the problem at hand.
For instance the model cannot magically understand what a coordinate is supposed to represent and since the data is taken from New York only, the latitude and longitude revolve around (37, 45) and (-70, -78) respectively, which is arbitrary for the model. A good first step is to scale these values.
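(For context, the “scaling” the lab refers to is essentially something like the following; this is a rough sketch with made-up values, not the lab’s actual code:)

```python
import numpy as np

# Made-up pickup coordinates, roughly in the ranges the lab mentions.
latitude = np.array([40.71, 40.75, 40.69, 40.80])
longitude = np.array([-74.00, -73.98, -74.02, -73.95])

def standardize(x):
    """z-score: zero mean, unit variance."""
    return (x - x.mean()) / x.std()

print(standardize(latitude))   # centered around 0 instead of ~40
print(standardize(longitude))  # centered around 0 instead of ~-74
```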
So, is there some type of feature representation learning or some type of model interpretability or something else that feature scaling/normalization is performing… ?
Unconfuse me please.
Hi @shahin
really interesting question.
To “unconfuse” I would say that:
- In general, scaling the features helps the model converge. It also avoids having features on very different scales: in some models (think linear ones), if one feature has a much bigger scale than the others, it is difficult to see the effect of variation in the other features (see the sketch at the end of this reply).
- Having said that, we could enter the field of “feature engineering”: if we can transform a feature, or some features, in such a way that their information content is exposed more directly, the model will probably make better use of it.
But, to be honest, in general, scaling is about the first point.
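To make the first point concrete, here is a tiny sketch on toy data (scikit-learn assumed): with the raw features the fitted coefficients live on completely different scales, so their magnitudes say nothing about relative importance; after standardizing, they become directly comparable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two toy features on very different scales.
x_small = rng.uniform(0, 1, size=500)        # e.g. a ratio
x_big = rng.uniform(0, 10_000, size=500)     # e.g. a distance in metres
y = 3.0 * x_small + 0.001 * x_big + rng.normal(0, 0.1, size=500)

X = np.column_stack([x_small, x_big])

# Raw features: coefficients reflect the units, not the importance.
print(LinearRegression().fit(X, y).coef_)      # roughly [3.0, 0.001]

# Standardized features: coefficients are now directly comparable.
X_std = StandardScaler().fit_transform(X)
print(LinearRegression().fit(X_std, y).coef_)  # both of the same order of magnitude
```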
Thanks @luigisaetta ,
I would venture then to say that extracting/deriving more impactful numbers from the raw numbers, numbers that are more pertinent to the label, aka “feature engineering” or more specifically “feature extraction” (i.e. feeding distances into the learning algorithm rather than individual pairs of coordinates, as in the sketch below), just gives the learning algorithm a helping hand. If I’m not mistaken, machine learning is limited to learning a function using only multiplication and summation operations; no subtraction (let alone Euclidean distance calculations) is included.
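Something like this is what I have in mind by “feeding distances rather than pairs of coordinates” (a rough sketch with hypothetical column names, not the lab’s code; for real geographic distance one would use something like the haversine formula):

```python
import numpy as np

# Hypothetical pickup/dropoff coordinates.
pickup_lat  = np.array([40.71, 40.75])
pickup_lon  = np.array([-74.00, -73.98])
dropoff_lat = np.array([40.80, 40.69])
dropoff_lon = np.array([-73.95, -74.02])

# Euclidean distance in coordinate space: a single derived feature that
# makes explicit the "trip length" information the raw pairs only hold implicitly.
trip_distance = np.sqrt(
    (dropoff_lat - pickup_lat) ** 2 + (dropoff_lon - pickup_lon) ** 2
)
print(trip_distance)
```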
Scaling, as you confirmed, is purely an aid to the optimization algorithm. It was discussed in Course 2, but it seems to be conflated, both there and here, with “feature engineering”. It’s getting close to a semantic/philosophical point, but I think it is potentially confusing. If I may suggest, it might be better to end that paragraph at “…arbitrary for the model”, and then, in a clearly separated section, replace “A good first step is to scale these values” with something like “A powerful and simple preprocessing step that benefits the optimization algorithms used by certain machine learning algorithms is to scale the numerical features, as described in week 2 of Course 2.”