About feature scaling

A question about feature scaling has just come to my mind. Is it necessary to do feature scaling in binary (or multi-class) classification problems in order to make gradient descent run faster? And should regularization also be applied to reduce overfitting in those kinds of problems?

What we are trying to do is predict a target y based on the given features X. If the values of different features are in very different ranges, say feature x1 is in multiples of 1000 and feature x2 is in multiples of 0.001, then running gradient descent to reach the optimal w and b values that fit both x1 and x2 will take a long time. Scaling them to a similar range of values, using either mean normalization or z-score normalization, reduces the compute time for prediction, cost, and gradient descent.
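For illustration, here is a minimal sketch of z-score normalization in Python with NumPy; the feature values (x1 in the thousands, x2 in the thousandths) are made up just to show the effect.

```python
import numpy as np

# Made-up data: x1 is on the order of 1000s, x2 on the order of 0.001s
X = np.array([[2000.0, 0.003],
              [1500.0, 0.001],
              [3000.0, 0.004]])

# Z-score normalization: for each feature (column), subtract its mean and
# divide by its standard deviation, so all features end up on a similar scale.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_norm = (X - mu) / sigma

print(X_norm)  # every column now has mean 0 and standard deviation 1
```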
Regularization adds a penalty to the cost so that w does not fit the current dataset too closely over the given iterations, and the model generalizes better to future inputs. Different regularizations apply different penalties, such as L1 or L2, which you choose based on your priorities for the features.
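As a rough sketch of what "adding a penalty to the cost" means, here is an L2-regularized mean squared error cost written in Python; the regularization strength lambda_ is a hypothetical value you would tune yourself.

```python
import numpy as np

def regularized_cost(X, y, w, b, lambda_=1.0):
    """Mean squared error cost plus an L2 penalty on the weights w.

    The extra term (lambda_ / (2 * m)) * sum(w**2) discourages large weights,
    so w cannot hug the training data too closely and the model generalizes
    better. The bias b is typically left out of the penalty.
    """
    m = X.shape[0]
    err = X @ w + b - y                                # prediction errors
    mse = (err @ err) / (2 * m)                        # unregularized cost
    l2_penalty = (lambda_ / (2 * m)) * np.sum(w ** 2)  # L2 penalty term
    return mse + l2_penalty
```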
The call on when to apply scaling and regularization is driven purely by the input data and the features selected. With my knowledge I can't say how far these steps are automated; maybe more experienced people could comment on that.
All the best, and happy learning.

Thanks @Raja_Sekhar! Apart from @Raja_Sekhar’s answer, I personally think it is necessary to do feature scaling whether it is a binary or multi-class classification problem, or a regression problem. Sometimes the features are provided in a form where they are already quite well scaled, but there is no harm in doing our own feature scaling anyway.

Feature scaling and regularization are two independent topics. As you said, feature scaling helps gradient descent converge better, whereas regularization helps address overfitting. We always need feature scaling, and we may need regularization when the model is overfitting.

Cheers,
Raymond


Thanks! And when you are building a neural network using TensorFlow, do you have to indicate anywhere in the code that you want to apply regularization, or is it implemented automatically?

Yes, @Marcos_Quintas_Perez. By default there is no regularization enabled; we have to add it ourselves. In TensorFlow, we need to add it for each layer.

tf.keras.layers.Dense is the name of the layer that we have been learning about in the MLS. We specify that we want to add an L2 regularization to a Dense layer by replacing kernel_regularizer=None with kernel_regularizer='L2' when we create it. For example, the line below will create a Dense layer with 10 nodes (also called units) and L2 regularization enabled.

layer = tf.keras.layers.Dense(10, kernel_regularizer='L2')

If our neural network is 3 layers deep, then we need to add 'L2' to each of the three layers to fully enable L2 regularization in the whole network, as in the sketch below.
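Here is a minimal sketch of such a 3-layer network; the unit counts and activations are just illustrative choices, not something fixed by the course.

```python
import tensorflow as tf

# Sketch of a 3-layer network with L2 regularization on every Dense layer.
# The unit counts (25, 15, 1) and activations are example choices only.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu', kernel_regularizer='L2'),
    tf.keras.layers.Dense(15, activation='relu', kernel_regularizer='L2'),
    tf.keras.layers.Dense(1, activation='sigmoid', kernel_regularizer='L2'),
])
```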

Cheers,
Raymond