A doubt about feature scaling has just come to my mind. Is it necessary to perform feature scaling in binary (or multi-class) classification problems in order to make gradient descent run faster? And should regularization also be applied in order to reduce overfitting in those kinds of problems?
What we are trying to do is to predict y based on the given features X. If the values of different features are in very different ranges, for example feature x1 is in multiples of 1000's while feature x2 is in multiples of 0.001's, then gradient descent will take longer to reach the optimal w and b values that fit both x1 and x2. Scaling them to a similar range of values, using either mean normalization or z-score normalization, reduces the compute time for prediction, cost, and gradient descent.
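For instance, here is a minimal sketch of z-score normalization with NumPy (the toy numbers are made up, just for illustration):

```python
import numpy as np

# Toy data with two features on very different scales:
# x1 in multiples of 1000's, x2 in multiples of 0.001's
X = np.array([[2000.0, 0.004],
              [3000.0, 0.002],
              [1000.0, 0.003]])

# z-score normalization: for each feature (column), subtract its mean
# and divide by its standard deviation, so every feature ends up
# with mean 0 and standard deviation 1
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_norm = (X - mu) / sigma

print(X_norm)  # both columns now fall in a similar, small range
```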
Regularization adds an extra penalty term to the cost so that w does not fit the current dataset too closely over the given iterations, and the model generalizes better to future inputs. Different regularizations apply different penalties, such as L1 and L2, which you choose based on your priorities for the features.
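For reference, the L2-regularized cost looks like this (written with the λ/(2m) convention used in the course, where λ is the regularization strength you choose):

$$J(\mathbf{w}, b) = \frac{1}{m} \sum_{i=1}^{m} L\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}),\, y^{(i)}\right) + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2$$

The larger λ is, the more large weights are penalized and the more the model is pushed toward a simpler fit.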
The call on when to do scaling and regularization is driven purely by the input data and the features selected. With my knowledge I can't say how far these steps are automated; maybe more experienced people could comment on it.
All the best, and happy learning.
Thanks @Raja_Sekhar! Apart from @Raja_Sekhar's answer, I personally think it is necessary to do feature scaling no matter whether it is a binary or multi-class classification problem, or a regression problem. Sometimes the features are provided in a form where they are already reasonably scaled, but there is no harm in doing our own feature scaling anyway.
Feature scaling and regularization are two independent topics. As you said, feature scaling helps gradient descent converge better, whereas regularization helps address overfitting. We always need feature scaling, and we may need regularization when the model is overfitting.
Cheers,
Raymond
Thanks! And when you are building a neural network using TensorFlow, do you have to indicate anywhere in the code that you want to apply regularization, or is it implemented automatically?
Yes, @Marcos_Quintas_Perez. By default there is no regularization enabled if we do not add it ourselves. In TensorFlow, we need to add it for each layer.
tf.keras.layers.Dense is the name of the layer that we have been learning about in the MLS. We specify that we want to add an L2 regularization to a Dense layer by replacing the default kernel_regularizer=None with kernel_regularizer='L2' when we create it. For example, the line below will create a Dense layer with 10 nodes (also called units) and L2 regularization enabled.
```python
layer = tf.keras.layers.Dense(10, kernel_regularizer='L2')
```
If our neural network is 3 layers deep, then we need to add kernel_regularizer='L2' to each of the three layers to fully enable L2 regularization in the whole network.
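Putting it together, here is a minimal sketch of a 3-layer network with L2 regularization enabled on every layer (the layer sizes and activations here are just placeholders, not from any particular assignment):

```python
import tensorflow as tf

# Each Dense layer gets its own kernel_regularizer, so the L2 penalty
# is applied to the weights of all three layers
model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu', kernel_regularizer='L2'),
    tf.keras.layers.Dense(15, activation='relu', kernel_regularizer='L2'),
    tf.keras.layers.Dense(1, activation='sigmoid', kernel_regularizer='L2'),
])
```

Passing the string 'L2' uses TensorFlow's default regularization strength; if you want to pick λ yourself, you can pass tf.keras.regularizers.L2(l2=0.01) (with your own value) instead of the string.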
Cheers,
Raymond