In a NN used to solve regression problems with multiple outputs, is it common practice to normalise the outputs (y) as we do with the inputs (x)?
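If the outputs' dynamic range is very high, the NN will tend to focus on the biggest output (or outputs), because they dominate the loss, and “ignore” the small values.
The idea would be to standardise each output before training: subtract its mean mu and divide by its standard deviation sigma. Once we have the predictions, we multiply by sigma and add mu back to get the true estimate.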
Does it make sense?
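To make the idea concrete, here is a minimal NumPy sketch of standardising multi-output targets and mapping predictions back to the original units. The data, the variable names, and the noisy stand-in for network predictions are all made up for illustration; only the subtract-mu, divide-by-sigma transform and its inverse come from the question itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical targets: two outputs with very different scales.
y = np.column_stack([
    rng.normal(loc=500_000.0, scale=200_000.0, size=200),  # large-scale output
    rng.normal(loc=0.05, scale=0.01, size=200),            # small-scale output
])

# Per-output statistics (in practice, computed on the training targets only).
mu = y.mean(axis=0)
sigma = y.std(axis=0)

# Squared error of a "predict the mean" baseline, per output, in raw units.
# The large-scale output contributes almost all of a summed MSE loss.
raw_mse = ((y - mu) ** 2).mean(axis=0)

# Standardise: each output column now has mean 0 and standard deviation 1.
y_scaled = (y - mu) / sigma

# Stand-in for network predictions made in the scaled space (assumption).
pred_scaled = y_scaled + rng.normal(scale=0.1, size=y_scaled.shape)

# Invert the transform: multiply by sigma first, then add mu back.
pred = pred_scaled * sigma + mu

print("raw per-output MSE:", raw_mse)
print("scaled per-output std:", y_scaled.std(axis=0))
```

Note the order of the inverse: multiply by sigma, then add mu. Doing it the other way round would not undo the standardisation.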
The purpose of normalization is to bring data to a small range, say, [0, 1]. Normalizing both inputs and outputs is good practice.
You can apply separate normalizing schemes to different features. This way, the NN doesn’t have to worry about the actual difference in scales of the features.
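As a rough sketch of what "separate schemes per feature" could look like, the snippet below applies min-max scaling to one column and z-score scaling to another. The feature matrix and the helper names `min_max` and `z_score` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical feature matrix: column 0 is an area, column 1 is a room count.
X = np.array([
    [850.0, 2.0],
    [1200.0, 3.0],
    [2300.0, 4.0],
    [3100.0, 5.0],
])

def min_max(col):
    """Rescale a column linearly so its values span [0, 1]."""
    return (col - col.min()) / (col.max() - col.min())

def z_score(col):
    """Rescale a column to mean 0 and standard deviation 1."""
    return (col - col.mean()) / col.std()

# Each column gets its own scheme; the scaled columns are then recombined.
X_scaled = np.column_stack([min_max(X[:, 0]), z_score(X[:, 1])])

print(X_scaled)
```

Whichever scheme is used for a column, its statistics (min, max, mean, standard deviation) would need to be kept so the same transform can be applied to new data.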
To add another angle to the clear answer provided by @balaji.ambresh: for some regression models, such as predicting house prices, I would probably not normalize the output. In that example the target values all fall within a similar scale, say from $100K to $1MM, rather than 0.01 in some cases and 1,000,000 in others.
Thanks for sharing your perspective, Juan. Scaling will still speed convergence of a NN (with the default learning rate), be it a classification or a regression problem with the $100K-$1MM output range. Please see this topic.
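One way to see the convergence point is a toy experiment, sketched below under several assumptions: a single linear unit instead of a real network, a hand-written Adam update, an illustrative learning rate of 0.01, 1000 steps, and synthetic prices in roughly the $100K-$1MM range. With an Adam-style optimizer each step moves a parameter by roughly the learning rate at most, so weights starting at zero cannot reach values in the hundreds of thousands within the step budget, whereas the standardised target only needs weights of order 1.

```python
import numpy as np

def fit_adam(x, y, lr=0.01, steps=1000, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit y ~ w * x + b with full-batch Adam on mean squared error."""
    params = np.zeros(2)  # [w, b], both initialised at zero
    m = np.zeros(2)       # running mean of gradients
    v = np.zeros(2)       # running mean of squared gradients
    for t in range(1, steps + 1):
        err = params[0] * x + params[1] - y
        grad = np.array([2.0 * np.mean(err * x), 2.0 * np.mean(err)])
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)  # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        params -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params

rng = np.random.default_rng(0)

# Synthetic data (assumption): one standardised input, prices near $100K-$1MM.
u = rng.uniform(0.0, 1.0, size=500)
x = (u - u.mean()) / u.std()
price = 100_000.0 + 900_000.0 * u + rng.normal(scale=5_000.0, size=500)

mu, sigma = price.mean(), price.std()

# Same optimizer settings, raw target versus standardised target.
w_raw, b_raw = fit_adam(x, price)
w_s, b_s = fit_adam(x, (price - mu) / sigma)

# Compare both fits in the same units: RMSE in dollars divided by sigma.
rmse_raw = np.sqrt(np.mean((w_raw * x + b_raw - price) ** 2)) / sigma
rmse_scaled = np.sqrt(np.mean(((w_s * x + b_s) * sigma + mu - price) ** 2)) / sigma

print("relative RMSE, raw target:   ", rmse_raw)
print("relative RMSE, scaled target:", rmse_scaled)
```

This is only a sketch of the mechanism, not a benchmark: a larger learning rate or many more steps would eventually fit the raw target too, which is exactly the extra tuning that scaling avoids.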
I absolutely agree with you @balaji.ambresh.
Thanks for the confirmation @balaji.ambresh and @Juan_Olano!