Hi there. I am learning Multiple Linear Regression and feature scaling.
I understand that it is necessary to normalize the input features when they are on different scales, because otherwise their weights are updated unevenly during gradient descent.
I am wondering whether we should also normalize the target value during training.
From my perspective, I think the target value also needs to be normalized. The linear model is f(x) = wx + b and the cost is J = (f(x) - y)^2. After x is rescaled to the range -1 to 1 while y stays very large, f(x) might be extremely small at the start, which makes the cost very large. In short, if we don't scale the target, the features and the target are not on the same scale, which looks quite strange to me.
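To make my concern concrete, here is a tiny numpy sketch (the dataset and the numbers 500 and 3000 are made up, just for illustration):

```python
import numpy as np

# Toy data: x already scaled to [-1, 1], y left on its original, much larger scale.
x = np.linspace(-1, 1, 100)
y = 500 * x + 3000            # made-up targets in the thousands

# Typical small initialization
w, b = 0.0, 0.0
f_x = w * x + b               # predictions start near zero, far below y

cost = np.mean((f_x - y) ** 2)
print(cost)                   # huge: roughly 9e6 for this data
```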
But in the coding example, I found that only the features are normalized. This makes me quite confused.
The bias term b lifts f(x) up to match the mean level of y. On the other hand, even though x only ranges between -1 and 1, w can stretch x to a much larger range to match the variance of y.
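For example, here is a minimal sketch on some made-up data (the exact relationship y = 500x + 3000 and the closed-form fit via np.polyfit are just for illustration; gradient descent would find the same solution):

```python
import numpy as np

x = np.linspace(-1, 1, 100)
y = 500 * x + 3000            # made-up data: y is on a much larger scale than x

# Closed-form least-squares fit: the model stretches and shifts by itself.
w, b = np.polyfit(x, y, deg=1)
print(w, b)                   # ~500.0 and ~3000.0: w stretches x, b lifts f(x) to the mean of y
```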
Thanks, @rmwkwok, now I understand that the model can work without normalizing the targets. However, do you think normalizing the target can somewhat accelerate training?
Usually we initialize w to be roughly between -2 and 2, and b to be 0. If the optimal w and b are in these ranges, of course you need fewer steps to converge compared to the case where the optimal parameters are far away, because then they need to stretch a lot and shift a lot. So, if your y is normalized, the optimal parameters will be closer to those ranges, and therefore fewer steps are needed!
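A quick sketch of this point, reusing the made-up data from above (the z-score normalization of y is just one common choice):

```python
import numpy as np

x = np.linspace(-1, 1, 100)
y = 500 * x + 3000

# Optimal parameters for the raw target: far from a [-2, 2] / b = 0 initialization.
print(np.polyfit(x, y, 1))            # ~[500., 3000.]

# Optimal parameters after z-score normalizing y: right inside the initialization range.
y_norm = (y - y.mean()) / y.std()
print(np.polyfit(x, y_norm, 1))       # ~[1.71, 0.]
```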
That said, we usually only talk about normalizing features, because unnormalized features can really give you a hard time choosing a learning rate that converges at all. Once the features are normalized, the learning rate is pretty easy to choose, and even if you don't normalize y, gradient descent still converges; it might just take an extra few, or a few tens of, iterations compared to a normalized y. That kind of iteration count is nothing to a computer.
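To illustrate, here is a rough sketch that counts gradient descent steps on the same made-up data (the learning rate, tolerance, and stopping rule are arbitrary choices, so the exact step counts will vary):

```python
import numpy as np

def gd_steps(x, y, lr=0.1, tol=1e-3, max_iters=10000):
    """Run gradient descent on J = mean((w*x + b - y)^2); return the number of
    steps taken until the gradient falls below tol."""
    w, b = 0.0, 0.0
    for step in range(1, max_iters + 1):
        err = w * x + b - y
        dw, db = 2 * np.mean(err * x), 2 * np.mean(err)
        w, b = w - lr * dw, b - lr * db
        if max(abs(dw), abs(db)) < tol:
            return step
    return max_iters

x = np.linspace(-1, 1, 100)
y = 500 * x + 3000
y_norm = (y - y.mean()) / y.std()

print(gd_steps(x, y))        # raw target: still converges, roughly 180 steps here
print(gd_steps(x, y_norm))   # normalized target: roughly 100 steps
```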