This raised a question: Are there situations when target values (y) should be scaled too? Naturally, these would be scaled with different parameters than the ones used for the features of the training / cv / test sets. Are there cases where this would be useful, and what would those cases be?
@uhef I unfortunately don’t have access to the course/video you are speaking of.
But it got me thinking… ‘What is this guy speaking about?’
I can only think you are referring to ‘normalization’, which is exactly what it sounds like-- You are rescaling the values to a standard scale (for example, zero mean and unit variance).
And you’d have to transform y too (at least in my mind), to make sure everything is on the same scale.
The important ‘exclusion of y’ caveat-- You can transform it. You just can’t ‘look at it’ (i.e. use the cv / test targets to make modeling decisions). Fit the scaling parameters on the training targets only, then apply them unchanged to the cv / test targets.
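Something like this (a minimal sketch assuming scikit-learn and a made-up regression dataset; the variable names are just illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the target scaler on the training targets ONLY -- never on test y,
# so no information from the test set leaks into the transformation.
y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()
y_test_scaled = y_scaler.transform(y_test.reshape(-1, 1)).ravel()  # same params

# Predictions made in the scaled space are mapped back via inverse_transform.
```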
This isn’t generally needed, but it doesn’t cause any difficulty if you want to do it. The primary effect is just reducing the magnitude of the cost; the weights that minimize the cost simply rescale along with y, so you recover an equivalent model once you invert the transformation.
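Here’s a quick way to convince yourself (a sketch assuming scikit-learn; the coefficients and the factor of 10 are arbitrary):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

model_raw = LinearRegression().fit(X, y)
model_scaled = LinearRegression().fit(X, y / 10.0)  # scale targets by 1/10

# The fitted weights scale by the same factor ...
print(model_raw.coef_)            # ~ [ 3.0, -2.0]
print(model_scaled.coef_ * 10.0)  # ~ [ 3.0, -2.0] again

# ... and predictions agree once the scaling is undone.
print(np.allclose(model_raw.predict(X), model_scaled.predict(X) * 10.0))  # True
```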
Yes, especially when you’re using a regularization term in the loss function. It doesn’t guarantee better performance, but it can shift the bias-variance tradeoff: rescaling y changes the size of the data-fit term relative to the fixed penalty, so the effective regularization strength changes. This matters especially with L1 regularization, where scaling the output may change which variables are “selected”.
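A small sketch of that selection effect (assuming scikit-learn’s Lasso with a fixed alpha; the data and coefficient sizes are made up so the effect is visible):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
# One strong feature and one weak feature; the rest are noise.
y = 1.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso_raw = Lasso(alpha=0.1).fit(X, y)
lasso_scaled = Lasso(alpha=0.1).fit(X, y / 10.0)  # same alpha, smaller targets

print(np.flatnonzero(lasso_raw.coef_))     # e.g. features 0 and 1 selected
print(np.flatnonzero(lasso_scaled.coef_))  # the weak feature 1 may be dropped
```

With the same alpha, dividing y by 10 shrinks all the "true" coefficients toward the L1 threshold, so the weaker variable can be zeroed out even though it survived at the original scale.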