I’m wondering why do we split into train/test set before imputation.
Is there any downside when we do imputation before train/test split?
Yes, the downside is overfitting during training. This results in a system that only works well on the training set, but cannot make useful future predictions.
1 Like