If we add more training data, wouldn’t the model fit better to the training sets (thus overfit and less generalizable)? So why would we add more training data to prevent overfitting? Shouldn’t we add more testing data instead?
It should be the opposite. Adding more training data but keeping the same model architecture would make your model harder to fit well to every training data point. It’s like you can cater for the different dinner perferences for 10 kids, but if you have 100 kids, then it is going to be more challenging. The more data points you get, the more compromise your model will have to make to fit less well to the original points in order to make room for fitting better to the new points.
Adding more training data creates more variance. This helps mitigate against overfitting.