Why does keeping W values small correspond to less overfitting?

Struggling with the intuition for this one.

If the training-set cost is minimized at some weight value w, then making w smaller moves you away from that minimum, so the fit on the training data gets worse and the training cost goes up. So how can shrinking the weights be a good thing?
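To make the trade-off concrete, here's a minimal sketch (toy data and parameters are all made up for illustration): a degree-9 polynomial fit to a few noisy points, once unregularized and once with an L2 (ridge) penalty that pulls the weights toward zero. Shrinking the weights does raise the training cost, exactly as described above — the question is what that buys on data the model hasn't seen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression: quadratic ground truth plus noise,
# fit with a degree-9 polynomial (prone to overfitting).
x_train = np.linspace(-1, 1, 12)
y_train = x_train**2 + rng.normal(0.0, 0.1, x_train.size)

def poly_features(x, degree=9):
    # Columns are x^degree, ..., x^1, x^0.
    return np.vander(x, degree + 1)

def ridge_fit(X, y, lam):
    # Least squares on an augmented system, equivalent to
    # minimizing ||Xw - y||^2 + lam * ||w||^2.
    d = X.shape[1]
    A = np.vstack([X, np.sqrt(lam) * np.eye(d)])
    b = np.concatenate([y, np.zeros(d)])
    return np.linalg.lstsq(A, b, rcond=None)[0]

def mse(w, X, y):
    return np.mean((X @ w - y) ** 2)

X_tr = poly_features(x_train)
w_free = ridge_fit(X_tr, y_train, lam=0.0)    # unregularized fit
w_small = ridge_fit(X_tr, y_train, lam=1e-2)  # weights shrunk toward 0

print("||w|| unregularized:", np.linalg.norm(w_free))
print("||w|| regularized:  ", np.linalg.norm(w_small))
print("train MSE unregularized:", mse(w_free, X_tr, y_train))
print("train MSE regularized:  ", mse(w_small, X_tr, y_train))
```

The regularized weights have a smaller norm and a (slightly) higher training cost — the penalty trades a bit of training fit for a smoother function, which is the mechanism the question is asking about.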