If regularization makes our network act more like a linear function, can we achieve a similar effect by manually decreasing the number of units in our hidden layers? When would I choose one over the other? And why isn’t “smaller network” one of the options in the flowchart of solutions for high variance?
It is an interesting question! Here’s another recent thread on the same topic.