Overfitting or Underfitting - what is the issue?

Hi there,

Adding more layers can help improve performance, and potentially even reduce overfitting, if you:

  • can learn more hierarchical structure that helps solve your problem
  • succeed in learning more abstract and complex behaviour (the classic example is image processing: the first layers learn edge-like filters, the next layers learn more complex shapes, and the final layers combine these shapes to describe and ultimately predict your target class)
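One way to see why stacked layers can capture increasingly complex structure is the receptive field: each additional convolutional layer lets a unit "see" a larger patch of the input. A minimal sketch in plain Python, assuming stride-1 convolutions (the function name is my own, not a library API):

```python
def receptive_field(kernel_sizes):
    """Receptive field of the top layer in a stack of stride-1 conv layers."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each stride-1 layer widens the view by (k - 1) pixels
    return rf

# One 3x3 layer only sees 3x3 patches (enough for edges); stacking more
# layers lets the top layer combine those edges into larger shapes.
print(receptive_field([3]))        # 3
print(receptive_field([3, 3]))     # 5
print(receptive_field([3, 3, 3]))  # 7
```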

Assuming that your two models are really comparable with respect to data and chosen hyperparameters, I would have expected the training performance to improve in absolute terms as well. Could you share the train/dev loss curves for both variants?
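Once you have the curves, a quick heuristic reading already tells you a lot: a dev loss that rises while the training loss keeps falling points to overfitting, while a training loss that stays high suggests underfitting. A rough sketch (the function name, window, and threshold are my own choices, not a standard API):

```python
def diagnose(train_loss, dev_loss, window=3, high_loss=0.5):
    """Very rough read of per-epoch loss curves (lists of floats)."""
    n = len(dev_loss)
    # Over the last `window` epochs: dev loss rising while train loss falls?
    dev_rising = all(dev_loss[i] < dev_loss[i + 1] for i in range(n - window, n - 1))
    train_falling = all(train_loss[i] > train_loss[i + 1] for i in range(n - window, n - 1))
    if dev_rising and train_falling:
        return "overfitting"
    if train_loss[-1] > high_loss:  # threshold is arbitrary; depends on your loss scale
        return "underfitting"
    return "looks ok"

print(diagnose([1.0, 0.6, 0.4, 0.3, 0.25, 0.2],
               [1.1, 0.7, 0.5, 0.55, 0.6, 0.65]))  # overfitting
```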

In general, I agree with you: increasing model complexity with more parameters can increase the risk of overfitting. In the end it's a trade-off: you want the sweet spot between letting the model learn more complex and abstract patterns and not having too many parameters relative to the available data.
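To make that trade-off concrete, you can compare the raw parameter counts of your two variants against the size of your dataset. A quick sketch for fully connected nets (the layer widths below are made-up examples, not your models):

```python
def mlp_param_count(layer_sizes):
    """Weights + biases of a fully connected net given its layer widths."""
    return sum((fan_in + 1) * fan_out  # +1 accounts for each output unit's bias
               for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))

shallow = mlp_param_count([784, 256, 10])        # 203,530 parameters
deep = mlp_param_count([784, 256, 128, 64, 10])  # 242,762 parameters
print(shallow, deep)
```

If the parameter count dwarfs the number of training examples, regularization or more data will usually help more than extra depth.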

Best regards