W4 Assignment - Success with Minimal Architecture

Thanks for the advice. I wasn’t paying attention to the loss at all, partly because I don’t know what a good loss value might be. I’m guessing it counts as good when it gets so small that it’s displayed in scientific notation (e.g. 1.3167e-06) :stuck_out_tongue:
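
In case it helps anyone else keeping an eye on the numbers: the per-epoch loss is available from the History object that `fit()` returns. Here’s a minimal sketch; the tiny model and random data are hypothetical stand-ins for the assignment’s own:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data and model, just to show where the loss numbers live.
x = np.random.rand(200, 4).astype("float32")
y = (x.sum(axis=1) > 2.0).astype("int32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# fit() returns a History object; history.history["loss"] holds the loss per epoch.
history = model.fit(x, y, epochs=5, verbose=0)
for epoch, loss in enumerate(history.history["loss"], start=1):
    print(f"epoch {epoch}: loss = {loss:.4e}")  # prints in scientific notation
```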

I briefly tried a few variations. Here are my observations (a rough sketch of the setups follows the list):

  • Letting the minimal model train for more epochs can sometimes get me to that point (outcomes vary from run to run)
  • Adding a convolution layer brings the loss down further in the same number of epochs, with similar accuracy, but it slows training considerably
  • Adding multiple convolution layers does not noticeably improve the metrics, but it slows training even more
  • If, instead of convolution layers, I add a dense layer before the output layer, the loss drops to very small values in very few epochs, and training is fast
  • If I increase the size of the dense layer, there’s a point beyond which training converges more slowly
  • If I use both a convolution layer and a dense layer, the loss sometimes falls a bit faster per epoch, but training is slower than with the dense layer alone
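
For reference, here’s roughly the kind of harness I used to compare the variants. It’s just a sketch: I’m assuming a Keras Sequential classifier on 28x28 grayscale images with 10 classes, so the shapes and layer sizes are stand-ins rather than the assignment’s exact setup:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(conv_layers=0, dense_units=0,
                input_shape=(28, 28, 1), num_classes=10):
    """Small classifier with an optional conv stack and an optional dense layer."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Each extra Conv2D block lowered my loss a little but slowed training a lot.
    for _ in range(conv_layers):
        model.add(layers.Conv2D(32, (3, 3), activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    # A dense layer before the output was the fastest way to shrink the loss.
    if dense_units:
        model.add(layers.Dense(dense_units, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

minimal    = build_model()                                # baseline
with_conv  = build_model(conv_layers=1)                   # slower, similar accuracy
with_dense = build_model(dense_units=128)                 # fast, loss drops quickly
with_both  = build_model(conv_layers=1, dense_units=128)  # in-between for me
```

Training each of these for the same number of epochs and comparing the loss curves is what produced the observations above.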

It feels like a bit of a balancing act, trying to trade off model size against training speed. And a bigger model doesn’t always mean faster convergence.
