Thanks for the advice. I wasn’t paying attention to the loss at all, partly because I don’t know what a good loss value might be. I’m guessing it’s good once it gets so small that it’s displayed in scientific notation (e.g. 1.3167e-06).
I briefly tried a few variations. Here are my observations:
- Allowing the minimal model to train for more epochs can sometimes get me to this point (outcomes are somewhat random)
- Adding a convolution layer gives a lower loss for the same number of epochs, with similar accuracy, but it slows down training considerably
- Adding multiple convolution layers does not noticeably improve metrics, but it does slow down training even more
- If, instead of convolution layers, I add a dense layer before the output layer, the loss comes down to very small numbers in very few epochs, and the training is fast
- If I increase the size of the dense layer, past a certain point training converges more slowly
- If I use both a convolution layer and a dense layer, the loss might improve a bit faster, but training is slower than with the dense layer alone
It feels like a bit of a balancing act, trying to trade off model size and training speed. And a bigger model doesn’t always seem to mean faster convergence.
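To put rough numbers on that trade-off, here is a minimal sketch that counts parameters and multiply-accumulates (MACs) per sample for three of the variants above. Everything specific in it is an assumption for illustration, not something from my actual setup: a 28x28x1 input, 10 output classes, a 64-unit dense layer, and a single 3x3 convolution with 32 filters, stride 1 and no padding. The helper names `dense_cost`, `conv2d_cost` and `total` are made up for this example.

```python
def dense_cost(n_in, n_out):
    """Return (parameters, MACs per sample) for a fully connected layer."""
    params = n_in * n_out + n_out  # weights plus one bias per output unit
    macs = n_in * n_out            # one multiply-accumulate per weight
    return params, macs


def conv2d_cost(height, width, c_in, c_out, kernel):
    """Return (parameters, MACs per sample, flattened output size)
    for a kernel x kernel convolution with stride 1 and no padding."""
    out_h = height - kernel + 1
    out_w = width - kernel + 1
    params = kernel * kernel * c_in * c_out + c_out
    # Every output position reuses the same weights, so the work scales
    # with the output area even though the parameter count does not.
    macs = out_h * out_w * kernel * kernel * c_in * c_out
    return params, macs, out_h * out_w * c_out


def total(layers):
    """Sum (parameters, MACs) over a list of per-layer cost tuples."""
    return sum(p for p, _ in layers), sum(m for _, m in layers)


# Assumed shapes (hypothetical, not taken from the original model).
H, W, C, CLASSES = 28, 28, 1, 10
flat = H * W * C

minimal = total([dense_cost(flat, CLASSES)])
with_dense = total([dense_cost(flat, 64), dense_cost(64, CLASSES)])

conv_params, conv_macs, conv_out = conv2d_cost(H, W, C, 32, 3)
with_conv = total([(conv_params, conv_macs), dense_cost(conv_out, CLASSES)])

for name, (params, macs) in [
    ("minimal", minimal),
    ("dense(64)", with_dense),
    ("conv(32)", with_conv),
]:
    print(f"{name:<10} params={params:>8,}  MACs/sample={macs:>8,}")
```

Under these assumed shapes the convolution variant does several times more multiply-accumulates per sample than the dense variant, which would be consistent with it training more slowly. MACs are only a rough proxy for wall-clock time, though, and none of this says anything about how many epochs each variant needs to converge.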