For the Week 4 assignment, I defined the simplest model possible: input > flatten > dense (with 1 neuron and sigmoid activation). My plan was to gradually build up from there to find the minimal architecture needed to solve the problem. To my surprise, the initial model quickly converged to 99.9% accuracy. I tried multiple times and got the same result. The grader scored it as 100%.
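For concreteness, here's a minimal sketch of that architecture (the 150×150×3 input shape and the compile settings are assumptions for illustration, not necessarily the assignment's exact values):

```python
import tensorflow as tf

# Minimal model: flatten the image, then a single sigmoid unit for binary classification.
# The 150x150x3 input shape and the compile settings below are assumptions for
# illustration; the assignment's data and settings may differ.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(150, 150, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
```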
Is this too good to be true, or are these examples just very easy to categorize? I thought I’d need to put some convolution layers in there, but I didn’t.
The task here was mostly about reaching the desired accuracy, which you did achieve with this model. That said, it doesn't mean this is an ideal model; as you can see, your loss is still in the range of 0.04.
You can try adding a convolution layer, practice, and resubmit; you will see how the accuracy and loss are affected by including a convolution layer.
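For example (only an illustration, not the exact solution), a single convolution block in front of the flatten could look something like this; the filter count, kernel size, and input shape are placeholder choices:

```python
import tensorflow as tf

# One convolution + pooling block before the original flatten/dense head.
# Filter count, kernel size, and input shape are illustrative only.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(150, 150, 3)),
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
```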
Thanks for the advice. I wasn’t paying attention to the loss at all, partly because I don’t know what a good loss value might be. I’m guessing it’s good when it gets so small that it’s displayed in scientific notation (e.g. 1.3167e-06)?
I briefly tried a few variations. Here are my observations:
- Allowing the minimal model to train for more epochs can sometimes get the loss that low (outcomes are somewhat random).
- Adding a convolution layer brings the loss down for the same number of epochs and similar accuracy, but it slows down training considerably.
- Adding multiple convolution layers does not noticeably improve the metrics, but it slows down training even more.
- If, instead of convolution layers, I add a dense layer before the output layer, the loss comes down to very small numbers in very few epochs, and training is fast.
- If I increase the size of the dense layer, there’s a point beyond which training converges more slowly.
- If I use both a convolution layer and a dense layer, I get somewhat faster loss improvements, but slower training than with the dense layer alone (both variants are sketched below, after this list).
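Roughly, the two variants look like this (layer sizes and input shape are just illustrative, not my exact code):

```python
import tensorflow as tf

# Variant A: extra dense layer before the output
# (trains fast, and the loss drops quickly).
dense_model = tf.keras.Sequential([
    tf.keras.Input(shape=(150, 150, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Variant B: convolution + dense
# (loss improves faster per epoch, but each epoch takes longer).
conv_dense_model = tf.keras.Sequential([
    tf.keras.Input(shape=(150, 150, 3)),
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
```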
It feels like a bit of a balancing act, trying to trade off model size and training speed. And a bigger model doesn’t seem to always mean faster convergence.
Great, so you did learn some important points. That said, although you understood the significance of each added layer and how it affects the loss and training speed, don’t forget that the dataset used here is much simpler than complex data; more precisely, it has only two significant features. When it comes to complex data with more than two features, or possibly with confounding features, that is where a simpler model might not get the result you are looking for.
Your idea of achieving a balance between training accuracy and loss is of course important, but it comes with a basic understanding of what kind of data you are handling.
Also, just to comment on your addition of a dense layer giving you better results: that comes from its role in connecting all the hidden units in a neural network.
I hope you have completed the Deep Learning Specialisation.
Being inquisitive is a good way to understand something better.
I don’t know if you are learning purely out of interest, but if you want to understand more complex models, then the TensorFlow Advanced Techniques specialisation and the NLP specialisation are good mind-benders.