W4 Assignment - Success with Minimal Architecture

For the Week 4 assignment, I defined the simplest model possible: input > flatten > dense (with 1 neuron and sigmoid activation). My plan was to gradually build up from there to find the minimal architecture needed to solve the problem. To my surprise, the initial model quickly converged to 99.9% accuracy. I tried multiple times and got the same result. The grader scored it as 100%.
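For reference, the model boiled down to something like this (the input shape shown is just for illustration, since it depends on the assignment's images):

```python
import tensorflow as tf

# Minimal model: input -> flatten -> single sigmoid unit.
# The 150x150 grayscale input shape is assumed for illustration.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(150, 150, 1)),
    tf.keras.layers.Flatten(),                       # image -> 1-D vector
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary classifier
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```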

Is this too good to be true, or are these examples just very easy to categorize? I thought I’d need to put some convolution layers in there, but I didn’t.

Hi @MikeML

Your approach was right: start with a simple model and work towards a more complex one. If the grader scored you perfectly, don't doubt it.

This assignment does indeed only require a simple model; as you move towards the advanced specialisation courses, you will get to experiment with more complex models.

Also, could you share a screenshot of the final training output showing the accuracy?

So keep learning!!!

Regards
DP

I’ll take it! Here’s a screenshot of the final training, which completed in 6 epochs.

The task here was mainly about reaching the desired accuracy, which you achieved with this model. That said, this doesn’t mean it is an ideal model: as you can see, your loss is still around 0.04.

You can try adding a convolution layer, practise, and resubmit; you will learn how the accuracy and loss are affected by including a convolution layer.
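For example, a single convolution layer could be slotted in like this (the filter count and kernel size here are only illustrative choices, not the assignment's solution):

```python
import tensorflow as tf

# Variant with one convolution layer before the classifier.
# Filter count, kernel size and input shape are illustrative choices.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(150, 150, 1)),
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu"),  # local features
    tf.keras.layers.MaxPooling2D((2, 2)),                   # downsample
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```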

Regards
DP

Thanks for the advice. I wasn’t paying attention to the loss at all, partly because I don’t know what a good loss value might be. I’m guessing it’s good once it gets so small that it’s displayed in scientific notation (e.g. 1.3167e-06) :stuck_out_tongue:

I briefly tried a few variations. Here are my observations:

  • Allowing the minimal model to train for more epochs can sometimes get the loss down to that point (outcomes are somewhat random)
  • Adding a convolution layer brings the loss down for the same number of epochs and similar accuracy, but it slows down training considerably
  • Adding multiple convolution layers does not noticeably improve the metrics, but it slows down training even more
  • If, instead of convolution layers, I add a dense layer before the output layer, the loss comes down to very small numbers in very few epochs, and training is fast (see the sketch after this list)
  • If I increase the size of the dense layer, there’s a point where training converges more slowly
  • If I use both a convolution layer and a dense layer, I sometimes get faster loss improvements, but training is slower than with the dense layer alone

It feels like a bit of a balancing act, trying to trade off model size against training speed. And a bigger model doesn’t always seem to mean faster convergence.
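For reference, the dense-layer variant I’m describing looks roughly like this (the 64-unit width is just an example I picked for illustration, and the input shape is again assumed):

```python
import tensorflow as tf

# Sketch of the dense-hidden-layer variant: one hidden dense layer
# before the sigmoid output. Width (64) and input shape are illustrative.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(150, 150, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```

Running model.summary() makes the size trade-off visible: nearly all of the parameters sit in that first dense layer.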

Great, so you did learn some important points. That being said, although you understood the significance of each layer being added and how it affects the loss or training speed, don’t forget that the dataset used here was much simpler than complex data; more precisely, it has two significant features. When it comes to complex data with more than two features, or possibly confounding features, a simpler model might not get the result you are looking for.

Your idea of achieving a balance between training accuracy and loss is of course important, but that comes with a basic understanding of what kind of data one is handling.

Also, just to point out: your addition of a dense layer gave you better results because of its role in connecting every hidden unit in a neural network.
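To make that concrete: in a dense (fully connected) layer every input feeds every unit, so with an assumed flattened input of 22,500 values (150 × 150), a 64-unit dense layer alone has 22,500 × 64 + 64 = 1,440,064 trainable parameters.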

I hope you have completed the Deep Learning Specialisation.

Being inquisitive is a good way to understand something better.

Keep Learning!!!
Regards
DP

Noted. Yes, I’ve completed the Deep Learning Specialization. I’m taking the TF course now to get more hands-on. Thanks for the help.

You surely will not regret it.

I don’t know if you’re learning just out of interest, but if you want to understand more complex models, the TensorFlow: Advanced Techniques Specialisation and the NLP Specialisation are good mind-benders.

Good luck

Regards
DP