Course 3 Week 1: putting it into practice, cat classification

I am creating this post to document my learning as I put training a deep neural network into practice.

The task I’m using is the cat classification problem from Course 1.

I’ve just built an n-layer neural network to test things out.

Here are the results:

Accuracy of Train logistic regression: 100 % (percentage of correctly labelled datapoints)
Accuracy of Test logistic regression: 72 % (percentage of correctly labelled datapoints)

Hyperparameters: {'learning_rate': 0.0007, 'hidden_sizes': [128, 64], 'dropout_rate': 0.25}

Training Accuracy: 100.00%
Testing Accuracy: 58.00%

The above are quick-and-dirty results (hyperparameters chosen by intuition), so my neural network actually performed worse than logistic regression.

The first goal, then, is to make the model at least outperform logistic regression.

From the above, I know the cause is overfitting, since training accuracy is 100% while testing accuracy is only 58%.

My tools are dropout and L2 regularisation.

I didn’t use any regularisation at all for the 58% testing accuracy run.

I don’t know if I am correct, but I feel that L2 regularisation often tends to over-regularise somehow,

so I just set the dropout rate to 0.2 to see how it goes.

My process is simply manual tuning of the knobs, as Andrew describes in the videos, trying to turn one at a time.
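For reference, here’s a rough sketch of the kind of model and knobs I’m tuning (PyTorch-style; not my exact code, the layer widths and names are just illustrative):

import torch
import torch.nn as nn

# [12288, 128, 64, 1]: flattened 64x64x3 images in, one sigmoid unit out
def build_mlp(layer_dims=(12288, 128, 64, 1), dropout_rate=0.2):
    layers = []
    for i in range(len(layer_dims) - 2):
        layers += [nn.Linear(layer_dims[i], layer_dims[i + 1]),
                   nn.ReLU(),
                   nn.Dropout(p=dropout_rate)]  # the dropout knob
    layers += [nn.Linear(layer_dims[-2], layer_dims[-1]), nn.Sigmoid()]
    return nn.Sequential(*layers)

model = build_mlp(dropout_rate=0.2)
criterion = nn.BCELoss()
# weight_decay is where L2 regularisation would go; 0.0 means no L2
optimizer = torch.optim.Adam(model.parameters(), lr=0.0007, weight_decay=0.0)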

l2 = 0.0, dropout= 0.2
Epoch [1/3000], Loss: 0.87964642
Epoch [1001/3000], Loss: 0.00514520
Epoch [2001/3000], Loss: 0.00174722

Finished Training
Training Accuracy: 100.00%
Testing Accuracy: 72.00%

Same as logistic regression.

Changed the dropout rate to 0.15:
Accuracy of Train logistic regression: 100 % (percentage of correctly labelled datapoints)
Accuracy of Test logistic regression: 72 % (percentage of correctly labelled datapoints)

Epoch [1/3000], Loss: 0.65881771
Epoch [1001/3000], Loss: 0.00089703
Epoch [2001/3000], Loss: 0.00281329

Training Accuracy: 100.00%
Testing Accuracy: 76.00%

So now the testing accuracy is 76%, better than logistic regression,
but still worse than human level (which I’d guess is at least 90%).
So this time I didn’t change the dropout; instead I tried data augmentation,
doubling the 209 training examples to 418 by randomly rotating the images.
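The augmentation itself is roughly this (a sketch assuming the flattened 64x64x3 arrays from Course 1 and scipy; the angle range and fill mode are illustrative choices, not necessarily the exact ones I used):

import numpy as np
from scipy.ndimage import rotate

def augment_by_rotation(train_X, train_Y, max_angle=30):
    """Append one randomly rotated copy of each flattened 64x64x3 image."""
    rotated = []
    for x in train_X:                                   # train_X: (m, 12288)
        img = x.reshape(64, 64, 3)
        angle = np.random.uniform(-max_angle, max_angle)
        img_rot = rotate(img, angle, axes=(0, 1), reshape=False, mode='nearest')
        rotated.append(img_rot.reshape(-1))
    X_aug = np.vstack([train_X, np.array(rotated)])     # (2m, 12288)
    Y_aug = np.vstack([train_Y, train_Y])               # labels stay the same
    return X_aug, Y_aug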

Accuracy of Train logistic regression: 100 % (percentage of correctly labelled datapoints)
Accuracy of Test logistic regression: 68 % (percentage of correctly labelled datapoints)
train_X's shape: (418, 12288)
train_Y's shape: (418, 1)
Epoch [1/3000], Loss: 0.59803784
Epoch [1001/3000], Loss: 0.00353110
Epoch [2001/3000], Loss: 0.00033148

Training Accuracy: 100.00%
Testing Accuracy: 72.00%

With data augmentation, test accuracy didn’t improve; logistic regression actually dropped from 72% to 68%, and my best result dropped from 76% to 72%.

Now I think I might need a bigger network.
The current layers_dims = [12288, 128, 64, 1].

I’ll probably add a few more layers, in the hope that the network can learn more complex features.
If anyone has more ideas, please let me know.

My goal is to reach at least 80-90% testing accuracy, which is approaching human level (I guess).
The tools I have (as mentioned in Week 1 of Course 3) to reduce variance (a big gap between training error and test error) are:

  • regularization (L2/dropout/data augmentation)
  • get more training data
  • NN architecture/hyperparameter search

I am not using conv layers yet, as I haven’t taken Course 4 yet.

1 Like

Here are some more updates:
I just changed the learning rate from 0.0007 (the value from the programming assignment) to 0.001.

Accuracy of Train logistic regression: 100 % (percentage of correctly labelled datapoints)
Accuracy of Test logistic regression: 68 % (percentage of correctly labelled datapoints)
train_X's shape: (418, 12288)
train_Y's shape: (418, 1)
Epoch [1/3000], Loss: 0.61563933
Epoch [1001/3000], Loss: 0.00004478
Epoch [2001/3000], Loss: 0.00000050

Training Accuracy: 100.00%
Testing Accuracy: 78.00%

That is the best so far: 78%, even with the augmented images.

But if I pushed the learning rate further, say to 0.0015,
it dropped back to:

Epoch [1/3000], Loss: 0.58826029
Epoch [1001/3000], Loss: 0.00000147
Epoch [2001/3000], Loss: 0.00000236
Training Accuracy: 100.00%
Testing Accuracy: 76.00%

So I think it is really time to consider a more complex architecture,
maybe a 4-layer one.
I will update later.
The one in my mind is [12288, 256, 128, 64, 1], i.e. adding one more layer,
and I am open to adding more.
The goal is still 80-90% testing accuracy.

1 Like

Nice work! You’re using a good process, thanks for your updates.

You are correct that you’re overfitting the training set.

To me this means your model is already too complex. Making it more complex by adding additional layers may just make the overfitting worse.

Since this is a small data set, it might be useful for you to visually inspect the images that are being labeled incorrectly, and see if you can spot any trends.

1 Like

Note that data augmentation is not a way of doing regularization. It’s not correct to list it as a technique similar to dropout and L2.

Augmentation is a way of increasing the size and variance of the data set.

Some updates:
I added a few more layers.
a. [12288, 512, 256, 128, 64, 1]

Accuracy of Train logistic regression: 100 % (percentage of correctly labelled datapoints)
Accuracy of Test logistic regression: 68 % (percentage of correctly labelled datapoints)
train_X's shape: (418, 12288)
train_Y's shape: (418, 1)
Epoch [1/3000], Loss: 0.52063584
Epoch [1001/3000], Loss: 0.00278873
Epoch [2001/3000], Loss: 0.00045550
Training Accuracy: 100.00%
Testing Accuracy: 76.00%

As more layers were added, training took a long time,
so I added tqdm to see how far training is from finishing.
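The tqdm part is just wrapping the epoch loop, roughly like this (a sketch; train_one_epoch is a stand-in for my real training step):

from tqdm import tqdm

num_epochs = 3000

def train_one_epoch():
    return 0.0  # stands in for the real forward/backward pass over the training set

for epoch in tqdm(range(num_epochs), desc="Training"):
    loss = train_one_epoch()
    if epoch % 100 == 0:
        # tqdm.write keeps log lines from breaking the progress bar
        tqdm.write(f"Epoch {epoch + 1} - Train Loss: {loss:.10f}")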

[12288, 1024, 512, 256, 128, 64, 1]
Accuracy of Test logistic regression: 68 % (percentage of correctly labelled datapoints)
train_X's shape: (418, 12288)
train_Y's shape: (418, 1)
Epoch 1 - Train Loss: 0.8491999166, Accuracy: 60.5263%                                                    
Epoch 100 - Train Loss: 0.0002256598, Accuracy: 100.0000%                                                 
Epoch 200 - Train Loss: 0.3624637829, Accuracy: 87.0813%                                                  
Epoch 300 - Train Loss: 0.0000933033, Accuracy: 100.0000%                                                 
Epoch 400 - Train Loss: 0.0000985829, Accuracy: 100.0000%                                                 
Epoch 500 - Train Loss: 0.0003825736, Accuracy: 100.0000%                                                 
Epoch 600 - Train Loss: 0.0000152957, Accuracy: 100.0000%                                                 
Epoch 700 - Train Loss: 0.0005913381, Accuracy: 100.0000%                                                 
Epoch 800 - Train Loss: 0.0000766379, Accuracy: 100.0000%                                                 
Epoch 900 - Train Loss: 0.0000054904, Accuracy: 100.0000%                                                 
Epoch 1000 - Train Loss: 0.0000872398, Accuracy: 100.0000%                                                
Epoch 1100 - Train Loss: 0.0003413693, Accuracy: 100.0000%                                                
Epoch 1200 - Train Loss: 0.0000527607, Accuracy: 100.0000%                                                
Epoch 1300 - Train Loss: 0.0000172168, Accuracy: 100.0000%                                                
Epoch 1400 - Train Loss: 0.0000070802, Accuracy: 100.0000%                                                
Epoch 1500 - Train Loss: 0.0000172428, Accuracy: 100.0000%                                                
Epoch 1600 - Train Loss: 0.0007219240, Accuracy: 100.0000%                                                
Epoch 1700 - Train Loss: 0.0000375199, Accuracy: 100.0000%                                                
Epoch 1800 - Train Loss: 0.0000577942, Accuracy: 100.0000%                                                
Epoch 1900 - Train Loss: 0.0000256875, Accuracy: 100.0000%                                                
Epoch 2000 - Train Loss: 0.0000092817, Accuracy: 100.0000%                                                
Epoch 2100 - Train Loss: 0.0000197066, Accuracy: 100.0000%                                                
Epoch 2200 - Train Loss: 0.0000036050, Accuracy: 100.0000%                                                
Epoch 2300 - Train Loss: 0.0000386271, Accuracy: 100.0000%                                                
Epoch 2400 - Train Loss: 0.0000044630, Accuracy: 100.0000%                                                
Epoch 2500 - Train Loss: 0.0000087664, Accuracy: 100.0000%                                                
Epoch 2600 - Train Loss: 0.0000189399, Accuracy: 100.0000%                                                
Epoch 2700 - Train Loss: 0.0000061183, Accuracy: 100.0000%                                                
Epoch 2800 - Train Loss: 0.0000207307, Accuracy: 100.0000%                                                
Epoch 2900 - Train Loss: 0.0000105827, Accuracy: 100.0000%                                                
Epoch 3000 - Train Loss: 0.0000024134, Accuracy: 100.0000%                                                
Training Accuracy: 100.00%
Testing Accuracy: 80.00%

This one is the best so far: 80% accuracy, which is at the low end of my goal (80-90% accuracy).

I did try an even deeper network, but the result did not improve.
[12288, 2048, 1024, 512, 256, 128, 64, 1]

Epoch 2400 - Train Loss: 0.0000022641, Accuracy: 100.0000%                                                
Epoch 2500 - Train Loss: 0.0000113922, Accuracy: 100.0000%                                                
Epoch 2600 - Train Loss: 0.0000125151, Accuracy: 100.0000%                                                
Epoch 2700 - Train Loss: 0.0000141782, Accuracy: 100.0000%                                                
Epoch 2800 - Train Loss: 0.0000091262, Accuracy: 100.0000%                                                
Epoch 2900 - Train Loss: 0.0000497373, Accuracy: 100.0000%                                                
Epoch 3000 - Train Loss: 0.0000031201, Accuracy: 100.0000%                                                
Training Accuracy: 100.00%
Testing Accuracy: 72.00%

So I think I will pause at 80% accuracy for now
and come back after I’ve learnt about convolutional layers.

Here are the hyperparameters for achieving 80% accuracy:
a. learning_rate = 0.001
b. dropout_rate = 0.025 (it turns out I had changed it from 0.25 to 0.025 and forgot to change it back for subsequent training runs :blush:)
c. layer_dims = [12288, 1024, 512, 256, 128, 64, 1]
d. data augmented to 418 images
e. ReLU activations, sigmoid at the output layer
f. batch normalisation on every hidden layer
g. no L1/L2 regularisation
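To make items b, e and f concrete, each hidden block is wired up roughly like this (a PyTorch-style sketch, not my exact code):

import torch.nn as nn

layer_dims = [12288, 1024, 512, 256, 128, 64, 1]

blocks = []
for i in range(len(layer_dims) - 2):
    blocks += [nn.Linear(layer_dims[i], layer_dims[i + 1]),
               nn.BatchNorm1d(layer_dims[i + 1]),  # f: batch norm on every hidden layer
               nn.ReLU(),                          # e: ReLU in the hidden layers
               nn.Dropout(p=0.025)]                # b: the accidental 0.025 dropout
blocks += [nn.Linear(layer_dims[-2], layer_dims[-1]),
           nn.Sigmoid()]                           # e: sigmoid at the output
model = nn.Sequential(*blocks)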

Thanks TMosh for the advice and for correcting my misunderstanding.
I will try the error analysis later and see if anything can be improved (maybe mislabelled examples).
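For the error analysis, something like this sketch is what I have in mind (assuming my flattened 64x64x3 test arrays and matplotlib; predict is a stand-in for the model’s forward pass returning 0/1 labels):

import numpy as np
import matplotlib.pyplot as plt

def show_misclassified(test_X, test_Y, predict, max_images=10):
    """Display test images where the predicted label disagrees with the ground truth."""
    preds = predict(test_X).reshape(-1)
    labels = test_Y.reshape(-1)
    wrong = np.where(preds != labels)[0][:max_images]
    if len(wrong) == 0:
        return
    for k, idx in enumerate(wrong):
        plt.subplot(1, len(wrong), k + 1)
        # assumes pixel values are 0-255; rescale first if they were normalised
        plt.imshow(test_X[idx].reshape(64, 64, 3).astype(np.uint8))
        plt.title(f"pred {int(preds[idx])} / true {int(labels[idx])}")
        plt.axis("off")
    plt.show()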

I also have a sense that with a deeper architecture (6 layers), I might need to do some hyperparameter tuning (at least on the learning rate), which might give a tiny increase in accuracy. I don’t know if I’m correct; it’s just a feeling.
I may do a quick try with leaky ReLU to see if there’s any small improvement.
Those are in my plan,
but I will pause for now and go ahead with Course 4 (Convolutional Neural Networks).

1 Like

Notice that on the last few training runs you show, you have train accuracy 100% the whole time. So as Tom suggested earlier, this really looks like a serious overfitting problem. Maybe “early stopping” would be helpful. One way to get a better picture of what is happening would be to print the test accuracy when you print the train accuracy. The question is whether you can find an earlier stopping point beyond which the test accuracy gets worse, rather than better.

Of course this is just one possibility: it could well be that the test accuracy is monotonically increasing, but just at a very slow rate. But the point is that in terms of deciding which way to go to improve things, it would be useful to understand what is actually happening in that regard.

One more detail you can see is that the training loss is not monotonically decreasing. So that suggests that you are not doing yourself any favors by running so many training iterations: your “convergence” is just oscillating rather than converging. More analysis is necessary and one first step would be the instrumentation I suggested above. Other thoughts?

2 Likes

As I said before, I expect this will only make the problem worse.

I’ve added the testing loss and testing accuracy every 100 epochs (the instrumentation is roughly the sketch below), and here are the results:
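The evaluation step itself looks roughly like this (a sketch assuming PyTorch and that test_X/test_Y are already tensors; the names are mine):

import torch

def evaluate(model, criterion, test_X, test_Y):
    """Compute test loss and accuracy with dropout/batch norm switched to eval mode."""
    model.eval()
    with torch.no_grad():
        probs = model(test_X)
        loss = criterion(probs, test_Y).item()
        acc = ((probs > 0.5).float() == test_Y).float().mean().item() * 100
    model.train()
    return loss, acc

# inside the training loop, every 100 epochs:
# test_loss, test_acc = evaluate(model, criterion, test_X, test_Y)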

Accuracy of Train logistic regression: 100 % (percentage of correctly labelled datapoints)
Accuracy of Test logistic regression: 68 % (percentage of correctly labelled datapoints)
train_X's shape: (418, 12288)
train_Y's shape: (418, 1)
Epoch 1 - Train Loss: 0.8968982611, Accuracy: 63.1579%, Test Loss: 4.7183909416, Test Accuracy: 38.0000%                    
Epoch 100 - Train Loss: 0.0001124884, Accuracy: 100.0000%, Test Loss: 1.1605676413, Test Accuracy: 76.0000%                 
Epoch 200 - Train Loss: 0.0000557198, Accuracy: 100.0000%, Test Loss: 1.1754144430, Test Accuracy: 76.0000%                 
Epoch 300 - Train Loss: 0.0008092210, Accuracy: 100.0000%, Test Loss: 1.1046861410, Test Accuracy: 80.0000%                 
Epoch 400 - Train Loss: 0.0000766339, Accuracy: 100.0000%, Test Loss: 1.2684947252, Test Accuracy: 78.0000%                 
Epoch 500 - Train Loss: 0.0000233788, Accuracy: 100.0000%, Test Loss: 1.3618611097, Test Accuracy: 78.0000%                 
Epoch 600 - Train Loss: 0.0000969937, Accuracy: 100.0000%, Test Loss: 1.1407402754, Test Accuracy: 82.0000%                 
Epoch 700 - Train Loss: 0.0002704134, Accuracy: 100.0000%, Test Loss: 1.3261145353, Test Accuracy: 78.0000%                 
Epoch 800 - Train Loss: 0.0000817834, Accuracy: 100.0000%, Test Loss: 1.4247012138, Test Accuracy: 78.0000%                 
Epoch 900 - Train Loss: 0.0000119775, Accuracy: 100.0000%, Test Loss: 1.6318467855, Test Accuracy: 74.0000%                 
Epoch 1000 - Train Loss: 0.0000045916, Accuracy: 100.0000%, Test Loss: 1.7001992464, Test Accuracy: 74.0000%                
Epoch 1100 - Train Loss: 0.0000033251, Accuracy: 100.0000%, Test Loss: 1.6847548485, Test Accuracy: 78.0000%                
Epoch 1200 - Train Loss: 0.0000119308, Accuracy: 100.0000%, Test Loss: 1.9347778559, Test Accuracy: 74.0000%                
Epoch 1300 - Train Loss: 0.0000367957, Accuracy: 100.0000%, Test Loss: 2.1016838551, Test Accuracy: 72.0000%                
Epoch 1400 - Train Loss: 0.0000317135, Accuracy: 100.0000%, Test Loss: 1.9311181307, Test Accuracy: 74.0000%                
Epoch 1500 - Train Loss: 0.0000208138, Accuracy: 100.0000%, Test Loss: 2.1626112461, Test Accuracy: 72.0000%                
Epoch 1600 - Train Loss: 0.0000047019, Accuracy: 100.0000%, Test Loss: 2.3753902912, Test Accuracy: 68.0000%                
Epoch 1700 - Train Loss: 0.0000010251, Accuracy: 100.0000%, Test Loss: 2.2693607807, Test Accuracy: 68.0000%                
Epoch 1800 - Train Loss: 0.0000031176, Accuracy: 100.0000%, Test Loss: 2.2756483555, Test Accuracy: 76.0000%                
Epoch 1900 - Train Loss: 0.0000491930, Accuracy: 100.0000%, Test Loss: 2.1158134937, Test Accuracy: 74.0000%                
Epoch 2000 - Train Loss: 0.0000457143, Accuracy: 100.0000%, Test Loss: 2.0775973797, Test Accuracy: 82.0000%                
Epoch 2100 - Train Loss: 0.0000406990, Accuracy: 100.0000%, Test Loss: 2.7636830807, Test Accuracy: 72.0000%                
Epoch 2200 - Train Loss: 0.0000096822, Accuracy: 100.0000%, Test Loss: 2.3964231014, Test Accuracy: 78.0000%                
Epoch 2300 - Train Loss: 0.0000133773, Accuracy: 100.0000%, Test Loss: 2.3748342991, Test Accuracy: 74.0000%                
Epoch 2400 - Train Loss: 0.0000030144, Accuracy: 100.0000%, Test Loss: 2.1774208546, Test Accuracy: 76.0000%                
Epoch 2500 - Train Loss: 0.0000336828, Accuracy: 100.0000%, Test Loss: 2.1827614307, Test Accuracy: 82.0000%                
Epoch 2600 - Train Loss: 0.0025016823, Accuracy: 100.0000%, Test Loss: 3.0822870731, Test Accuracy: 78.0000%                
Epoch 2700 - Train Loss: 0.0000028315, Accuracy: 100.0000%, Test Loss: 2.9604318142, Test Accuracy: 70.0000%                
Epoch 2800 - Train Loss: 0.0003228166, Accuracy: 100.0000%, Test Loss: 1.9306467772, Test Accuracy: 82.0000%                
Epoch 2900 - Train Loss: 0.0000217696, Accuracy: 100.0000%, Test Loss: 2.2150287628, Test Accuracy: 76.0000%                
Epoch 3000 - Train Loss: 0.0000025787, Accuracy: 100.0000%, Test Loss: 2.1785304546, Test Accuracy: 78.0000%  

Here is the loss curve:

Those curves were generated using the plotting code from the Course 4 Week 1 programming exercise; my results look a lot different from the exercise’s.

Obviously it is overfitting.
I will definitely need to add more regularisation to bring the test error down.

I will try dropout/L2 regularisation first,
and maybe go back to a smaller model as well.

1 Like

The evidence suggests that “early stopping” after Epoch 600 would be your best solution, at least with your other current hyperparameter choices. All the training after that is just a waste of time: it does not improve the training accuracy, and the test accuracy only gets worse after that point.
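A minimal sketch of that bookkeeping would be to remember the weights from the best test-accuracy epoch and stop once it hasn’t improved for a while (train_one_epoch and evaluate here are placeholders for whatever you already use):

import copy

def train_with_early_stopping(model, train_one_epoch, evaluate, num_epochs=3000, patience=300):
    """Track the best test accuracy; stop if it hasn't improved for `patience` epochs."""
    best_acc, best_state, since_best = 0.0, None, 0
    for epoch in range(num_epochs):
        train_one_epoch()                    # your existing training step
        _, test_acc = evaluate()             # your existing test evaluation
        if test_acc > best_acc:
            best_acc, since_best = test_acc, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            since_best += 1
            if since_best >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)    # restore the best weights
    return best_acc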

1 Like

I agree with you, Paul. As such, I changed the number of epochs to 1200 (double of 600, to help capture the training trend better).
I did some more adjusting of the dropout rate.
It didn’t help at all; every setting was over-regularised.
And I couldn’t recreate the 82% accuracy, because I hadn’t set the seed beforehand; those runs had been using a random seed.
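To avoid that next time, fixing the seeds only takes a few lines (a sketch assuming PyTorch and numpy; fully deterministic runs may also need the cuDNN flags):

import random
import numpy as np
import torch

def set_seed(seed=1):
    """Make runs reproducible across python, numpy and torch."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op when there is no GPU

set_seed(1)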

Here are some results using seed = 1.
Dropout rate = 0.025:

Accuracy of Train logistic regression: 100 % (percentage of correctly labelled datapoints)
Accuracy of Test logistic regression: 68 % (percentage of correctly labelled datapoints)
train_X's shape: (418, 12288)
train_Y's shape: (418, 1)
Epoch 1 - Train Loss: 0.8060707705, Accuracy: 64.5933%, Test Loss: 7.1936964989, Test Accuracy: 38.0000%                    
Epoch 100 - Train Loss: 0.0001622115, Accuracy: 100.0000%, Test Loss: 1.8052763939, Test Accuracy: 78.0000%                 
Epoch 200 - Train Loss: 0.0000605114, Accuracy: 100.0000%, Test Loss: 2.1498451233, Test Accuracy: 74.0000%                 
Epoch 300 - Train Loss: 0.0000346626, Accuracy: 100.0000%, Test Loss: 2.4566383362, Test Accuracy: 76.0000%                 
Epoch 400 - Train Loss: 0.0003263854, Accuracy: 100.0000%, Test Loss: 1.9640132189, Test Accuracy: 76.0000%                 
Epoch 500 - Train Loss: 0.0001644453, Accuracy: 100.0000%, Test Loss: 1.9585602283, Test Accuracy: 72.0000%                 
Epoch 600 - Train Loss: 0.0000416066, Accuracy: 100.0000%, Test Loss: 2.2795214653, Test Accuracy: 74.0000%                 
Epoch 700 - Train Loss: 0.0000793575, Accuracy: 100.0000%, Test Loss: 2.7676911354, Test Accuracy: 76.0000%                 
Epoch 800 - Train Loss: 0.0000377860, Accuracy: 100.0000%, Test Loss: 2.9661905766, Test Accuracy: 74.0000%                 
Epoch 900 - Train Loss: 0.0003734164, Accuracy: 100.0000%, Test Loss: 2.5763833523, Test Accuracy: 68.0000%                 
Epoch 1000 - Train Loss: 0.0000122600, Accuracy: 100.0000%, Test Loss: 3.1165988445, Test Accuracy: 74.0000%                
Epoch 1100 - Train Loss: 0.0057456216, Accuracy: 99.7608%, Test Loss: 2.8850488663, Test Accuracy: 72.0000%                 
Epoch 1200 - Train Loss: 0.0000832415, Accuracy: 100.0000%, Test Loss: 3.0044536591, Test Accuracy: 76.0000%                
Training Accuracy: 100.00%
Testing Accuracy: 76.00%

dropout rate = 0.05:

train_X's shape: (418, 12288)
train_Y's shape: (418, 1)
Epoch 1 - Train Loss: 0.7966793052, Accuracy: 63.6364%, Test Loss: 5.8648667336, Test Accuracy: 42.0000%                    
Epoch 100 - Train Loss: 0.0002019828, Accuracy: 100.0000%, Test Loss: 1.7692965269, Test Accuracy: 76.0000%                 
Epoch 200 - Train Loss: 0.0000567369, Accuracy: 100.0000%, Test Loss: 2.0199258327, Test Accuracy: 74.0000%                 
Epoch 300 - Train Loss: 0.0016451071, Accuracy: 100.0000%, Test Loss: 1.7323887348, Test Accuracy: 76.0000%                 
Epoch 400 - Train Loss: 0.0001727938, Accuracy: 100.0000%, Test Loss: 1.7279729843, Test Accuracy: 80.0000%                 
Epoch 500 - Train Loss: 0.0001855322, Accuracy: 100.0000%, Test Loss: 2.0534861088, Test Accuracy: 74.0000%                 
Epoch 600 - Train Loss: 0.0000204563, Accuracy: 100.0000%, Test Loss: 2.1191828251, Test Accuracy: 74.0000%                 
Epoch 700 - Train Loss: 0.0001647298, Accuracy: 100.0000%, Test Loss: 2.9372606277, Test Accuracy: 72.0000%                 
Epoch 800 - Train Loss: 0.0000831830, Accuracy: 100.0000%, Test Loss: 3.1068224907, Test Accuracy: 74.0000%                 
Epoch 900 - Train Loss: 0.0005263959, Accuracy: 100.0000%, Test Loss: 2.6927649975, Test Accuracy: 74.0000%                 
Epoch 1000 - Train Loss: 0.0000079957, Accuracy: 100.0000%, Test Loss: 3.0232591629, Test Accuracy: 72.0000%                
Epoch 1100 - Train Loss: 0.0004276456, Accuracy: 100.0000%, Test Loss: 3.4267504215, Test Accuracy: 72.0000%                
Epoch 1200 - Train Loss: 0.0000133139, Accuracy: 100.0000%, Test Loss: 3.4840440750, Test Accuracy: 72.0000%                
Training Accuracy: 100.00%
Testing Accuracy: 72.00%

dropout rate = 0.1:

Accuracy of Train logistic regression: 100 % (percentage of correctly labelled datapoints)
Accuracy of Test logistic regression: 68 % (percentage of correctly labelled datapoints)
train_X's shape: (418, 12288)
train_Y's shape: (418, 1)
Epoch 1 - Train Loss: 0.7982902229, Accuracy: 62.6794%, Test Loss: 6.3382663727, Test Accuracy: 44.0000%                    
Epoch 100 - Train Loss: 0.0003898116, Accuracy: 100.0000%, Test Loss: 1.6615906954, Test Accuracy: 78.0000%                 
Epoch 200 - Train Loss: 0.0000653922, Accuracy: 100.0000%, Test Loss: 2.1499385834, Test Accuracy: 76.0000%                 
Epoch 300 - Train Loss: 0.0000186972, Accuracy: 100.0000%, Test Loss: 2.3755857944, Test Accuracy: 74.0000%                 
Epoch 400 - Train Loss: 0.0000144449, Accuracy: 100.0000%, Test Loss: 2.5581126213, Test Accuracy: 74.0000%                 
Epoch 500 - Train Loss: 0.0003287637, Accuracy: 100.0000%, Test Loss: 1.7797993422, Test Accuracy: 78.0000%                 
Epoch 600 - Train Loss: 0.0002240057, Accuracy: 100.0000%, Test Loss: 2.2652101517, Test Accuracy: 72.0000%                 
Epoch 700 - Train Loss: 0.0001376531, Accuracy: 100.0000%, Test Loss: 2.0998404026, Test Accuracy: 78.0000%                 
Epoch 800 - Train Loss: 0.0002548182, Accuracy: 100.0000%, Test Loss: 2.9168941975, Test Accuracy: 72.0000%                 
Epoch 900 - Train Loss: 0.0000673367, Accuracy: 100.0000%, Test Loss: 2.8092284203, Test Accuracy: 72.0000%                 
Epoch 1000 - Train Loss: 0.0000050690, Accuracy: 100.0000%, Test Loss: 3.0432934761, Test Accuracy: 70.0000%                
Epoch 1100 - Train Loss: 0.0030618873, Accuracy: 99.7608%, Test Loss: 2.5695786476, Test Accuracy: 68.0000%                 
Epoch 1200 - Train Loss: 0.0000553706, Accuracy: 100.0000%, Test Loss: 2.7377741337, Test Accuracy: 70.0000%                
Training Accuracy: 100.00%
Testing Accuracy: 70.00%

So I think I am done with regularisation, and that is probably the optimal setting for the dropout layers (dropout rate = 0.025).

I might need to do some hyperparameter tuning (at least on the learning rate and perhaps a decay schedule), which may give a tiny increase in accuracy.
I may do a quick try with leaky ReLU to see if there’s any small improvement.
Those are in my plan.

A quick question: this is my first hands-on experience training a network and adjusting all the knobs myself (in the programming exercises, the hyperparameters were already set).
Is it normal for the training loss and testing loss to differ by this much for a cat-classification type of problem? It seems I can’t bring the testing loss below 0.5 (the current best is about 1.16),
while the training loss can approach zero (0.0001124884).

On one hand, looking at those training results it is sensible to say it is overfitting,
but at the same time, dropout/L2 regularisation can’t do much to bring the testing loss down.
What else can be done? Should I just choose the 82% model and call it a day? Or is there something I have missed the whole time?
Sorry, I am kind of confused about the workflow now.

1 Like

I think maybe you don’t understand what regularization (which Dropout is a form of) does.

It increases the cost on the training set, to the benefit of getting lower cost on the test set.

Every result you’ve shown shows that you’re still getting 100% accuracy on the training set.

So you’re still overfitting.

I’d like to see some results where you get lower training set accuracy. Until that happens, you aren’t using enough regularization.

1 Like

It’s great that you are trying all these experiments. We always learn something when we try to take the course material and apply or extend it.

My interpretation of your results is that dropout = 0.05 gave the best performance. You got to 80% test accuracy on Epoch 400 there. Also notice that what happens after that is less “bouncy”, but it’s all in the low 72 - 74% range so maybe that is irrelevant.

One other general comment: looking at the loss values is not really all that useful. Any particular number for loss doesn’t really mean anything in the sense that if you tell me the loss value I can’t make any conclusion from that. It’s only useful to know if it’s going up or down, as a proxy for whether convergence is working or not. The accuracy is the Gold Standard or “sine qua non” here. Those are the numbers that really matter for assessing the performance. Well, that and the compute cost it took you to get there. :grinning:

The other general thing to say here is that this dataset is ridiculously small for a problem this complex. That really limits the value of investing too much effort in this particular set of experiments. Here’s a thread from a while back, where I did some experiments perturbing the mix of “yes” and “no” samples between the train and test sets.

It might be a better idea to find a richer and more realistic dataset. One I’ve heard of, but haven’t personally used, is the Kaggle Cats and Dogs Challenge. That has O(10^4) samples. Of course that means that all the training will be a lot more expensive, but at least you’ll have a better chance of learning generalizable things.

1 Like

As you proceed with DLS Course 2 and Course 3, you’ll hear a lot more about strategies for dealing with cases in which the performance does not achieve your requirements. And in the case of overfitting, the number one choice to pursue for that is “get more training data”. :laughing: Of course in “real world” scenarios where you may already have a lot more than 209 + 50 samples, that may not be an easy or inexpensive thing to do.

But here you do have alternatives ready to hand, e.g. the Kaggle dataset. ImageNet is also a rich source of image datasets.

1 Like

I did some more work on the dropout rate.
This setting brought the testing loss down a lot.
Dropout rate = 0.95:

Accuracy of Train logistic regression: 100 % (percentage of correctly labelled datapoints)
Accuracy of Test logistic regression: 68 % (percentage of correctly labelled datapoints)
train_X's shape: (418, 12288)
train_Y's shape: (418, 1)
Epoch 1 - Train Loss: 1.7864180803, Accuracy: 53.1100%, Test Loss: 1.5218288898, Test Accuracy: 66.0000%                    
Epoch 100 - Train Loss: 0.6086005483, Accuracy: 65.7895%, Test Loss: 0.6494225860, Test Accuracy: 74.0000%                  
Epoch 200 - Train Loss: 0.4338078499, Accuracy: 77.0335%, Test Loss: 0.6187242866, Test Accuracy: 74.0000%                  
Epoch 300 - Train Loss: 0.3211718010, Accuracy: 85.6459%, Test Loss: 0.6154537797, Test Accuracy: 76.0000%                  
Epoch 400 - Train Loss: 0.2389779091, Accuracy: 90.1914%, Test Loss: 0.7578060627, Test Accuracy: 78.0000%                  
Epoch 500 - Train Loss: 0.2155333619, Accuracy: 93.5407%, Test Loss: 0.8939739466, Test Accuracy: 72.0000%                  
Epoch 600 - Train Loss: 0.1299183252, Accuracy: 93.7799%, Test Loss: 0.9161641002, Test Accuracy: 76.0000%                  
Epoch 700 - Train Loss: 0.1487568755, Accuracy: 93.7799%, Test Loss: 1.1355643272, Test Accuracy: 74.0000%                  
Epoch 800 - Train Loss: 0.1511787927, Accuracy: 94.2584%, Test Loss: 1.2522274256, Test Accuracy: 76.0000%                  
Epoch 900 - Train Loss: 0.0658203305, Accuracy: 97.1292%, Test Loss: 1.3832201958, Test Accuracy: 72.0000%                  
Epoch 1000 - Train Loss: 0.0790953503, Accuracy: 96.8900%, Test Loss: 1.6637328863, Test Accuracy: 72.0000%                 
Epoch 1100 - Train Loss: 0.0777927211, Accuracy: 96.6507%, Test Loss: 1.5496571064, Test Accuracy: 72.0000%                 
Epoch 1200 - Train Loss: 0.0873311409, Accuracy: 96.8900%, Test Loss: 1.7269042730, Test Accuracy: 72.0000%                 
Training Accuracy: 99.28%
Testing Accuracy: 72.00%

here are the plots:
a. training/testing error:

b. training/testing accuracy:

So I think I will choose the model at epoch 400, with a testing accuracy of 78%
(in the plots, I logged every 10 epochs).

This model’s testing loss (0.7578060627 at epoch 400) is much closer to its training loss than in the previous models (around 1.16).

I think this simpler model should generalise better, although its accuracy (78%) is lower than my personal best (80-82% with the overfitted models).

I know it is a very small dataset, and the model might not perform well on unseen data.
Based on your experience, which one would you choose (the 78% or the 80-82% model)?
When I worked on the Course 4 Week 2 programming exercise, training ResNet50 with pre-trained weights on the hand-sign dataset, it reached >95% accuracy but couldn’t recognise my own hand-sign image.
That makes me think we might need to accept a lower-accuracy model that probably generalises better.
What are your thoughts, am I heading in the right direction?

My original intention for this post was to find the highest-accuracy model for this simple cat-classification example using a fully connected NN, then a CNN, ResNet, and so on, and to check whether a CNN/ResNet can really outperform the fully connected NN.

Now “highest accuracy” no longer seems like a good metric/objective for measuring real performance to me.

P.S. I will definitely try the Kaggle dataset later, probably after Course 4.
Your thread has also confirmed my understanding :pray:.
I also learnt more about error analysis and noticed the imbalance between the train and test datasets; I had assumed they came from the same distribution.

1 Like

Consider what dropout = 0.95 is doing.

95% of the units in the model are being ignored during training. Only 1 out of every 20 is being used.

If you do that and still get reasonable results, it means your model is way too complicated for the job.

1 Like

Just want to double-check: data augmentation, as far as I know, is meant to make the model generalise better. Your statement seems a bit different from what Andrew said in the video.

He also listed data augmentation as part of regularisation, as in the screen cap above.

Augmentation makes the data set larger, without you having to create and label all new examples.

Having a larger data set means you might need less regularization. “More data” is one of the ways to help avoid overfitting.

Thanks Paul for your advice. I moved on to the Kaggle dataset (Cats vs. Dogs).
It has 25,000 training images and 12,500 testing images.
In my case, I just used the 25,000 training images and split them into 20,000 for the training set and 5,000 for the dev set.
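The split itself is simple (a sketch assuming the images and labels are already loaded into numpy arrays; the function name is mine):

import numpy as np

def train_dev_split(X, Y, dev_size=5000, seed=1):
    """Shuffle the 25000 labelled images and carve off a fixed-size dev set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    dev_idx, train_idx = idx[:dev_size], idx[dev_size:]
    return X[train_idx], Y[train_idx], X[dev_idx], Y[dev_idx]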

I quickly ran logistic regression, and here are its results for later comparison:

Logistic Regression:
Training: 67%
Testing (dev): 61%

This time I changed my approach a bit: I used random search to find reasonably good hyperparameters as a starting point.

Here is the hyperparameter grid for the random search:

import numpy as np

# random_hidden_layers() returns a randomly generated list of hidden-layer widths (defined elsewhere)
param_grid = {
    'learning_rate': list(np.logspace(-4, 0, num=5)),
    'hidden_sizes': [random_hidden_layers() for _ in range(10)],
    'dropout_rate': list(np.arange(0.05, 0.61, 0.05)),
    'l2_regularization_strength': list(np.logspace(-4, 0, num=5)),
    'num_epochs': list(np.arange(100, 501, 50))
}
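The search loop itself just samples one value per key and trains (a sketch; train_and_evaluate is a stand-in for my actual training run and returns the dev accuracy):

import random

def random_search(param_grid, train_and_evaluate, n_trials=20):
    """Sample random hyperparameter combinations and keep the best by dev accuracy."""
    best_acc, best_params = 0.0, None
    for _ in range(n_trials):
        params = {key: random.choice(values) for key, values in param_grid.items()}
        acc = train_and_evaluate(**params)   # placeholder for the real training run
        if acc > best_acc:
            best_acc, best_params = acc, params
    return best_params, best_acc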

And here are the results for the “best” hyperparameter setting:

Best Hyperparameters: {'learning_rate': 0.0001, 'hidden_sizes': [305, 417], 'dropout_rate': 0.4, 'l2_regularization_strength': 0.0001, 'num_epochs': 300}
Best Test Accuracy: 66.38%

So it is already better than logistic regression.
However, it is well below the best on Kaggle, which is >98% accuracy (probably very close to human level, I think).

First, I tried data augmentation.
This time I generated 2x augmented data,
meaning the total training data is 20000 (original) + 40000 (augmented) = 60000,
keeping the dev set at 5000.

Training Accuracy: 98.30%
Testing Accuracy: 69.40%

Then I pushed it to the limit: 6x data augmentation, which is the most I can fit in memory.
Here are the results:

train_X's shape: (140000, 30000)
train_Y's shape: (140000, 1)
Epoch 1 - Train Loss: 0.7080129947, Accuracy: 58.5586%, Test Loss: 0.6237791777, Test Accuracy: 64.5800%
Epoch 10 - Train Loss: 0.5802261102, Accuracy: 69.1636%, Test Loss: 0.6011586189, Test Accuracy: 67.0400%
Epoch 20 - Train Loss: 0.5343008673, Accuracy: 72.8471%, Test Loss: 0.5895827413, Test Accuracy: 68.3200%
Epoch 30 - Train Loss: 0.4985563545, Accuracy: 75.5829%, Test Loss: 0.5819643140, Test Accuracy: 69.2000%
Epoch 40 - Train Loss: 0.4679729903, Accuracy: 77.4364%, Test Loss: 0.5820034146, Test Accuracy: 70.6800%
Epoch 50 - Train Loss: 0.4426602223, Accuracy: 79.0893%, Test Loss: 0.6062780023, Test Accuracy: 68.9400%
Epoch 60 - Train Loss: 0.4212049089, Accuracy: 80.3571%, Test Loss: 0.5932221413, Test Accuracy: 70.1400%
Epoch 70 - Train Loss: 0.4054715787, Accuracy: 81.3486%, Test Loss: 0.6064406037, Test Accuracy: 69.7400%
Epoch 80 - Train Loss: 0.3920199028, Accuracy: 82.0757%, Test Loss: 0.7479397655, Test Accuracy: 66.0000%
Epoch 90 - Train Loss: 0.3793664831, Accuracy: 82.7786%, Test Loss: 0.6246393919, Test Accuracy: 69.9000%
Epoch 100 - Train Loss: 0.3702111095, Accuracy: 83.2686%, Test Loss: 0.6326936483, Test Accuracy: 70.3800%
Epoch 110 - Train Loss: 0.3617749499, Accuracy: 83.7329%, Test Loss: 0.6820671558, Test Accuracy: 69.0800%
Epoch 120 - Train Loss: 0.3536044269, Accuracy: 84.1043%, Test Loss: 0.6389334202, Test Accuracy: 71.4400%
Epoch 130 - Train Loss: 0.3481948042, Accuracy: 84.4736%, Test Loss: 0.6241853237, Test Accuracy: 70.8800%
Epoch 140 - Train Loss: 0.3459528111, Accuracy: 84.7200%, Test Loss: 0.6407382488, Test Accuracy: 69.3800%
Epoch 150 - Train Loss: 0.3401777327, Accuracy: 84.9364%, Test Loss: 0.6799750924, Test Accuracy: 69.7400%
Epoch 160 - Train Loss: 0.3357463084, Accuracy: 85.1514%, Test Loss: 0.6708645225, Test Accuracy: 69.6600%
Epoch 170 - Train Loss: 0.3310116937, Accuracy: 85.4521%, Test Loss: 0.6961711049, Test Accuracy: 68.8400%
Epoch 180 - Train Loss: 0.3289567259, Accuracy: 85.5000%, Test Loss: 0.6533536911, Test Accuracy: 69.9200%
Epoch 190 - Train Loss: 0.3245867022, Accuracy: 85.7721%, Test Loss: 0.6729609966, Test Accuracy: 69.7000%
Epoch 200 - Train Loss: 0.3238609455, Accuracy: 85.7929%, Test Loss: 0.6859515905, Test Accuracy: 69.5800%
Epoch 210 - Train Loss: 0.3214935853, Accuracy: 85.9421%, Test Loss: 0.7205916643, Test Accuracy: 69.1800%
Epoch 220 - Train Loss: 0.3183520512, Accuracy: 86.1221%, Test Loss: 0.7689416409, Test Accuracy: 67.1200%
Epoch 230 - Train Loss: 0.3164121619, Accuracy: 86.3029%, Test Loss: 0.6590382457, Test Accuracy: 70.4400%
Epoch 240 - Train Loss: 0.3145191860, Accuracy: 86.3771%, Test Loss: 0.6572505832, Test Accuracy: 70.0000%
Epoch 250 - Train Loss: 0.3097408224, Accuracy: 86.4629%, Test Loss: 0.8095068336, Test Accuracy: 66.7800%
Epoch 260 - Train Loss: 0.3116088966, Accuracy: 86.4136%, Test Loss: 0.6613757014, Test Accuracy: 70.0400%
Epoch 270 - Train Loss: 0.3098915201, Accuracy: 86.5657%, Test Loss: 0.6531138420, Test Accuracy: 70.1800%
Epoch 280 - Train Loss: 0.3085777675, Accuracy: 86.6464%, Test Loss: 0.6587674618, Test Accuracy: 70.0600%
Epoch 290 - Train Loss: 0.3075122543, Accuracy: 86.6564%, Test Loss: 0.6553193331, Test Accuracy: 69.3200%
Epoch 300 - Train Loss: 0.3069336332, Accuracy: 86.6607%, Test Loss: 0.6995834708, Test Accuracy: 69.3200%
Finished Training. Total training time: 654.78 minutes

It took more than 10 hours to complete training!
The testing accuracy is around 70%; the best is 71.44% at epoch 120.

here are the graphs:
accuracy curve:

loss curve:

So I think I would use the epoch-120 model as the best model.

I did try pushing further with 20x augmented images, but those had to be saved to disk and training was too slow (about 1 hour per epoch),
so I aborted that.

I could lower the learning rate or swap ReLU for leaky ReLU to see if there’s any improvement; I feel it would only be very slightly better.
However, I have more or less achieved a decent performance with >70% accuracy.

This is a good logical break for the fully connected neural network. I will move on to a CNN, then ResNet50,
and then pretrained models like VGG16, Inception, GoogLeNet, and so on,
to see if anything can beat my personal best so far (>70% accuracy).

I will update once I have results.
My goal is to gain experience in training neural networks and to try to beat my personal best (preferably reaching 80-90% accuracy).
My impression is that using a pretrained model and fine-tuning it would probably achieve the best result; however, I need to test that myself.

1 Like

I’m interested to hear how a CNN will improve training time and accuracy on the same data set. I’d imagine a significant improvement. Please keep us posted. :slight_smile:

Hi ngkhatu,

You are very correct!
I tried it using the Course 4 Week 1 programming exercise example as a template.
It uses 1-layer and 2-layer CNNs, which I applied to the Kaggle cats-vs-dogs classification.
The results surprised me big time.

With just 1 convolutional layer, and without any data augmentation,
the testing accuracy is already 73%.

Then I did the 2-layer CNN, again with no data augmentation.
Here is the result:

Epoch 1 - Train Loss: 0.6870851001, Accuracy: 57.2050%, Test Loss: 0.6788634062, Test Accuracy: 60.4400%
Epoch 10 - Train Loss: 0.5803946552, Accuracy: 70.1100%, Test Loss: 0.5760433674, Test Accuracy: 70.2400%
Epoch 20 - Train Loss: 0.5532099714, Accuracy: 72.3700%, Test Loss: 0.5543925166, Test Accuracy: 72.4400%
Epoch 30 - Train Loss: 0.5298072382, Accuracy: 74.0800%, Test Loss: 0.5402982235, Test Accuracy: 73.4200%
Epoch 40 - Train Loss: 0.5118299467, Accuracy: 75.6400%, Test Loss: 0.5222501159, Test Accuracy: 74.9000%
Epoch 50 - Train Loss: 0.4984353853, Accuracy: 76.4650%, Test Loss: 0.5135366321, Test Accuracy: 75.4200%
Epoch 60 - Train Loss: 0.4866202139, Accuracy: 77.1600%, Test Loss: 0.5057876110, Test Accuracy: 76.0000%
Epoch 70 - Train Loss: 0.4768056226, Accuracy: 77.9850%, Test Loss: 0.5034736991, Test Accuracy: 75.8800%
Epoch 80 - Train Loss: 0.4701201431, Accuracy: 78.2950%, Test Loss: 0.4964105487, Test Accuracy: 76.4600%
Epoch 90 - Train Loss: 0.4620706673, Accuracy: 78.9950%, Test Loss: 0.4943421781, Test Accuracy: 76.5800%
Epoch 100 - Train Loss: 0.4579596681, Accuracy: 79.0750%, Test Loss: 0.4912835956, Test Accuracy: 76.8400%
Epoch 110 - Train Loss: 0.4510383642, Accuracy: 79.4400%, Test Loss: 0.4871637225, Test Accuracy: 77.1800%
Epoch 120 - Train Loss: 0.4461856880, Accuracy: 79.6750%, Test Loss: 0.4874891043, Test Accuracy: 76.9600%
Epoch 130 - Train Loss: 0.4402121443, Accuracy: 80.0400%, Test Loss: 0.4875140488, Test Accuracy: 77.0000%
Epoch 140 - Train Loss: 0.4374789443, Accuracy: 80.2050%, Test Loss: 0.4832347631, Test Accuracy: 77.3000%
Epoch 150 - Train Loss: 0.4299268812, Accuracy: 80.7900%, Test Loss: 0.4830815494, Test Accuracy: 77.1400%
Epoch 160 - Train Loss: 0.4276881824, Accuracy: 80.7550%, Test Loss: 0.4806578159, Test Accuracy: 77.3400%
Epoch 170 - Train Loss: 0.4238304696, Accuracy: 80.9800%, Test Loss: 0.4814046919, Test Accuracy: 77.5200%
Epoch 180 - Train Loss: 0.4223759567, Accuracy: 81.0200%, Test Loss: 0.4809083045, Test Accuracy: 77.2400%
Epoch 190 - Train Loss: 0.4175595955, Accuracy: 81.2700%, Test Loss: 0.4795779884, Test Accuracy: 77.4800%
Epoch 200 - Train Loss: 0.4147988608, Accuracy: 81.3100%, Test Loss: 0.4879596531, Test Accuracy: 77.3800%
Epoch 210 - Train Loss: 0.4104843929, Accuracy: 81.4800%, Test Loss: 0.4786606431, Test Accuracy: 77.3200%
Epoch 220 - Train Loss: 0.4085408135, Accuracy: 81.4750%, Test Loss: 0.4816853106, Test Accuracy: 76.8000%
Epoch 230 - Train Loss: 0.4053019222, Accuracy: 81.7450%, Test Loss: 0.4952992797, Test Accuracy: 76.1200%
Epoch 240 - Train Loss: 0.4015360050, Accuracy: 81.8850%, Test Loss: 0.4796106517, Test Accuracy: 77.2600%
Epoch 250 - Train Loss: 0.3988478480, Accuracy: 82.2600%, Test Loss: 0.4942362309, Test Accuracy: 77.1600%
Epoch 260 - Train Loss: 0.3976849169, Accuracy: 82.1600%, Test Loss: 0.4852837324, Test Accuracy: 77.5600%
Epoch 270 - Train Loss: 0.3970640944, Accuracy: 82.1550%, Test Loss: 0.4804431498, Test Accuracy: 77.2400%
Epoch 280 - Train Loss: 0.3937637265, Accuracy: 82.4700%, Test Loss: 0.4815192521, Test Accuracy: 77.3800%
Epoch 290 - Train Loss: 0.3909590972, Accuracy: 82.6650%, Test Loss: 0.4843246639, Test Accuracy: 77.1800%
Epoch 300 - Train Loss: 0.3863708302, Accuracy: 82.6700%, Test Loss: 0.4935418069, Test Accuracy: 76.6200%
Finished Training. Total training time: 365.34 minutes
Training Accuracy: 82.08%
Testing Accuracy: 76.62%

The testing accuracy is easily about 77%.
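For reference, the 2-conv-layer model is roughly along these lines (a PyTorch-style sketch; the kernel sizes and channel counts are my placeholders, not the exact exercise values, and the 100x100x3 input matches the 30000-feature flattened shape I used earlier):

import torch.nn as nn

# input: 3 x 100 x 100 images (flattened size 30000, as in my earlier train_X)
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),   # conv layer 1
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 100x100 -> 50x50
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # conv layer 2
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 50x50 -> 25x25
    nn.Flatten(),
    nn.Linear(16 * 25 * 25, 1),
    nn.Sigmoid(),
)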

here are the graphs:
accuracy curve:

loss curve:

The results also show no overfitting; compared with the fully connected networks I trained before, the CNN’s results are more stable.

I knew a CNN should perform better, but I didn’t expect this much improvement; it was really eye-opening. I had done a lot with the fully connected network (hyperparameter tuning, dropout, L2 regularisation, and data augmentation), and the accuracy was only about 70%.
For the CNN, I haven’t done any of that yet, just used the hyperparameters from the programming exercise, and with only 2 convolutional layers the accuracy is easily >77%.
This experience is really important to me, and now I have that hands-on experience.

I feel that I can push it above 80% accuracy with some hyperparameter tuning.

But here is the finding:
human error: 2% (1 - 0.98, the best Kaggle performance)
train error: 18% (1 - 82.08%)
dev error: 23% (1 - 77%)
From Course 3, the problem is now underfitting (high avoidable bias) instead: the gap between human error and train error (about 16%) is larger than the gap between train error and dev error (about 5%).

Based on the lecture notes in Course 3, the tools for reducing bias/underfitting are:

  • train a bigger network
  • run gradient descent longer/better optimization algorithm e.g. momentum, RMSprop, Adam
  • NN architecture (RNN/CNN) /hyperparameter search (activation, # of hidden units)

So I will go ahead and train with the following instead:
ResNet50, VGG16, Inception, GoogLeNet, and so on.
That falls under training a bigger network.

I will post more results later.
I am sure the experience will be gold.

1 Like