I am writing this post to document my learning and to put training a deep neural network into practice.
The task is the cat classification from Course 1.
I’ve just built an n-layer neural network to test things out.
Here are the results:
Accuracy of Train logistic regression: 100 % (percentage of correctly labelled datapoints)
Accuracy of Test logistic regression: 72 % (percentage of correctly labelled datapoints)
Hyperparameters: {'learning_rate': 0.0007, 'hidden_sizes': [128, 64], 'dropout_rate': 0.25}
Training Accuracy: 100.00%
Testing Accuracy: 58.00%
The above are quick-and-dirty results (hyperparameters chosen by intuition).
So my neural network actually performed worse than logistic regression;
the first goal, then, is to make the model at least outperform logistic regression.
From the above, I know this is overfitting,
as training accuracy is 100% but testing accuracy is only 58%.
My tools are dropout and L2 regularisation.
I didn’t use any regularisation at all for the 58% testing-accuracy run.
I don’t know if I’m right, but I feel that L2 regularisation often over-regularises somehow,
so I just set the dropout rate to 0.2 to see how it goes.
My process is simply manual tuning of the knobs, as Andrew suggests in the videos, turning one at a time.
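To make sure I understand the knob I’m turning, here’s a minimal sketch of inverted dropout in plain NumPy (the function name and shapes are just for illustration; my actual network code is separate):

```python
import numpy as np

def dropout_forward(activations, dropout_rate, rng):
    """Inverted dropout: zero out each unit with probability dropout_rate,
    then scale the survivors by 1/keep_prob so the expected activation
    is unchanged."""
    keep_prob = 1.0 - dropout_rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((4, 5))
a_dropped = dropout_forward(a, dropout_rate=0.2, rng=rng)
# each entry is either 0 (dropped) or 1/0.8 = 1.25 (kept and rescaled)
```

At test time no mask is applied at all; the 1/keep_prob scaling during training is what keeps the expected activations the same in both modes.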
l2 = 0.0, dropout= 0.2
Epoch [1/3000], Loss: 0.87964642
Epoch [1001/3000], Loss: 0.00514520
Epoch [2001/3000], Loss: 0.00174722
Finished Training
Training Accuracy: 100.00%
Testing Accuracy: 72.00%
Same as logistic regression.
Changed the dropout rate to 0.15:
Accuracy of Train logistic regression: 100 % (percentage of correctly labelled datapoints)
Accuracy of Test logistic regression: 72 % (percentage of correctly labelled datapoints)
Epoch [1/3000], Loss: 0.65881771
Epoch [1001/3000], Loss: 0.00089703
Epoch [2001/3000], Loss: 0.00281329
Training Accuracy: 100.00%
Testing Accuracy: 76.00%
So now the testing accuracy is 76%, better than logistic regression,
but still worse than human level (which I’d guess is at least 90%).
This time I don’t change the dropout;
instead I try data augmentation,
doubling the 209 training examples to 418 by randomly rotating the images.
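For the record, a rough sketch of what I mean by rotation-based augmentation, assuming each row of train_X is a flattened 64x64x3 image (the helper name and the random data below are just illustrative):

```python
import numpy as np

def augment_with_rotations(train_X, rng, h=64, w=64, c=3):
    """Double the training set by appending one randomly rotated copy
    (90/180/270 degrees) of every flattened image."""
    images = train_X.reshape(-1, h, w, c)
    ks = rng.integers(1, 4, size=len(images))  # 1..3 quarter-turns each
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, ks)])
    return np.concatenate([train_X, rotated.reshape(len(images), -1)], axis=0)

rng = np.random.default_rng(0)
X = rng.random((209, 12288))          # stand-in for the real train_X
X_aug = augment_with_rotations(X, rng)
# X_aug.shape -> (418, 12288)
```

The labels for the rotated copies are just a duplicate of train_Y, since rotating a cat picture doesn’t change whether it’s a cat.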
Accuracy of Train logistic regression: 100 % (percentage of correctly labelled datapoints)
Accuracy of Test logistic regression: 68 % (percentage of correctly labelled datapoints)
train_X's shape: (418, 12288)
train_Y's shape: (418, 1)
Epoch [1/3000], Loss: 0.59803784
Epoch [1001/3000], Loss: 0.00353110
Epoch [2001/3000], Loss: 0.00033148
Training Accuracy: 100.00%
Testing Accuracy: 72.00%
With data augmentation, test accuracy didn’t increase;
logistic regression even dropped from 72% to 68%,
and my best result dropped from 76% to 72%.
Now I think I might need a bigger network.
The current layers_dims = [12288, 128, 64, 1];
adding a few more layers will hopefully let it learn more complex representations.
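As a sketch, growing the network is just a longer layers_dims list; He initialisation scales each weight matrix by sqrt(2/fan_in), which matters more as the net gets deeper (the deeper dims below are a hypothetical candidate, not a tested configuration):

```python
import numpy as np

def init_params(layers_dims, rng):
    """He initialisation for an n-layer network described by layers_dims,
    e.g. [12288, 128, 64, 1] -> two hidden layers of 128 and 64 units."""
    params = {}
    for l in range(1, len(layers_dims)):
        fan_in = layers_dims[l - 1]
        params[f"W{l}"] = rng.standard_normal((layers_dims[l], fan_in)) * np.sqrt(2.0 / fan_in)
        params[f"b{l}"] = np.zeros((layers_dims[l], 1))
    return params

rng = np.random.default_rng(0)
params = init_params([12288, 256, 128, 64, 1], rng)  # a deeper candidate
```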
If anyone has more ideas, please let me know.
My goal is at least 80-90% testing accuracy, which is approaching human level (I guess).
The tools I have (as mentioned in week 1 of Course 3)
to reduce variance (a big gap between training error and test error) are:
- regularization (L2/dropout/data augmentation)
- get more training data
- NN architecture/hyperparameter search
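Since I suspect L2 might over-regularise, it helps to see how the penalty is actually computed; a sketch of the L2 term added to the cost, with placeholder parameter names:

```python
import numpy as np

def l2_penalty(params, lam, m):
    """L2 regularisation term added to the cost:
    (lambda / (2*m)) * sum of squared weights over all layers
    (biases are conventionally not penalised)."""
    sq = sum(np.sum(W ** 2) for name, W in params.items() if name.startswith("W"))
    return lam / (2 * m) * sq

toy = {"W1": np.ones((2, 3)), "b1": np.zeros((2, 1)),
       "W2": np.ones((1, 2)), "b2": np.zeros((1, 1))}
# sum of squared weights = 6 + 2 = 8; 0.1/(2*4) * 8 = 0.1
print(l2_penalty(toy, lam=0.1, m=4))  # -> 0.1
```

Because the penalty scales with lambda, tuning lambda down is always an option before abandoning L2 for dropout.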
I am not using conv layers yet, as I haven’t reached Course 4.