Some Experiments with the Cat Recognition Assignment (C1W4A2)

Ok, here are the traded positives for another try with numTrade = 4:


And here are the traded negatives:

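For reference, here is roughly how the trade works: a minimal sketch, assuming the notebook’s (m, 64, 64, 3) image arrays and (1, m) label rows. The function name, RNG, and seed are my own illustrative choices, not the exact code I ran:

```python
import numpy as np

def trade_examples(train_x, train_y, test_x, test_y, num_trade=4, seed=0):
    """Swap num_trade negative training examples for num_trade positive
    test examples, labels included, so both set sizes stay the same but
    the training set gains positives (72 -> 76, per the counts printed
    below) and the test set loses them (33 -> 29). A sketch, not the
    exact code; the RNG and seed are arbitrary choices."""
    rng = np.random.default_rng(seed)
    tr_neg = rng.choice(np.flatnonzero(train_y[0] == 0), num_trade, replace=False)
    te_pos = rng.choice(np.flatnonzero(test_y[0] == 1), num_trade, replace=False)
    # advanced indexing returns copies, so this swaps the images in place
    train_x[tr_neg], test_x[te_pos] = test_x[te_pos], train_x[tr_neg]
    # flip the labels at the swapped positions accordingly
    train_y[0, tr_neg] = 1
    test_y[0, te_pos] = 0
    return train_x, train_y, test_x, test_y
```
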
Here are the new accuracy results (the first Accuracy line is on the training set, the second on the test set):

Cost after iteration 2700: 0.049674395970931415
Cost after iteration 2800: 0.04682934688220304
Cost after iteration 2900: 0.043699220183744183
Cost after iteration 2999: 0.041188403451858445
Accuracy: 0.9999999999999998
Accuracy: 0.74
positive samples: train = 76, test = 29
bal_pred_test error count = 13
bal_pred_test false negatives = 4
bal_pred_test false positives = 9

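For context, the bal_pred_test numbers are just bookkeeping on top of the notebook’s predict output; a minimal sketch of how they can be computed (error_breakdown is a hypothetical helper of mine, and the arrays at the bottom are dummy values for illustration):

```python
import numpy as np

def error_breakdown(pred, y):
    """Total errors, false negatives (cat predicted non-cat), and false
    positives (non-cat predicted cat) for (1, m) prediction/label rows."""
    pred, y = pred.astype(int), y.astype(int)
    errors = int(np.sum(pred != y))
    false_neg = int(np.sum((pred == 0) & (y == 1)))
    false_pos = int(np.sum((pred == 1) & (y == 0)))
    return errors, false_neg, false_pos

# in the notebook this would be: pred_test = predict(test_x, test_y, parameters)
pred_test = np.array([[1, 0, 1, 1]])   # dummy values for illustration
test_y    = np.array([[1, 1, 0, 1]])
errs, fn, fp = error_breakdown(pred_test, test_y)
print(f"bal_pred_test error count = {errs}")     # 2
print(f"bal_pred_test false negatives = {fn}")   # 1
print(f"bal_pred_test false positives = {fp}")   # 1
```
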
Here are the mislabeled images:


So the results are indeed different with the same number traded, and the new accuracy results are quite a bit worse. If you compare the mislabeled images with the “control” output (no changes to the dataset), you’ll see that all 7 of the false positives from the “control” run are still there, and two of the newly traded-in negatives are new false positives, which accounts for the total of 9. The false negatives make a lot less sense: one of the false negatives from “control” got traded away, the other is still a false negative, and now we have three brand new false negatives on top of it. I’m sorry, but that just does not make sense to me: we gave the training algorithm more positives to learn from, yet it does worse and flips 3 images that were correctly labeled as cats before (when it had fewer positive examples to train on) to “non-cats”.
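
One way to make the “flip” concrete is to diff the mislabeled test indices between the control run and this run; a sketch, assuming you kept the control run’s predictions around (all the run-specific names in the commented usage are hypothetical):

```python
import numpy as np

def mislabeled_indices(pred, y):
    """Sets of false-negative and false-positive test indices for
    (1, m) prediction/label rows."""
    fn = set(np.flatnonzero((pred[0] == 0) & (y[0] == 1)).tolist())
    fp = set(np.flatnonzero((pred[0] == 1) & (y[0] == 0)).tolist())
    return fn, fp

# control_pred/control_y and traded_pred/traded_y are hypothetical names
# for the saved predictions and labels of the two runs; since the trade
# swaps images in place, untraded indices line up between runs.
# fn_ctrl, fp_ctrl = mislabeled_indices(control_pred, control_y)
# fn_new, fp_new = mislabeled_indices(traded_pred, traded_y)
# print("false negatives carried over:", fn_ctrl & fn_new)
# print("brand new false negatives:   ", fn_new - fn_ctrl)
# print("false positives carried over:", fp_ctrl & fp_new)
```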

So maybe the only conclusion here is that with a dataset this small, everything is highly sensitive to even the smallest change, and you don’t get any smoothing benefit from statistical effects. In other words, the fact that the “control” version gets numbers as good as it does suggests the course designers chose the train/test split carefully; any perturbation of the balance only makes things worse. There’s probably not much more to be learned here, since this is basically an unrealistic case, in the “meta” sense as well: too small to support generalizable conclusions. Well, maybe the right way to state the result is that the one generalizable conclusion from all this is that small datasets are a bummer. :nerd_face:
