Relation between Accuracy and Cost in Week 4 Assignment

Very interesting! It’s great that you are doing this type of investigation. There’s always something interesting to learn. I agree that it doesn’t seem logical that the test cost would increase in the way that you show. Let’s dig in and see what more we can learn here!

First there are a couple of general things to say:

  1. Yes, you’re right that all this is overfitting. And maybe the bigger problem is that this whole situation is pretty unrealistic, in that the dataset is far too small to give a generalizable solution to a problem this complex. Here’s a thread which discusses that point in a bit more detail and shows that the dataset is very carefully curated to give results as good as the ones we see here.

  2. The relationship between cost and accuracy is not as straightforward as you might think at first glance. The high level point is that accuracy is quantized, but the cost isn’t: a prediction counts as correct as soon as \hat{y} lands on the right side of 0.5, while the cost keeps changing with the exact value of \hat{y}. Consider a sample with a label of 1. If the \hat{y} value after 1000 iterations is 0.52, then the answer is already correct. If the \hat{y} value after 2000 iterations is 0.75, then the cost will be lower, but the accuracy is still the same. Of course it could also go the other direction: going from 0.75 to 0.52 in a later iteration will give you a higher cost with the same accuracy, which is what seems to be happening with the test data in your case.

  3. It’s really only accuracy that we actually care about. The actual J value doesn’t really tell you that much, as we see from item 2, but there is still something puzzling in the behavior here that is worth investigating.
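To make item 2 concrete, here is a minimal sketch of the per-sample numbers. It assumes the usual binary cross-entropy loss and a 0.5 decision threshold; the helper names `sample_loss` and `predict` are just made up for this illustration and are not the assignment's functions.

```python
import math

def sample_loss(y_hat, y):
    """Binary cross-entropy loss for one sample (assumed loss function)."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def predict(y_hat, threshold=0.5):
    """Turn a probability into a 0/1 prediction (0.5 threshold is an assumption)."""
    return 1 if y_hat > threshold else 0

y = 1  # true label of the sample
for y_hat in (0.52, 0.75):
    correct = predict(y_hat) == y
    print(f"y_hat={y_hat}  correct={correct}  loss={sample_loss(y_hat, y):.3f}")
# y_hat=0.52  correct=True  loss=0.654
# y_hat=0.75  correct=True  loss=0.288
```

Both values count as one correct prediction, but the loss more than doubles when \hat{y} drifts from 0.75 back to 0.52. If that happens on enough test samples, the test cost rises while the test accuracy stays flat.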

As far as I can see so far, your code looks completely correct. You could have simplified it a bit by using np.mean to compute the accuracy values. It would also be more efficient to pass in the iteration numbers where you want the checkpoints, so that you only have to run the training once. But I totally get why you did it the way you did: my way would be a big rewrite of the core functions, which risks breaking things and introduces more complexity.
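For the np.mean suggestion, something along these lines would do it. This is only a sketch with made-up toy data: the `accuracy` helper, the arrays and the 0.5 threshold are my assumptions, not your code or the notebook's.

```python
import numpy as np

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    # The elementwise comparison gives a boolean array; its mean is the hit rate.
    return np.mean(predictions == labels)

# Hypothetical toy data, shaped (1, m) like the label arrays in the course
labels = np.array([[1, 0, 1, 1, 0]])
y_hat = np.array([[0.52, 0.30, 0.75, 0.40, 0.10]])
predictions = (y_hat > 0.5).astype(int)  # assumed 0.5 threshold

print(accuracy(predictions, labels))  # 4 of 5 correct -> 0.8
```

The comparison and the averaging happen in one expression, so there is no need to count matches in a loop or divide by the number of samples by hand.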

Ok, none of the above really answers anything yet, but this is just the next step after your interesting steps above. More investigation required. Next I want to dig in a bit and actually look in more detail at the test cost numbers.