So after basically rewriting the Week 2 Programming Assignment with type annotations and objects (Python actually becomes readable at that point, moving beyond MVP, although even then the IDE’s static checker is not sure about the code’s semantics), leaving barely anything of the original, I have added a bit of code to generate plots of the cost obtained with “perturbed parameters”.
Suppose you take the neural network at training step t and perturb one of its parameters a bit (adding a small value Δw to a weight, or Δb to the bias). You can then compute the cost of this “perturbed network” and the Δcost relative to the cost of the “unperturbed network”. This allows you to draw a curve Δcost(Δw) or Δcost(Δb).
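In code the idea is roughly this (a minimal sketch with made-up names, not the assignment code; I assume w is a column vector of shape (n, 1), X is (n, m), Y is (1, m), and cost() is the usual cross-entropy cost of the logistic regression model):

import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def cost(w: np.ndarray, b: float, X: np.ndarray, Y: np.ndarray) -> float:
    # Standard cross-entropy cost of the logistic regression model.
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)
    return float(-(Y * np.log(A) + (1 - Y) * np.log(1 - A)).sum() / m)

def delta_cost_curve(w: np.ndarray, b: float, X: np.ndarray, Y: np.ndarray,
                     index: int, deltas: np.ndarray) -> np.ndarray:
    # Δcost(Δw) for a single weight: perturb weight `index` by each Δw,
    # recompute the cost and subtract the unperturbed cost.
    base = cost(w, b, X, Y)
    out = np.empty_like(deltas)
    for i, dw in enumerate(deltas):
        w_pert = w.copy()
        w_pert[index, 0] += dw
        out[i] = cost(w_pert, b, X, Y) - base
    return out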
This can be repeated for a number of weights (drawing curves for all 64’000 parameters is a bit much, so I just sampled a few at random, and also considered the bias).
One can then get an impression of the “shape” of the cost function along a small sample of the dimensions defined by the 64’000+ parameters, and of how gradient descent moves the “current point” in that space to the minimum (which happens to be global in this case). The “current point” is always at (0,0) in the graphs, as we plot (Δw, Δcost) relative to it.
The orange curve is the Δcost curve of the “perturbed bias”; the blue ones are a sample of the Δcost curves of a smattering of “perturbed weights”, not always the same ones from plot to plot.
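Plotting then amounts to something like this (again a sketch continuing the snippet above, so it reuses the hypothetical cost() and delta_cost_curve(); the Δ range and the number of sampled weights are my own arbitrary choices):

import numpy as np
import matplotlib.pyplot as plt

def plot_perturbation_curves(w: np.ndarray, b: float, X: np.ndarray,
                             Y: np.ndarray, n_weights: int = 8) -> None:
    deltas = np.linspace(-0.5, 0.5, 41)              # range of Δw / Δb to try
    rng = np.random.default_rng()
    for index in rng.choice(w.shape[0], size=n_weights, replace=False):
        plt.plot(deltas, delta_cost_curve(w, b, X, Y, index, deltas),
                 color="tab:blue", alpha=0.5)        # sampled weights (blue)
    base = cost(w, b, X, Y)
    bias_curve = [cost(w, b + db, X, Y) - base for db in deltas]
    plt.plot(deltas, bias_curve, color="tab:orange")  # the bias (orange)
    plt.xlabel("Δw / Δb")
    plt.ylabel("Δcost")
    plt.show()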
At first all parameters are a bit too large; adding negative ε’s will reduce the cost:
All parameters are a bit too small; adding positive ε’s will reduce the cost:
All parameters are about right:
The gradient here is certainly zero:
Nothing changes any more:
And if one plots the cost:
Finally, I have transformed the images into histograms over a 20x20x20 color cube and trained the logistic regression network on those. So we are just looking at the distribution of color values and want to predict whether there may be a cat behind it. This gives good results compared to the (ordered) vector of raw R,G,B values specified in the assignment, for much less computation. We find:
Cost after iteration 1900: 0.0000
train accuracy: 100.00 %
test accuracy: 74.00 %
train false positives: 0.00 %
train false negatives: 0.00 %
test false positives: 12.00 %
test false negatives: 14.00 %
Found 13 failed tests
But one must not normalize the histogram to sum to 1.0, otherwise the results are really bad, although I feel that normalization of some kind should be appropriate. Then again, the NN model is too simple anyway.
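For reference, the histogram feature can be computed roughly like this (a sketch under my own assumptions: the input is a 64x64x3 uint8 image, each channel’s 0–255 range is bucketed into 20 bins, and the flattened 8000-entry count vector replaces the raw pixel vector as model input):

import numpy as np

def color_cube_histogram(image: np.ndarray, bins_per_channel: int = 20) -> np.ndarray:
    # Bucket each pixel's (R, G, B) value into a 20x20x20 color cube and
    # count how many pixels fall into each cell.
    pixels = image.reshape(-1, 3).astype(np.float64)     # (n_pixels, 3)
    hist, _ = np.histogramdd(pixels,
                             bins=(bins_per_channel,) * 3,
                             range=((0, 256),) * 3)
    # Note: dividing by hist.sum() (normalizing to sum 1.0) made the results
    # much worse for me, so the raw counts are returned here.
    return hist.reshape(-1)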
The perturbation curves are more exciting than previously, although a lot of them, including the one for the bias, are just flat. I am not sure whether that indicates a bug.
I can’t say much more about this, as it is a graded assignment. Except that you really want to use Python’s type annotations and object system as soon as possible to spare yourself unnecessary pain (and throw asserts at the code, too).
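To illustrate what I mean (a toy sketch, not the assignment code): a small typed container for the parameters, plus a couple of shape asserts, catches most of the silly broadcasting bugs early:

from dataclasses import dataclass
import numpy as np

@dataclass
class Params:
    w: np.ndarray   # weight column vector, shape (n_features, 1)
    b: float        # scalar bias

    def __post_init__(self) -> None:
        # Shape asserts catch silently-broadcasting bugs early.
        assert self.w.ndim == 2 and self.w.shape[1] == 1, self.w.shape
        assert isinstance(self.b, float), type(self.b)

def predict(p: Params, X: np.ndarray) -> np.ndarray:
    # X has one column per example; its rows must match the weight vector.
    assert X.shape[0] == p.w.shape[0], (X.shape, p.w.shape)
    return 1.0 / (1.0 + np.exp(-(p.w.T @ X + p.b)))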