Do you have an explanation for the large spike in the cost function plot you attached? Such a spike is a bit unusual in NN convergence. Also, can you share a bit more about your model? It's hard to understand much from the cost function values alone.
This seemed unintuitive to me, too. I haven't broken it down yet.
To better appreciate this: is our implementation of gradient descent a greedy one that assumes a convex optimisation problem?
If so, we should technically never see a spike like this, should we?
Solved this now:
It turned out to be trivial, at least operationally: I had used initialize_parameters_deep rather than initialize_parameters here.
I had done this on the assumption that the two should return the same results, both being seeded.
However, looking at the implementation of initialize_parameters_deep, the constant multiplier differs. Why is that? And why do we use the square-root multiplier here?
Sorry if I missed this in the course slides somewhere, but I couldn't find a direct explanation.
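To make the multiplier question concrete, here is a minimal sketch. It rests on an assumption the thread does not confirm: that one helper scales standard-normal weights by a fixed 0.01 and the other divides by the square root of the previous layer's size. The names init_small_constant, init_sqrt_scaled and preactivation_std are hypothetical and are not the course's code. The sketch also ignores activations and biases and only tracks how the spread of the linear outputs changes from layer to layer.

```python
import numpy as np

def init_small_constant(layer_dims, seed=1):
    # Hypothetical variant: standard-normal draws times a fixed 0.01.
    rng = np.random.RandomState(seed)
    return [rng.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
            for l in range(1, len(layer_dims))]

def init_sqrt_scaled(layer_dims, seed=1):
    # Hypothetical variant: standard-normal draws divided by sqrt(n_prev),
    # where n_prev is the number of units feeding into the layer.
    rng = np.random.RandomState(seed)
    return [rng.randn(layer_dims[l], layer_dims[l - 1]) / np.sqrt(layer_dims[l - 1])
            for l in range(1, len(layer_dims))]

def preactivation_std(weights, n_samples=1000, seed=0):
    # Push unit-variance inputs through the linear maps only (no activation,
    # no bias) and record the standard deviation of the output of each layer.
    rng = np.random.RandomState(seed)
    A = rng.randn(weights[0].shape[1], n_samples)
    stds = []
    for W in weights:
        A = W @ A
        stds.append(float(A.std()))
    return stds

layer_dims = [400, 200, 100, 50]  # illustrative sizes, not taken from the thread
small = preactivation_std(init_small_constant(layer_dims))
scaled = preactivation_std(init_sqrt_scaled(layer_dims))
print("fixed 0.01 multiplier:", small)
print("1/sqrt(n_prev) scaling:", scaled)
```

Under these assumptions, the same seed yields the same underlying random draws in both helpers, but the final weights differ by the scale factor, so the two networks start from different points. With the fixed multiplier the signal shrinks at every layer, while dividing by sqrt(n_prev) keeps its spread roughly constant with depth, which is the usual motivation given for a square-root multiplier.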