Do you have an explanation for the large spike in the cost function plot you attached? A spike like that is a bit unusual for NN convergence. Also, can you share a bit more about your model? It’s hard to understand much from the cost function values alone.

This seemed unintuitive to me, too. I haven’t broken it down yet.

To better appreciate this: is our implementation of gradient descent a greedy one, built on the assumption of a convex optimisation problem?

Then: we should technically not be able to see such a picture, should we?
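One thing that may help with this question: even on a convex cost, a fixed-step gradient descent update is not guaranteed to lower the cost at every iteration. Here is a minimal sketch, assuming a toy one-dimensional cost x**2 (the functions `cost`, `grad` and `gradient_descent` are made up for illustration and are not the course's code). It only shows that an increase in cost is possible in principle; it does not explain the specific spike in the plot.

```python
# Toy example (not the course code): gradient descent on the convex cost x**2.

def cost(x):
    return x ** 2

def grad(x):
    # Derivative of x**2
    return 2 * x

def gradient_descent(x0, learning_rate, steps):
    """Run fixed-step gradient descent and return the cost after each step."""
    x = x0
    costs = [cost(x)]
    for _ in range(steps):
        x = x - learning_rate * grad(x)
        costs.append(cost(x))
    return costs

# A small step shrinks x towards 0, so the cost falls every iteration.
small_step = gradient_descent(1.0, 0.1, 5)

# A step that is too large overshoots the minimum, so the cost grows.
large_step = gradient_descent(1.0, 1.1, 5)

print(small_step)
print(large_step)
```

So the update rule is greedy only in the sense of following the local gradient; whether the cost actually goes down depends on the step size and on the shape of the cost surface.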

Solved this now:

Turns out it was trivial, at least operationally: I had used initialize_parameters_deep rather than initialize_parameters here.

I had done this on the assumption that the two should return the same results, both being seeded.
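To make the pitfall concrete, here is a minimal sketch of why the same seed is not enough. It assumes one initializer scales the random draws by a constant 0.01 and the other divides by the square root of the previous layer's size. Both functions below (`init_small_constant`, `init_sqrt_scaled`) are hypothetical stand-ins written with the standard library, not the course's actual implementations.

```python
import math
import random

def init_small_constant(n_out, n_in, seed=1):
    """Hypothetical initializer: standard normal draws scaled by 0.01."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) * 0.01 for _ in range(n_in)] for _ in range(n_out)]

def init_sqrt_scaled(n_out, n_in, seed=1):
    """Hypothetical initializer: standard normal draws divided by sqrt(n_in)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) / math.sqrt(n_in) for _ in range(n_in)] for _ in range(n_out)]

# Same seed, same shape, same underlying random draws...
W_small = init_small_constant(3, 4)
W_sqrt = init_sqrt_scaled(3, 4)

# ...but the weights differ by the ratio of the two multipliers:
# (1 / sqrt(4)) / 0.01 = 50 for n_in = 4.
ratio = W_sqrt[0][0] / W_small[0][0]
print(W_small == W_sqrt)
print(ratio)
```

The seed fixes the random draws, not the scaling applied to them afterwards, so the two starting points (and hence the cost curves) can look quite different.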

However, looking at the implementation of initialize_parameters_deep, the constant multiplier differs. Why is that? And why do we use the square-root multiplier here?
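One common motivation for a square-root multiplier (I can't say for sure it is the course's stated reason) is that dividing by sqrt(n_in) keeps the spread of a unit's pre-activation z = sum(w_i * x_i) roughly constant as the number of inputs n_in grows, whereas a fixed constant makes it depend on the layer width. Below is a rough simulation sketch, assuming standard normal inputs and weights; `preactivation_std` is a made-up helper, not part of the course code.

```python
import math
import random
import statistics

def preactivation_std(n_in, scale, trials=500, seed=0):
    """Estimate the std of z = sum(w_i * x_i) for one unit.

    Assumes x_i ~ N(0, 1) and w_i = scale * N(0, 1), all independent.
    """
    rng = random.Random(seed)
    zs = []
    for _ in range(trials):
        z = sum(rng.gauss(0, 1) * scale * rng.gauss(0, 1) for _ in range(n_in))
        zs.append(z)
    return statistics.pstdev(zs)

for n_in in (4, 400):
    std_fixed = preactivation_std(n_in, 0.01)                 # constant multiplier
    std_sqrt = preactivation_std(n_in, 1 / math.sqrt(n_in))   # square-root multiplier
    print(n_in, round(std_fixed, 3), round(std_sqrt, 3))
```

Under these assumptions the variance of z is n_in * scale**2, so the square-root scaling gives a standard deviation close to 1 for any width, while the 0.01 scaling gives about 0.01 * sqrt(n_in). With several layers, signals that are consistently too small or too large compound, which is one reason a deeper network may be initialized differently from a two-layer one.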

Sorry if I missed this somewhere in the course slides, but I couldn’t find a direct explanation.