I am stuck on Ex. 4, the two-layer model: my costs won't reconcile with the expected costs from the test cases, and I can't figure out why.
Otherwise I seem to be solving this correctly: the model trains and even achieves slightly higher accuracy than expected:
My suspicion is that I'm missing something in the forward propagation step, but I seem to have a blind spot here.
The rest of the notebook produces outputs in line with the test expectations, but the exercise is currently graded 50/100.
Hello @mahaoyu and welcome to Discourse,
Do you have an explanation for the large spike in the cost function plot you attached? Such a spike is a bit unusual in NN convergence. Also, could you share a bit more about your model? It's hard to tell much from the cost function values alone.
Hi @yanivh ! Thanks for your response.
Re the large spike: thanks for picking up on this.
- This seemed unintuitive to me too; I haven't broken it down yet.
- To better frame it: is our implementation of gradient descent a greedy one, on the assumption that the optimisation problem is convex?
- If so, we technically shouldn't be able to see a picture like this, should we?
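A quick sanity check suggests the answer is no: the cost of even a tiny neural network is not convex, so a non-monotone cost curve is at least possible in principle. This is a toy example of my own (made-up data and shapes, not the assignment code) using the fact that swapping two hidden units leaves the network function unchanged, while the midpoint between the two parameter settings collapses the units into one:

```python
import numpy as np

# Toy 2-layer net (tanh hidden layer, sigmoid output) with made-up data.
# Swapping the two hidden units gives an identical function, hence an
# identical cost, but the midpoint of the two parameter vectors averages
# the units together -- a different network. Two distinct points with
# equal cost whose midpoint generally has a different cost means the
# cost surface cannot be convex everywhere.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 50))
Y = (X[0] * X[1] > 0).astype(float).reshape(1, -1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(params):
    W1, b1, W2, b2 = params
    A1 = np.tanh(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    eps = 1e-9  # guard against log(0)
    return float(-np.mean(Y * np.log(A2 + eps) + (1 - Y) * np.log(1 - A2 + eps)))

W1 = rng.standard_normal((2, 2)); b1 = rng.standard_normal((2, 1))
W2 = rng.standard_normal((1, 2)); b2 = rng.standard_normal((1, 1))
theta_a = (W1, b1, W2, b2)
# Permute the two hidden units: same function, same cost.
theta_b = (W1[::-1], b1[::-1], W2[:, ::-1], b2)
theta_mid = tuple((a + b) / 2 for a, b in zip(theta_a, theta_b))

print(cost(theta_a), cost(theta_b))  # identical up to float rounding
print(cost(theta_mid))               # a different (collapsed) network
```

So gradient descent here is greedy, but the problem it is applied to is not convex, and convergence plots are not guaranteed to be monotone.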
Solved this now:
- Turns out it was trivial, at least operationally: I had used `initialize_parameters_deep` rather than `initialize_parameters` here.
- I had done this on the assumption that the two should return the same results, both being seeded.
- However, looking at the implementation of `initialize_parameters_deep`, the constant multiplier differs. Why is that? And why do we use the square-root multiplier there?
Sorry if I missed this in the course slides somewhere, but I couldn't find a direct explanation.
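Partially answering my own question after some digging: I believe the square-root multiplier is a Xavier-style scaling, and the motivation is variance control. If the incoming activations have unit variance and the weights are drawn as `randn(...) * c`, the pre-activations of a layer with `n_prev` inputs have variance roughly `n_prev * c**2`. A fixed `c = 0.01` therefore makes that variance depend on layer width (and shrink through depth), whereas `c = 1/sqrt(n_prev)` keeps it near 1 regardless of width. A small sketch with hypothetical layer sizes (not the course's actual dimensions):

```python
import numpy as np

# Compare pre-activation variance under the two multipliers the two
# helpers appear to use: a fixed 0.01 vs 1/sqrt(n_prev) (Xavier-style).
# Layer sizes here are made up for illustration.
rng = np.random.default_rng(1)
n_prev, n_curr, m = 500, 100, 10_000

A_prev = rng.standard_normal((n_prev, m))  # unit-variance activations

W_small = rng.standard_normal((n_curr, n_prev)) * 0.01
W_scaled = rng.standard_normal((n_curr, n_prev)) / np.sqrt(n_prev)

print(np.var(W_small @ A_prev))   # ~ n_prev * 0.01**2 = 0.05, width-dependent
print(np.var(W_scaled @ A_prev))  # ~ 1.0, independent of width
```

If this reading is right, it would also explain why the two helpers can't return the same parameters even with the same seed: the random draws match, but the scaling applied to them does not.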