In the Course 2 Week 1 Initialization lab, when we tested zero initialization the cost stayed constant, indicating no learning at all. Looking at the test file, we are using a 3-layer neural network (relu, relu, sigmoid). When I tested the same architecture on the Course 1 Week 4 lab (cat recognition), there was a small amount of learning coming from the db3 gradient, so the cost decreased slightly. Working it out with pen and paper, that result made more sense to me, since db3 won't be zero. So I was wondering: is it because of the dataset difference? Why was the cost constant in the Initialization lab?
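Here is roughly the pen-and-paper check I mean, as a tiny numpy sketch (toy layer sizes and made-up data, not the actual assignment code): with all-zero parameters, A2 comes out all zeros, so dW3 is zero, but dZ3 = A3 - Y = 0.5 - Y is not, so db3 is only zero if the labels happen to average to 0.5.

```python
import numpy as np

# Toy data, purely for illustration: 2 features, 8 examples, imbalanced labels
np.random.seed(1)
X = np.random.randn(2, 8)
Y = np.array([[0, 1, 0, 1, 1, 0, 1, 1]])      # 5 ones, 3 zeros

sizes = [2, 4, 3, 1]                          # input, two hidden layers, output
W = [np.zeros((sizes[l + 1], sizes[l])) for l in range(3)]
b = [np.zeros((sizes[l + 1], 1)) for l in range(3)]

relu = lambda z: np.maximum(0, z)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Forward pass: with zero weights, Z1 = Z2 = 0, so A1 = A2 = 0 and A3 = sigmoid(0) = 0.5
A1 = relu(W[0] @ X + b[0])
A2 = relu(W[1] @ A1 + b[1])
A3 = sigmoid(W[2] @ A2 + b[2])

m = X.shape[1]
dZ3 = A3 - Y                                   # 0.5 - Y, not zero
dW3 = (dZ3 @ A2.T) / m                         # all zeros, because A2 is all zeros
db3 = np.sum(dZ3, axis=1, keepdims=True) / m   # mean(0.5 - Y): zero only if labels are 50/50

print("dW3:\n", dW3)
print("db3:", db3)
```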
Thanks in advance!
You filed this under AI Discussions, but it sounds like you are asking about DLS Course 2 Week 1.
It’s a great idea to run experiments to try to generalize the points made in the courses. If you don’t break symmetry, it’s not that you can’t learn anything: it’s just that the learning is not very useful. If you initialize all the weights and bias values to zeros (or any other fixed value), then all the neurons in any given layer learn the same thing: if they start out the same, they stay the same during back propagation. The effect is that you have a neural network with only one neuron in each layer. That means it can still learn something, but it just will not be a very powerful or useful solution.
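To make that concrete, here is a small toy sketch (two layers just to keep it short, with made-up sizes and data, not the course notebook code): start every weight at the same constant, run a few gradient steps, and the rows of W1 stay identical to each other, so the hidden layer is effectively one neuron copied several times.

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(2, 20)
Y = (np.random.rand(1, 20) > 0.5).astype(float)
m = X.shape[1]

n_h = 4
W1 = np.full((n_h, 2), 0.5); b1 = np.zeros((n_h, 1))   # every hidden neuron starts identical
W2 = np.full((1, n_h), 0.5); b2 = np.zeros((1, 1))

sigmoid = lambda z: 1 / (1 + np.exp(-z))

for step in range(3):
    # forward: relu hidden layer, sigmoid output
    Z1 = W1 @ X + b1
    A1 = np.maximum(0, Z1)
    A2 = sigmoid(W2 @ A1 + b2)
    # backward: standard logistic-loss gradients
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    # gradient descent update
    W1 -= 0.1 * dW1; b1 -= 0.1 * db1
    W2 -= 0.1 * dW2; b2 -= 0.1 * db2

print(W1)   # all four rows are still identical: the hidden neurons never differentiate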
So maybe they just got lucky in the Initialization assignment, or chose the situation carefully, so that the cost would not change.
I tried to post this on the course Q&A, but the page crashed; maybe it was a network problem.
I have read your explanation about this part and it was really insightful and helpful, which made me want to try different scenarios. I got the idea that it is not helpful learning or a powerful network.
I thought that if we initialize the parameters to zeros in both labs, we would get the same output; that's why I compared them.