I understand the course's advice to initialize the parameters W of a feedforward neural network randomly and multiply them by a small constant. But when I implemented the whole FFNN model myself and ran it on the cat classifier from week 4 of the Neural Networks and Deep Learning course, the cost decreased only on the order of 10^-2 and the accuracy was very bad. When I instead divided the random weights by np.sqrt(number of nodes in the previous layer), as the initialization in the imported functions does, it worked really well. Can anyone help clarify this misunderstanding?

Thanks in advance

Depending on the statistics of the dataset and the depth of the model, you may have to adjust the weight initialization.

It's a bit of a trial-and-error process.
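To make the depth effect concrete, here is a rough sketch rather than the course's code. It pushes random inputs through a stack of ReLU layers and records how large the activations are at each layer, once with a fixed 0.01 multiplier and once with 1/np.sqrt(n_prev). The function name `activation_stds` and the layer sizes are made up for illustration.

```python
import numpy as np

def activation_stds(layer_dims, scale_fn, n_samples=200, seed=0):
    """Return the std of the ReLU activations after each layer (biases are zero)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((layer_dims[0], n_samples))  # stand-in for input data
    stds = []
    for l in range(1, len(layer_dims)):
        n_prev, n_curr = layer_dims[l - 1], layer_dims[l]
        W = rng.standard_normal((n_curr, n_prev)) * scale_fn(n_prev)
        a = np.maximum(0, W @ a)  # ReLU
        stds.append(float(a.std()))
    return stds

layer_dims = [1000, 100, 100, 100, 100, 100]  # hypothetical sizes, not the assignment's
small = activation_stds(layer_dims, lambda n_prev: 0.01)
scaled = activation_stds(layer_dims, lambda n_prev: 1.0 / np.sqrt(n_prev))

for l, (s, t) in enumerate(zip(small, scaled), start=1):
    print(f"layer {l}: fixed 0.01 -> {s:.2e}   1/sqrt(n_prev) -> {t:.2e}")
```

In this toy setup the fixed 0.01 multiplier shrinks the signal by a large factor at every layer, while the 1/np.sqrt(n_prev) scaling keeps it at a usable magnitude. Vanishing activations usually mean vanishing gradients too, which would be consistent with a cost that barely moves.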

Isn't the idea of a convex loss function that the cost converges from any initialization of the parameters (while keeping the randomness, in the case of a neural network), or did I get something wrong?

From what I understood, initialization can affect learning speed because it determines where you start, but in my case the learning was very bad.

Well, in DLS we learn the He / Xavier inits.
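For reference, a minimal sketch of the two schemes, assuming standard-normal draws. He initialization uses a variance of 2/n_prev, and the simple form of Xavier uses 1/n_prev (the original Glorot formulation uses 2/(n_prev + n_curr) instead). The helper name `init_weights` is hypothetical, not something from the course files.

```python
import numpy as np

def init_weights(n_curr, n_prev, method="he", seed=0):
    """Draw an (n_curr, n_prev) weight matrix scaled by the fan-in n_prev."""
    rng = np.random.default_rng(seed)
    if method == "he":
        scale = np.sqrt(2.0 / n_prev)   # commonly paired with ReLU
    elif method == "xavier":
        scale = np.sqrt(1.0 / n_prev)   # simple fan-in variant, commonly paired with tanh
    else:
        raise ValueError(f"unknown method: {method}")
    return rng.standard_normal((n_curr, n_prev)) * scale

W_he = init_weights(50, 2000, method="he")
W_xavier = init_weights(50, 2000, method="xavier")
print(W_he.std(), W_xavier.std())
```

Dividing by np.sqrt(n_prev), as described in the question, matches the "xavier" branch here.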

My (simple) interpretation of the inits is that, since we are in the end doing an optimization, you want the weights to be slightly excited (though not biased), so that a little "action" can happen.

In contrast, imagine we were running an optimization from a starting point of all zeros: they'd have no idea where to go.

*Or, from zero, no loss to be solved.

The neural network cost function is not convex.

But that is not why random initialization of the hidden layer weights is required. It is for "symmetry breaking". This is a specific requirement of NN hidden layers.
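A small sketch of what symmetry breaking means in practice, using toy data and a made-up helper `train_tiny_net` (one tanh hidden layer, sigmoid output, no biases). If every hidden unit starts with the same weights, every unit receives the same gradient, so the rows of W1 stay identical however long you train. Random starting weights let the units diverge.

```python
import numpy as np

def train_tiny_net(W1, W2, X, Y, lr=0.1, steps=50):
    """Plain gradient descent on a 1-hidden-layer net with cross-entropy loss."""
    m = X.shape[1]
    for _ in range(steps):
        A1 = np.tanh(W1 @ X)                       # hidden activations
        A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1)))      # sigmoid output
        dZ2 = A2 - Y                               # dLoss/dZ2 for cross-entropy
        dW2 = dZ2 @ A1.T / m
        dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)       # backprop through tanh
        dW1 = dZ1 @ X.T / m
        W1 = W1 - lr * dW1
        W2 = W2 - lr * dW2
    return W1, W2

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 20))                   # toy inputs
Y = (X[0:1] > 0).astype(float)                     # toy labels

# Every hidden unit starts identical (constant init).
W1_same, _ = train_tiny_net(np.full((4, 3), 0.5), np.full((1, 4), 0.5), X, Y)
# Hidden units start different (random init).
W1_rand, _ = train_tiny_net(rng.standard_normal((4, 3)) * 0.5,
                            rng.standard_normal((1, 4)) * 0.5, X, Y)

rows_identical_same = np.allclose(W1_same, W1_same[0])
rows_identical_rand = np.allclose(W1_rand, W1_rand[0])
print(rows_identical_same, rows_identical_rand)
```

With the constant init, the four hidden units behave like a single unit, which is why the hidden layer weights need some randomness regardless of the scale you choose.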

Thank you all so much for the clarification.