I tried to initialize the weights as in Course 1, by multiplying the random initialization by 0.01 instead of 10.
This gives me worse results than multiplying by 10. Why might that be?
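For reference, this is roughly what I mean (a minimal sketch; the layer sizes and variable names are just illustrative, not the actual assignment code):

```python
import numpy as np

# Illustrative layer sizes, not taken from the assignment
n_prev, n_curr = 4, 3

# Course 1 style: small random weights
W_small = np.random.randn(n_curr, n_prev) * 0.01

# What I compared it against: large random weights
W_large = np.random.randn(n_curr, n_prev) * 10
```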
Clarification: He initialization is similar to Xavier initialization, meaning it is a way of regularizing the weights and helping them not explode or vanish. Is that right?
There isn’t a single value that works best for all problems.
Both He and Xavier initialization help with vanishing and exploding gradients; they do this by scaling the initial weights, rather than by regularizing them during training. He initialization works better with ReLU activations, and Xavier with sigmoid/tanh.
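As a concrete illustration, here is a minimal sketch of the two schemes (not the course's notebook code; the helper name, the `layer_dims` argument, and the example sizes are just assumptions):

```python
import numpy as np

def initialize_parameters(layer_dims, method="he"):
    """Initialize weights for a fully connected network.

    layer_dims: list of layer sizes, e.g. [n_x, n_h1, ..., n_y].
    method: "he" for ReLU layers, "xavier" for sigmoid/tanh layers.
    """
    parameters = {}
    for l in range(1, len(layer_dims)):
        fan_in = layer_dims[l - 1]
        if method == "he":
            scale = np.sqrt(2.0 / fan_in)   # He: Var(W) = 2 / fan_in
        else:
            scale = np.sqrt(1.0 / fan_in)   # Xavier: Var(W) = 1 / fan_in
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], fan_in) * scale
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

# Example: a 3-layer network with ReLU hidden units
params = initialize_parameters([784, 128, 64, 10], method="he")
```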
These notes are a good complement to the course lectures.