Week 1: initializing W to large random numbers, and He initialization

I tried to initialize the weights as in Course 1, by multiplying the random initialization by 0.01 instead of 10.

This gives me worse results than multiplying by 10. Why could that be?

Clarification: He initialization is similar to Xavier initialization, meaning it is a way of scaling the initial weights that helps them not explode or vanish. Is that right?

Thank you very much in advance

Hi, @gon.g.

There isn’t a single value that works best for all problems.

Both He and Xavier initialization help with vanishing and exploding gradients. He initialization works better with ReLU activations, and Xavier with sigmoid/tanh.
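For concreteness, here is a minimal NumPy sketch of the two schemes (the helper names `he_init` and `xavier_init` are my own, not from the course): each draws from a standard normal and rescales by a factor that depends on the layer's fan-in.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    # He initialization: variance 2 / fan_in, suited to ReLU layers
    return rng.standard_normal((fan_out, fan_in)) * np.sqrt(2.0 / fan_in)

def xavier_init(fan_in, fan_out):
    # Xavier (Glorot) initialization: variance 1 / fan_in, suited to tanh/sigmoid
    return rng.standard_normal((fan_out, fan_in)) * np.sqrt(1.0 / fan_in)

# Example layer: 256 inputs, 128 units
W_he = he_init(256, 128)
W_xavier = xavier_init(256, 128)

# The empirical std should sit near the target: sqrt(2/256) vs sqrt(1/256)
print(W_he.std(), W_xavier.std())
```

Compare that with a fixed multiplier like 0.01 or 10: those ignore the layer size, so activations can shrink or blow up as the network gets deeper, while the fan-in-based scale keeps activation variance roughly constant across layers.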

These notes are a good complement to the course lectures.

Happy learning :slight_smile:


Hello @nramon,

Thank you very much for your quick and constructive response!

1 Like