I tried to initialize the weights as in Course 1, by multiplying the random initialization by 0.01 instead of 10.
This gives me worse results than multiplying by 10. Why might that be?
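For reference, this is roughly what I mean (a minimal sketch; the layer sizes and variable names are just illustrative, not the actual assignment code):

```python
import numpy as np

# Illustrative layer sizes, not taken from the assignment
n_prev, n_curr = 4, 3

# Course 1 style: small random weights
W_small = np.random.randn(n_curr, n_prev) * 0.01

# What I compared it against: large random weights
W_large = np.random.randn(n_curr, n_prev) * 10
```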
Clarification: He initialization is similar to Xavier initialization, meaning it is a way of regularizing the weights and helping them not explode or vanish. Is that right?
There isn’t a single value that works best for all problems.
Both He and Xavier initialization help with vanishing and exploding gradients; they do this by scaling the initial weights, rather than by regularizing them during training. He initialization works better with ReLU activations, and Xavier with sigmoid/tanh.
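As a concrete illustration, here is a minimal sketch of the two schemes (not the course's notebook code; the helper name, the `layer_dims` argument, and the example sizes are just assumptions):

```python
import numpy as np

def initialize_parameters(layer_dims, method="he"):
    """Initialize weights for a fully connected network.

    layer_dims: list of layer sizes, e.g. [n_x, n_h1, ..., n_y].
    method: "he" for ReLU layers, "xavier" for sigmoid/tanh layers.
    """
    parameters = {}
    for l in range(1, len(layer_dims)):
        fan_in = layer_dims[l - 1]
        if method == "he":
            scale = np.sqrt(2.0 / fan_in)   # He: Var(W) = 2 / fan_in
        else:
            scale = np.sqrt(1.0 / fan_in)   # Xavier: Var(W) = 1 / fan_in
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], fan_in) * scale
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

# Example: a 3-layer network with ReLU hidden units
params = initialize_parameters([784, 128, 64, 10], method="he")
```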
These notes are a good complement to the course lectures.