Is the formula for scaled weight initialization the same with dropout?

Hi Everyone,

When learning about scaled weight initialization to prevent vanishing/exploding gradients, we see that each layer's weights are set as W[l] = np.random.randn(n[l], n[l-1]) * np.sqrt(1/n[l-1]).
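Concretely, here's how I understand that initialization in numpy (the layer sizes below are just made-up examples):

```python
import numpy as np

# Toy layer sizes, purely for illustration: 4 units in layer l-1, 3 units in layer l
n_prev, n_curr = 4, 3

# Scaled initialization: each weight has variance 1/n_prev
W = np.random.randn(n_curr, n_prev) * np.sqrt(1.0 / n_prev)
b = np.zeros((n_curr, 1))
```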

Does this initialization still work with dropout regularization, seeing as n[l-1], the number of nodes in layer l-1 connected to each neuron in layer l, effectively changes on every iteration when units are dropped?

Hi, @jeffreywang.

Excellent question.

It does seem like the noise introduced by dropout could affect how the activation variance propagates through the network, and there are initialization strategies that take this into account, but I wouldn’t be able to recommend a specific one.
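To make the variance point concrete, here's a small numpy sketch of my own (arbitrary layer sizes and keep_prob = 0.8, not anything from the course) comparing the variance of the pre-activations with and without inverted dropout, using the np.sqrt(1/n[l-1]) scaling from your question:

```python
import numpy as np

np.random.seed(0)
n_prev, n_curr, batch = 512, 512, 1000   # arbitrary sizes for the experiment
keep_prob = 0.8                          # arbitrary dropout keep probability

# Scaled initialization as in the question: Var(W[i, j]) = 1/n_prev
W = np.random.randn(n_curr, n_prev) * np.sqrt(1.0 / n_prev)
A_prev = np.random.randn(n_prev, batch)  # stand-in activations from layer l-1

# Inverted dropout on layer l-1: zero out units, rescale survivors by 1/keep_prob
mask = np.random.rand(*A_prev.shape) < keep_prob
A_dropped = A_prev * mask / keep_prob

Z_plain = W @ A_prev      # pre-activations of layer l without dropout
Z_drop = W @ A_dropped    # pre-activations of layer l with dropout

print("Var(Z) without dropout:", Z_plain.var())  # ~1.0
print("Var(Z) with dropout:   ", Z_drop.var())   # ~1/keep_prob, i.e. noticeably larger
```

The 1/keep_prob rescaling keeps the expected activations the same, but the variance of what reaches layer l still grows, which is the effect I meant by dropout noise interacting with variance propagation.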

Hopefully someone else can shed more light on this :slight_smile: