Confusion about dimension of Dropout matrix

It is a good question and an interesting point. I forget exactly what Prof Ng says in the lectures, but the instructions for the assignment make it quite clear that he expects the mask values to be matrices, not vectors. The effect is that we treat each sample in a given minibatch differently with respect to dropout: we are still removing nodes, but different ones for each sample. So, as far as the way dropout is applied is concerned, it is effectively as if we were doing Stochastic Gradient Descent. My intuition is that this makes the regularization effect weaker for a given keep_prob value. There have been some interesting past discussions of this point, e.g. this one. Please have a look at that discussion and see if it sheds any further light.
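To make the shape point concrete, here is a minimal sketch of inverted dropout in NumPy (the shapes and keep_prob value are just illustrative, not taken from the assignment). Because the mask D has the same shape as the activation matrix A, each column (i.e. each sample in the minibatch) gets its own pattern of dropped units:

```python
import numpy as np

np.random.seed(1)

keep_prob = 0.8
A = np.random.randn(4, 5)  # activations: 4 hidden units, 5 samples in the minibatch

# The mask D is a matrix with the same shape as A, not a (4, 1) vector,
# so each column (sample) has its own independently chosen dropped units.
D = (np.random.rand(*A.shape) < keep_prob).astype(float)

# Inverted dropout: zero out units, then rescale so the expected
# activation value is unchanged at test time.
A_dropped = (A * D) / keep_prob

print(D)  # different columns generally zero out different units
```

If the mask were instead a (4, 1) vector broadcast across the columns, every sample in the minibatch would have the same units removed on that iteration, which is the alternative interpretation the question is asking about.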