The dropout technique confuses me

Why would you use the dropout technique if you could just build a smaller NN instead?

Dropout does not mean that you are building a smaller network.

During each training iteration, a few nodes are randomly switched off, so you train a different, smaller sub-network each time.
As a result, each node in the neural network learns to pay attention to all inputs from the previous layer, not just a few of them.

Here’s an example from the input layer’s perspective:
Say you have 10 features in your input and height is an important feature. Without dropout, your NN could learn to pay a lot of attention to this one feature. With dropout applied to the input layer, the height feature is sometimes made unavailable (i.e. dropped out), and the NN has to predict the target using just the other features. As a result, the NN will learn to spread its weights across all features over time, not just that one feature.
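To make that concrete, here is a minimal NumPy sketch of inverted dropout (the variant used in the course assignments), applied here to the input features. The `keep_prob` value and array shapes are just illustrative assumptions, not the assignment's exact code.

```python
import numpy as np

def dropout_forward(A, keep_prob=0.8):
    """Inverted dropout: randomly zero some units and rescale the survivors."""
    # Fresh random mask every training iteration: True = keep, False = drop
    mask = np.random.rand(*A.shape) < keep_prob
    # Zero the dropped units and scale up the rest so the expected value
    # of the activations stays roughly the same
    A_dropped = (A * mask) / keep_prob
    return A_dropped, mask

# Illustrative shapes: 10 input features (rows) for 5 examples (columns)
X = np.random.randn(10, 5)
X_dropped, _ = dropout_forward(X, keep_prob=0.8)
# On any given iteration, roughly 20% of the feature values are zeroed,
# so the network cannot count on any single feature (e.g. height) being present.
```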


Oh, so after training, your NN is the same size as it was before dropout?

The architecture of the NN remains unchanged; nodes are only switched off temporarily during training.


Oh, okay. Thank you. Now it is clear to me!

Here’s another thread that discusses the question of why you don’t just reduce the size of the network to eliminate overfitting instead of applying dropout.

Thank you for your solution. I understand the similarity between the idea of dropout and L2 regularization, but could you please explain the difference between them? The lecture talks about it in the following way:

“The L2 penalty on different weights can be different, depending on the size of the activations being multiplied by that weight. But to summarize, it is possible to show that dropout has a similar effect to L2 regularization; only the L2 regularization applied to different weights can be a little bit different, and it is even more adaptive to the scale of different inputs.”
Thank you in advance.

Here’s how L2 and dropout are different.

In the case of L2, the gradient of each weight has an additional term of \frac{\lambda}{m} w. This shrinkage is applied to every weight on every iteration in the same fixed way.

When it comes to dropout, that’s not the case. We update weights only for the nodes that took part in the forward pass for that iteration. Over many iterations, this forces each node to adapt its weights so that it pays attention to all of its inputs rather than relying on just a few.
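As a rough sketch of that difference (NumPy, with made-up shapes, batch size, and lambda, purely for illustration): the L2 term is added to every weight’s gradient on every iteration, while dropout adds no term at all and instead masks the gradients of whichever units were dropped in that iteration’s forward pass.

```python
import numpy as np

m = 64            # batch size (illustrative)
lambd = 0.7       # L2 regularization strength (illustrative)
keep_prob = 0.8   # dropout keep probability (illustrative)

W = np.random.randn(4, 10)        # weights of one layer
dW_data = np.random.randn(4, 10)  # gradient coming from the data/loss term alone

# L2: every weight gets the same kind of extra gradient term, every iteration
dW_l2 = dW_data + (lambd / m) * W

# Dropout: no extra term in the gradient; instead, gradients only flow
# through the units that were kept in this iteration's forward pass
A = np.random.randn(4, 5)                    # activations of the layer
mask = np.random.rand(*A.shape) < keep_prob  # same mask as used in the forward pass
dA = np.random.randn(4, 5)                   # gradient w.r.t. the activations
dA_masked = (dA * mask) / keep_prob          # dropped units get zero gradient
```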

It would be best for you to try the 2nd programming assignment for this week, since both L2 and dropout have to be implemented from scratch there.