In dropout regularization, does the value of the parameters associated with a neuron that was dropped out change within a single iteration?
But Sir, the parameters learn based on dW and db, right?
In every iteration, a single neuron might be dropped out for one example and kept for a different example.
So our model learns every parameter of every neuron in a single iteration.
Can we say that, in each iteration, our model doesn't learn the parameters of a dropped-out neuron for that particular example?
One epoch is divided into many mini-batches. One mini-batch corresponds to one gradient descent step. Within one mini-batch (one gradient descent step), some neurons are chosen to be dropped out for each sample of that mini-batch.
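That idea can be sketched in numpy (a toy example; the layer size, mini-batch size, and keep_prob value here are my own assumptions, not from the lecture). A fresh 0/1 mask is drawn for every entry of the activation matrix, so each sample (column) gets its own random pattern of dropped units:

```python
import numpy as np

np.random.seed(0)

keep_prob = 0.8          # probability of keeping a unit
n_units, m = 5, 4        # 5 units in the layer, mini-batch of 4 samples

# Activations of one hidden layer for the mini-batch (one column per sample).
a = np.random.randn(n_units, m)

# A fresh mask entry is drawn for every (unit, sample) pair, so each column
# (sample) gets its own random pattern of dropped units.
d = (np.random.rand(n_units, m) < keep_prob).astype(float)

a_dropped = a * d / keep_prob   # inverted dropout keeps the expected scale

print(d)                        # different columns generally differ
```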
Sir, I have this understanding. Is it correct?
Sir, in this I guess the neurons that are chosen to be dropped out are random and different for each training example (every sample of a mini-batch).
Yes, that’s correct.
So, this might be a wrong understanding?
Yes, that was wrong, and I have just removed it so as not to mislead others.
Btw, @Kamal_Nayan, it’s great that you have figured it out yourself!
Cheers,
Raymond
Why would you still say the first sentence, “Dropout parameters don’t learn in a single iteration”, after knowing that in one gradient descent step a neuron can learn from at least one sample? I thought you had just pointed out that I was incorrect about that.
What do you think after seeing this?
What do you mean by “single iteration”? Normally I would say it means one gradient descent step.
I actually wanted to say that the parameters of a particular neuron dropped out for a particular example don't learn.
Let me explain with an example:
Let's say there are 2 training examples, x1 and x2, with some input features.
For x1, assume the 5th neuron of the 3rd layer of some DNN gets dropped out, but for x2 that same neuron does not get dropped out. My point is that in one gradient descent step, that particular neuron doesn't learn with respect to x1 but does learn with respect to x2. So as a whole, the neuron's parameters still learn in one gradient descent step with respect to the whole sample.
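That claim can be checked with a small numpy sketch (my own toy setup, not from the course: one linear layer, a hand-made mask, and an all-ones upstream gradient). Even when a neuron's activation is zeroed for sample 0, its weight gradient is still nonzero because sample 1 contributes:

```python
import numpy as np

np.random.seed(1)

m = 2                            # two samples, x1 and x2
n_in, n_out = 3, 4
X = np.random.randn(n_in, m)     # one sample per column
W = np.random.randn(n_out, n_in)

# Dropout mask on the layer's output activations: the 4th unit (index 3)
# is zeroed for sample 0 (x1) but kept for sample 1 (x2).
d = np.ones((n_out, m))
d[3, 0] = 0.0

Z = W @ X
A = Z * d                        # apply the mask

# Pretend the upstream gradient dA is all ones; backprop through the mask.
dZ = np.ones_like(A) * d
dW = dZ @ X.T / m                # the gradient sums over BOTH columns (samples)

# Row 3 of dW is nonzero: x2 still contributes, so even the "dropped"
# neuron's weights get updated in this gradient descent step.
print(not np.allclose(dW[3], 0.0))   # prints True
```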
This was what I wanted to state.
Also, I think overfitting happens because of exceptions in our data that are not general; our DNNs learn them so well that generalization suffers, and the problem of high variance arises. So if we apply dropout regularization, the problem of high variance is reduced.
Am I completely correct now?
This line should have been: “Consider the 3rd layer of some DNN; for x1, let's assume its 5th feature gets dropped out.”
Look at this graph again:
You know matrix multiplication, right? Now you see three feature values are dropped from the two samples. Would any one of the neurons completely not learn anything? The answer is no, so we can't say “the 5th neuron … gets dropped out”. For the zeroth sample (the zeroth column of matrix X), two features are dropped, so the effect is that, for this sample, all neurons learn partially from that sample.
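This can also be sketched in numpy (a toy example with assumed shapes; the mask is applied to individual entries of the input matrix X, as in the graph): no row of the weight gradient ends up entirely zero, i.e. no neuron completely stops learning.

```python
import numpy as np

np.random.seed(3)

n_features, m = 5, 2                 # 5 features, 2 samples (columns of X)
X = np.random.randn(n_features, m)
W = np.random.randn(4, n_features)   # 4 neurons in the next layer

# Drop three individual feature values: two from sample 0, one from sample 1.
mask = np.ones_like(X)
mask[[1, 4], 0] = 0.0
mask[2, 1] = 0.0
X_dropped = X * mask

# Forward pass and a toy backward pass (upstream gradient of ones).
Z = W @ X_dropped
dZ = np.ones_like(Z)
dW = dZ @ X_dropped.T / m

# No row of dW is entirely zero: every neuron's weights receive a nonzero
# gradient, just with the dropped feature values contributing nothing.
print(np.any(np.all(dW == 0, axis=1)))   # prints False
```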
Would you like to rewrite your understanding?
Raymond
Ok, I understand what you are saying: you are dropping the sample's features. I think that can also work to reduce overfitting.
But in NG sir's lecture, he taught us to drop out the neurons, not the input features.
Here is the image; see, we actually drop out the neuron.
But what is a^{[3]}? Isn’t it the sample matrix? If d^{[3]} is multiplied with a^{[3]}, isn’t it dropping feature values?
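For reference, that multiplication from the lecture can be written as below (a minimal sketch; the shape of a3 and the keep_prob value are my own assumptions). Since d3 has the same shape as a3, the element-wise product zeroes individual entries of the activation matrix (particular feature values of particular samples), rather than a whole neuron across the batch:

```python
import numpy as np

np.random.seed(2)
keep_prob = 0.8

# a3: activations of layer 3, shape (number of units, number of samples).
a3 = np.random.randn(4, 3)

# d3 has the SAME shape as a3, so multiplying element-wise zeroes individual
# entries of the activation matrix, not a whole row (neuron) at once.
d3 = (np.random.rand(*a3.shape) < keep_prob).astype(float)

a3 = a3 * d3            # drop
a3 = a3 / keep_prob     # rescale so the expected activation is unchanged
```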
Let’s stick with the lecture and not the assignment for now.
Based on the lecture and my comment above, would you like to rewrite your understanding, and then we can look at the latest write-up?