In dropout regularization, do the parameters associated with a neuron that was dropped out change within a single iteration?

But Sir, the parameters are updated based on dW and db, right?

In every iteration, a given neuron might be dropped out for one particular example but not for a different example.

So, in a single iteration, our model learns every parameter of every neuron.

Can we say that, in each iteration, our model doesn't learn the parameters of a dropped-out neuron with respect to that single example?

One epoch is divided into many mini-batches, and each mini-batch corresponds to one gradient descent update. Within one mini-batch (one update), some neurons are chosen to be dropped out for each sample of that mini-batch.

Sir, I have this understanding. Is this correct?

Sir, in this I guess the neurons chosen to be dropped out are random and different for each training example (every sample in a mini-batch).
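The understanding above can be sketched with a minimal NumPy example (my own illustration, not the course's exact code): because the mask is drawn independently for every entry of the activation matrix, each sample (column) gets its own random pattern of dropped neurons.

```python
import numpy as np

np.random.seed(0)
keep_prob = 0.8

# Hypothetical mini-batch: activations of one hidden layer,
# 4 neurons (rows) x 3 samples (columns), in the course's convention.
a = np.ones((4, 3))

# One independent Bernoulli draw per entry, so each sample (column)
# gets its own random pattern of dropped neurons.
d = (np.random.rand(*a.shape) < keep_prob).astype(float)

# Inverted dropout: zero out the dropped entries, then rescale
# the survivors so the expected activation is unchanged.
a_dropped = a * d / keep_prob
```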

Yes, that's correct.

So, this might be a wrong understanding?

Yes, that was wrong, and I have just removed it so as not to mislead others.

Why would you still say the first sentence, "dropout parameters don't learn in a single iteration", after knowing that in one gradient descent update it is possible for a neuron to learn from at least one sample? I thought you just pointed out that I was incorrect about that.

What do you think after seeing this?

What do you mean by a single iteration? Normally I would say it means one gradient descent update.

I actually wanted to say that the parameters of a particular neuron, dropped out for a particular example, don't learn.

Let me explain this with an example:

Let's say there are 2 training examples, x1 and x2, with some input features.

For x1, let's assume the 5th neuron of the 3rd layer of some DNN gets dropped out, but for x2 that same neuron does not get dropped out. So my point is that in one gradient descent update, that particular neuron doesn't learn w.r.t. x1 but does learn w.r.t. x2. So as a whole, the neuron's parameters still learn in one gradient descent update w.r.t. the whole sample.

This was what I wanted to state.
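That scenario can be checked numerically. Here is a hedged sketch (toy sizes and values of my own choosing, not from the course): a neuron dropped for x1 but kept for x2 receives zero gradient contribution from x1, yet its weight gradient is still nonzero because of x2.

```python
import numpy as np

np.random.seed(3)

# Hypothetical layer 3: 6 neurons, fed by 4 activations from
# layer 2, with 2 samples x1 and x2 as columns.
a_prev = np.random.rand(4, 2)
W = np.random.rand(6, 4)
Z = W @ a_prev
A = np.maximum(0, Z)                  # ReLU

# Dropout mask: neuron 5 (row index 4) is dropped for x1 only.
D = np.ones_like(A)
D[4, 0] = 0.0
A = A * D                             # (keep_prob rescaling omitted for clarity)

# Backward pass: the same mask zeroes the incoming gradient for
# the dropped (neuron, sample) pair.
dA = np.random.rand(*A.shape)         # illustrative upstream gradient
dA = dA * D
dZ = dA * (Z > 0)
dW = dZ @ a_prev.T / 2                # m = 2 samples

# Neuron 5's weight gradient gets no contribution from x1, but a
# nonzero contribution from x2 -- so it still learns in this update.
```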

Also, I think overfitting happens because of exceptions in our data that are not general; our DNNs learn them so well that generalization suffers, and so the problem of high variance arises. So if we apply dropout regularization, the problem of high variance is reduced.

Am I completely correct now?

This line should have been "consider the 3rd layer of some DNN; for x1, let's assume its 5th feature gets dropped out".

Look at this graph again:

You know matrix multiplication, right? Now you see three feature values are dropped out from the two samples. Would any one of the neurons completely not learn anything? The answer is no, and so we can't say "the 5th neuron … gets dropped out". For the zeroth sample (the zeroth column of matrix X), two features are dropped, and so the effect is that all neurons learn partially from that sample.
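The point above can be sketched with matrix multiplication (an illustration with made-up values, not the graph's actual numbers): even with three feature values zeroed across the two samples, no neuron's weight gradient is entirely zero.

```python
import numpy as np

np.random.seed(1)

# Input matrix X: 4 features (rows) x 2 samples (columns),
# with three feature VALUES dropped, as in the graph.
X = np.random.rand(4, 2)
mask = np.ones_like(X)
mask[1, 0] = 0          # feature 1 dropped for sample 0
mask[3, 0] = 0          # feature 3 dropped for sample 0
mask[2, 1] = 0          # feature 2 dropped for sample 1
X_dropped = X * mask

# A layer of 3 neurons on top; dZ is an illustrative upstream gradient.
dZ = np.random.rand(3, 2)
dW = dZ @ X_dropped.T / 2      # gradient w.r.t. the layer's weights

# No row of dW is all zero: every neuron still learns PARTIALLY
# from the mini-batch despite the dropped feature values.
```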

Would you like to rewrite your understanding?

Raymond

OK, I understand what you are saying: you are dropping out the sample's feature values. I think that can also work to reduce overfitting.

But in Ng sir's lecture, he taught us to drop out the neurons, not the input features.

Here is the image; see, we actually drop out the neuron.

But what is a^{[3]}? Isn't it the sample matrix? If `d3` is multiplied with `a3`, isn't it dropping feature values?
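To make the question concrete, here is a minimal sketch of the `d3`/`a3` elementwise product being discussed (illustrative shapes and random values, not the assignment's code): since `d3` has the same shape as `a3`, multiplying zeroes individual entries, i.e. one neuron's output for one sample at a time.

```python
import numpy as np

np.random.seed(2)
keep_prob = 0.8

# a3: layer-3 activations, shape (5 neurons, 4 samples);
# illustrative values standing in for forward propagation's output.
a3 = np.random.rand(5, 4)

# d3 has the SAME shape as a3, so a3 * d3 zeroes individual entries:
# one neuron's output for ONE sample. From layer 4's point of view,
# each zeroed entry is a dropped feature value for that sample only.
d3 = np.random.rand(*a3.shape) < keep_prob
a3 = a3 * d3
a3 = a3 / keep_prob     # inverted dropout: keep the expected value
```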

Let's stick with the lecture and not the assignment for now.

Based on the lecture and my comment above, would you like to rewrite your understanding, so we can look at the latest writeup?