In dropout regularization, does the value of the parameters associated with a neuron that was dropped out change within a single iteration?
But Sir, the parameters learn based on dW and db, right?
In every iteration, a single neuron might be dropped out for one example and kept for a different example.
So our model learns every parameter of every neuron in a single iteration.
Can we say that, in each iteration, our model doesn't learn the parameters of a dropped-out neuron for that particular example?
One epoch is divided into many mini-batches. One mini-batch corresponds to one gradient descent step. Within one mini-batch (one gradient descent step), some neurons are chosen to be dropped out for each sample of that mini-batch.
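That idea can be sketched in numpy (a toy example; the layer size, mini-batch size, and keep_prob value here are my own assumptions, not from the lecture). A fresh 0/1 mask is drawn for every entry of the activation matrix, so each sample (column) gets its own random pattern of dropped units:

```python
import numpy as np

np.random.seed(0)

keep_prob = 0.8          # probability of keeping a unit
n_units, m = 5, 4        # 5 units in the layer, mini-batch of 4 samples

# Activations of one hidden layer for the mini-batch (one column per sample).
a = np.random.randn(n_units, m)

# A fresh mask entry is drawn for every (unit, sample) pair, so each column
# (sample) gets its own random pattern of dropped units.
d = (np.random.rand(n_units, m) < keep_prob).astype(float)

a_dropped = a * d / keep_prob   # inverted dropout keeps the expected scale

print(d)                        # different columns generally differ
```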
Sir, I have this understanding. Is it correct?
Sir, in this I guess the neurons that are chosen to be dropped out are random and different for each training example (every sample of a mini-batch).
Yes, that’s correct.
So, this might be a wrong understanding?
Yes, that was wrong, and I have just removed it so as not to mislead others.
Btw, @Kamal_Nayan, it’s great that you have figured it out yourself!
Cheers,
Raymond
Why would you still say the first sentence, “Dropout parameters don’t learn in a single iteration”, after knowing that in one gradient descent step a neuron can learn from at least one sample? I thought you had just pointed out that I was incorrect about that.
What do you think after seeing this?
What do you mean by “single iteration”? Normally I would say it means one gradient descent step.
I actually wanted to say that the parameters of a particular neuron dropped out for a particular example don't learn.
Let me explain with an example:
Let's say there are 2 training examples, x1 and x2, with some input features.
For x1, assume the 5th neuron of the 3rd layer of some DNN gets dropped out, but for x2 that same neuron does not get dropped out. My point is that in one gradient descent step, that particular neuron doesn't learn with respect to x1 but does learn with respect to x2. So as a whole, the neuron's parameters still learn in one gradient descent step with respect to the whole sample.
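That claim can be checked with a small numpy sketch (my own toy setup, not from the course: one linear layer, a hand-made mask, and an all-ones upstream gradient). Even when a neuron's activation is zeroed for sample 0, its weight gradient is still nonzero because sample 1 contributes:

```python
import numpy as np

np.random.seed(1)

m = 2                            # two samples, x1 and x2
n_in, n_out = 3, 4
X = np.random.randn(n_in, m)     # one sample per column
W = np.random.randn(n_out, n_in)

# Dropout mask on the layer's output activations: the 4th unit (index 3)
# is zeroed for sample 0 (x1) but kept for sample 1 (x2).
d = np.ones((n_out, m))
d[3, 0] = 0.0

Z = W @ X
A = Z * d                        # apply the mask

# Pretend the upstream gradient dA is all ones; backprop through the mask.
dZ = np.ones_like(A) * d
dW = dZ @ X.T / m                # the gradient sums over BOTH columns (samples)

# Row 3 of dW is nonzero: x2 still contributes, so even the "dropped"
# neuron's weights get updated in this gradient descent step.
print(not np.allclose(dW[3], 0.0))   # prints True
```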
This was what I wanted to state.
Also, I think overfitting happens because of exceptions in our data that are not general; our DNNs learn them so well that generalization suffers, and the problem of high variance arises. So if we apply dropout regularization, the problem of high variance is reduced.
Am I completely correct now?
This line should have been: “Consider the 3rd layer of some DNN; for x1, let's assume its 5th feature gets dropped out.”
Look at this graph again:
You know matrix multiplication, right? Now you see three feature values are dropped from the two samples. Would any one of the neurons completely not learn anything? The answer is no, so we can't say “the 5th neuron … gets dropped out”. For the zeroth sample (the zeroth column of matrix X), two features are dropped, so the effect is that, for this sample, all neurons learn partially from that sample.
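This can also be sketched in numpy (a toy example with assumed shapes; the mask is applied to individual entries of the input matrix X, as in the graph): no row of the weight gradient ends up entirely zero, i.e. no neuron completely stops learning.

```python
import numpy as np

np.random.seed(3)

n_features, m = 5, 2                 # 5 features, 2 samples (columns of X)
X = np.random.randn(n_features, m)
W = np.random.randn(4, n_features)   # 4 neurons in the next layer

# Drop three individual feature values: two from sample 0, one from sample 1.
mask = np.ones_like(X)
mask[[1, 4], 0] = 0.0
mask[2, 1] = 0.0
X_dropped = X * mask

# Forward pass and a toy backward pass (upstream gradient of ones).
Z = W @ X_dropped
dZ = np.ones_like(Z)
dW = dZ @ X_dropped.T / m

# No row of dW is entirely zero: every neuron's weights receive a nonzero
# gradient, just with the dropped feature values contributing nothing.
print(np.any(np.all(dW == 0, axis=1)))   # prints False
```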
Would you like to rewrite your understanding?
Raymond
Ok, I understand what you are saying: you are dropping the sample's features. I think that can also work to reduce overfitting.
But in NG sir's lecture, he taught us to drop out the neurons, not the input features.
Here is the image; see, we actually drop out the neuron.
But what is a^{[3]}? Isn’t it the sample matrix? If d^{[3]} is multiplied with a^{[3]}, isn’t it dropping feature values?
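For reference, that multiplication from the lecture can be written as below (a minimal sketch; the shape of a3 and the keep_prob value are my own assumptions). Since d3 has the same shape as a3, the element-wise product zeroes individual entries of the activation matrix (particular feature values of particular samples), rather than a whole neuron across the batch:

```python
import numpy as np

np.random.seed(2)
keep_prob = 0.8

# a3: activations of layer 3, shape (number of units, number of samples).
a3 = np.random.randn(4, 3)

# d3 has the SAME shape as a3, so multiplying element-wise zeroes individual
# entries of the activation matrix, not a whole row (neuron) at once.
d3 = (np.random.rand(*a3.shape) < keep_prob).astype(float)

a3 = a3 * d3            # drop
a3 = a3 / keep_prob     # rescale so the expected activation is unchanged
```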
Let’s stick with the lecture and not the assignment for now.
Based on the lecture and my comment above, would you like to rewrite your understanding, and then we can look at the latest write-up?