Question about dropout regularization

This is an interesting question that has come up before. In the limit of Stochastic GD, there is exactly one sample in each minibatch, so each sample is trivially handled differently w.r.t. dropout. The way Prof Ng has us implement it preserves that behavior even when the batch size is greater than 1: the dropout mask has the same shape as the activation matrix, so each sample (each column) gets its own independently drawn mask. Here's an earlier thread which discusses this point and even shows some experimental results with both methods.
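To make the difference concrete, here's a minimal NumPy sketch (my own illustration, not code from the course notebook or that thread) contrasting the per-sample mask with the alternative of sharing one mask across the whole minibatch:

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.8

# Activations for one layer: 4 hidden units, minibatch of 5 samples.
A = rng.standard_normal((4, 5))

# Course-style mask: same shape as A, so each column (sample) draws
# its own dropout pattern, just as it would with batch size 1.
D_per_sample = (rng.random(A.shape) < keep_prob).astype(A.dtype)
A_dropped = (A * D_per_sample) / keep_prob  # inverted dropout scaling

# The alternative being discussed: one mask column broadcast across
# the minibatch, so every sample drops the same hidden units.
D_shared = (rng.random((A.shape[0], 1)) < keep_prob).astype(A.dtype)
A_dropped_shared = (A * D_shared) / keep_prob
```

With the per-sample mask, the dropout noise is independent across the columns of the minibatch, which is exactly the Stochastic GD behavior described above.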