Yes, it’s an interesting point about how Prof Ng has us implement dropout: within each minibatch, every sample gets its own dropout mask. This was not clear from the lectures (or at least I don’t remember him discussing it), but the instructions in the assignment here are very clear about it. There has been some interesting previous discussion of this point, which is worth a look. That thread even includes some experiments with doing it both ways (the same mask for every sample versus a different mask per sample).
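
To make the two options concrete, here is a minimal NumPy sketch, not the assignment's actual code. It assumes activations are stored as a matrix `A` of shape (units, samples) with one column per sample, and that inverted dropout is used (dividing by `keep_prob`). The function names `dropout_per_sample` and `dropout_shared` are hypothetical and only for illustration.

```python
import numpy as np

def dropout_per_sample(A, keep_prob, rng):
    """Draw an independent mask entry for every unit of every sample."""
    # Mask has the same shape as A, so each column (sample) gets its own pattern.
    D = rng.random(A.shape) < keep_prob
    # Inverted dropout: rescale so the expected activation is unchanged.
    return A * D / keep_prob, D

def dropout_shared(A, keep_prob, rng):
    """Draw one mask per unit and reuse it for every sample in the minibatch."""
    # Mask is a single column, broadcast across all samples.
    d = rng.random((A.shape[0], 1)) < keep_prob
    D = np.broadcast_to(d, A.shape)
    return A * D / keep_prob, D

rng = np.random.default_rng(0)   # seeded only so the run is repeatable
A = np.ones((50, 8))             # 50 units, minibatch of 8 samples (made-up sizes)

A_ps, D_ps = dropout_per_sample(A, 0.8, rng)
A_sh, D_sh = dropout_shared(A, 0.8, rng)

# In the shared case every column of the mask is identical;
# in the per-sample case the columns differ.
print("shared mask identical across samples:", bool((D_sh == D_sh[:, [0]]).all()))
print("per-sample mask identical across samples:", bool((D_ps == D_ps[:, [0]]).all()))
```

The only difference between the two versions is the shape of the random draw: `A.shape` for a different mask per sample, versus `(A.shape[0], 1)` broadcast over the columns for one mask shared by the whole minibatch.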