This is an interesting point. Prof Ng is not that specific about this in the lectures, but you can see clearly in the notebook that the "drop" mask is defined in such a way that the samples in a given batch are not all handled the same way w.r.t. dropout. That follows from the fact that the dropout mask has the same shape as the output activation at the given layer, so each sample gets its own independently drawn mask. Here's a thread which discusses this in more detail.
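To make that concrete, here's a minimal NumPy sketch (not the notebook's exact code) using the course convention that activations have shape `(n_units, m_samples)`, with one sample per column. The shapes and `keep_prob` value here are just illustrative:

```python
import numpy as np

np.random.seed(1)
keep_prob = 0.8

# Activations at some layer: 4 units, 3 samples (one sample per column).
A = np.random.randn(4, 3)

# The mask D has the SAME shape as A, so each sample (column) gets its
# own independent mask rather than one mask shared across the batch.
D = np.random.rand(*A.shape) < keep_prob

# Inverted dropout: zero out dropped units, then rescale the survivors.
A_dropped = (A * D) / keep_prob

print(D)  # columns generally differ in which units were dropped
```

If the mask were instead meant to be identical for every sample in the batch, it would have shape `(n_units, 1)` and be broadcast across the columns; that's not what the notebook does.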