A doubt on dropout

BTW there have been lots of threads about dropout over time.

Here’s one that discusses the point that, the way we implement it, every sample in a minibatch gets its own randomly drawn dropout mask.
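To make the per-sample point concrete, here is a minimal sketch. It assumes NumPy and a layout with units in rows and samples in columns; the names `dropout_mask`, `A`, `D` and `keep_prob` are only illustrative and not taken from any particular assignment.

```python
import numpy as np

def dropout_mask(shape, keep_prob, rng):
    """Return a boolean mask of the given shape; True means 'keep this unit'."""
    return rng.random(shape) < keep_prob

rng = np.random.default_rng(0)

# Hypothetical activations: 4 hidden units (rows) for 3 samples (columns).
A = np.ones((4, 3))

# The mask has the same shape as A, so every column (sample) gets its own
# independent random draw rather than sharing a single mask.
D = dropout_mask(A.shape, keep_prob=0.5, rng=rng)
A_dropped = A * D
```

Because the mask is drawn with the full shape of the activation matrix, the network is in effect "thinned" differently for each sample, and again differently on every forward pass.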

Here’s one that explains the point of “inverted” dropout. You have to read all the way to the end of the thread to find the references showing that the “inverted” algorithm is a more sophisticated variant that wasn’t in the original dropout paper from Hinton’s group.
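As a rough illustration of what “inverted” means here, the sketch below scales the surviving activations by `1 / keep_prob` at training time, so the expected activation is unchanged and nothing needs to be rescaled at test time. It assumes NumPy; `inverted_dropout` and the other names are hypothetical.

```python
import numpy as np

def inverted_dropout(A, keep_prob, rng):
    """Zero out units with probability 1 - keep_prob, then rescale the survivors."""
    D = rng.random(A.shape) < keep_prob   # True = keep
    A_out = (A * D) / keep_prob           # the "inverted" step: scale up at training time
    return A_out, D

rng = np.random.default_rng(1)
A = np.ones((1000, 100))                  # hypothetical activations, all equal to 1
A_train, D = inverted_dropout(A, keep_prob=0.8, rng=rng)

# Surviving entries are 1 / 0.8 = 1.25 and dropped entries are 0,
# so the mean stays close to the original value of 1.
print(A_train.mean())
```

With the non-inverted form, you would skip the division during training and instead scale by `keep_prob` at test time.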

Here’s one that discusses the question “if dropout works, doesn’t that mean there’s a smaller network I could define that would have the same effect?”