Dropout Frequency

Hi, @kzed.

Your understanding seems correct to me. For each training example in a mini-batch, you sample a thinned network by dropping out units (source).

If that were not the case, you'd sample an Nl x 1 vector instead of an Nl x m matrix and broadcast it along the m dimension (the examples in the mini-batch), so every example would drop the same units, right?
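In case it helps to see the two options side by side, here is a minimal NumPy sketch. The sizes, the `keep_prob` value, the variable names, and the inverted-dropout scaling (dividing by `keep_prob`) are my own assumptions for illustration, not something taken from the course code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_l, m = 4, 3      # hypothetical layer width and mini-batch size
keep_prob = 0.8    # hypothetical probability of keeping a unit

# Activations of the layer, one column per training example.
A = rng.standard_normal((n_l, m))

# Per-example masks: an independent n_l x m Bernoulli draw,
# so each column (example) gets its own thinned network.
D_per_example = rng.random((n_l, m)) < keep_prob
A_per_example = A * D_per_example / keep_prob  # assumed inverted-dropout scaling

# Shared mask: a single n_l x 1 draw, broadcast across the m columns,
# so every example in the mini-batch drops the same units.
D_shared = rng.random((n_l, 1)) < keep_prob
A_shared = A * D_shared / keep_prob

print(D_per_example.shape, D_shared.shape)
```

With the shared mask, a dropped unit zeroes out an entire row of `A_shared`, while with the per-example mask the zeros can fall in different rows for each column.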

I hope you’re enjoying the course 🙂
