Hi, @kzed.
Your understanding seems correct to me. For each training example in a mini-batch, you sample a thinned network by dropping out units (source).
If that were not the case, instead of an $N_l \times m$ matrix you’d sample an $N_l \times 1$ vector and broadcast it along the $m$ dimension, right?
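To make the difference concrete, here is a minimal NumPy sketch of the two options. The function name `dropout_masks`, the `keep_prob` parameter, and the seed are my own choices for illustration, not taken from the course code; `n_l` stands for the number of units in layer l and `m` for the mini-batch size.

```python
import numpy as np

def dropout_masks(n_l, m, keep_prob, seed=0):
    """Return (per_example, shared) boolean keep-masks of shape (n_l, m).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Per-example dropout: an independent draw for every unit AND every example,
    # so each column (example) gets its own thinned network.
    per_example = rng.random((n_l, m)) < keep_prob
    # The alternative: one draw per unit, broadcast across the m examples,
    # so every example in the batch shares the same thinned network.
    shared_col = rng.random((n_l, 1)) < keep_prob
    shared = np.broadcast_to(shared_col, (n_l, m))
    return per_example, shared

per_example, shared = dropout_masks(n_l=4, m=5, keep_prob=0.8)
print(per_example.astype(int))  # columns generally differ from one another
print(shared.astype(int))       # all columns are identical
```

Multiplying the activations by `per_example` corresponds to the first case (a different thinned network per example), while `shared` corresponds to the broadcast case. The rescaling by `keep_prob` used in inverted dropout is left out here to keep the sketch short.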
I hope you’re enjoying the course.