Dropout then batchnorm, or batchnorm then dropout: which order is appropriate?

Sequence model course 4, week 3, the last assignment: Trigger_word_detection_v2a

We use this model:

[Screenshot of the model architecture]

It is composed of the following 4 layers:

  • A convolutional layer
  • Two GRU layers
  • A dense layer.

There is one thing I do not understand:

  • 1st layer: batchnorm, ReLU, dropout. Batchnorm is done before dropout.
  • 2nd layer: dropout is done before batchnorm.
  • 3rd layer: dropout, batchnorm, dropout.

When is dropout done before batchnorm? When is batchnorm done before dropout? When does the order matter, and for what goal?
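To see concretely why the order can matter, here is a minimal NumPy sketch (my own illustration, not code from the assignment) that applies the same dropout mask before and after a plain batchnorm step. Inverted dropout zeros some units and rescales the rest, which changes the batch statistics that batchnorm computes; notice also that batchnorm applied after dropout shifts the dropped units away from zero again.

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    # Normalize each feature over the batch dimension (no learned gamma/beta).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

def dropout(x, p, rng):
    # Inverted dropout: zero units with probability p, scale survivors by 1/(1-p).
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(64, 8))  # a batch of activations

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)  # same seed, so both orderings use the same mask

bn_then_do = dropout(batchnorm(x), p=0.5, rng=rng_a)
do_then_bn = batchnorm(dropout(x, p=0.5, rng=rng_b))

# Even with an identical dropout mask, the two orderings give different activations:
print(np.abs(bn_then_do - do_then_bn).max())
```

In `bn_then_do` the dropped positions are exactly zero, while in `do_then_bn` batchnorm re-centers them to a nonzero value, so the two blocks really do compute different functions.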

I know what batchnorm and dropout do and why they are applied.
So far I have been able to build an intuition for the models, but here I cannot. Could someone provide an explanation (at least the intuition, please)? Thank you in advance.

There’s no rule regarding the order of applying batch norm with respect to dropout. The author of the model might have found better performance via the validation / test sets (please see courses 2 and 3 for a refresher on model evaluation).

I’ve usually seen BatchNorm being used BEFORE a computation-related layer like Dense / Dropout / Conv. To get a better grip on using BatchNorm, look at state-of-the-art (SOTA) models and observe how BatchNorm is used.
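One further intuition (my own illustration, not from the course): with inverted dropout, training-time activations have a higher variance than test-time ones, because dropout is the identity at inference. If batchnorm sits right after dropout, the running statistics it collects during training describe a distribution that no longer matches what it sees at test time, which is one reason batchnorm is often placed before dropout. A quick NumPy check of the variance gap:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.5
x = rng.normal(size=(100_000,))      # unit-variance activations

mask = rng.random(x.shape) >= p
train_out = x * mask / (1.0 - p)     # inverted dropout, as used during training
test_out = x                         # dropout is the identity at test time

# Training-time variance is inflated by roughly 1/(1-p); test-time variance is not.
print(train_out.var(), test_out.var())
```

With p = 0.5 the training-time variance comes out near 2 while the test-time variance stays near 1, so batch statistics gathered after dropout would be systematically off at inference.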