Dropout - batchnorm or batchnorm - dropout: which order is appropriate?

Sequence Models course 4, week 3, the last assignment: Trigger_word_detection_v2a

We use this model:

[screenshot of the model architecture]

composed of the following four layers:

  • A convolutional layer
  • Two GRU layers
  • A dense layer.

There is one thing I do not understand:

  • 1st layer: batchnorm, ReLU, dropout (batchnorm is done before the dropout)
  • 2nd layer: dropout is done before batchnorm
  • 3rd layer: dropout, batchnorm, dropout

When is dropout done before batchnorm, and when is batchnorm done before dropout? When does the order matter, and for what goal? (See the code sketch below.)
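
For reference, here is roughly how the notebook builds this model in Keras. I am reproducing it from memory, so the filter counts, GRU units, and dropout rates should be treated as illustrative:

```python
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv1D,
                                     Dense, Dropout, GRU, Input, TimeDistributed)
from tensorflow.keras.models import Model

def trigger_word_model(input_shape):
    X_input = Input(shape=input_shape)

    # 1st layer: Conv1D -> BatchNorm -> ReLU -> Dropout
    X = Conv1D(filters=196, kernel_size=15, strides=4)(X_input)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)
    X = Dropout(0.8)(X)

    # 2nd layer: GRU -> Dropout -> BatchNorm (dropout BEFORE batchnorm)
    X = GRU(units=128, return_sequences=True)(X)
    X = Dropout(0.8)(X)
    X = BatchNormalization()(X)

    # 3rd layer: GRU -> Dropout -> BatchNorm -> Dropout
    X = GRU(units=128, return_sequences=True)(X)
    X = Dropout(0.8)(X)
    X = BatchNormalization()(X)
    X = Dropout(0.8)(X)

    # 4th layer: time-distributed dense with a sigmoid output per time step
    X = TimeDistributed(Dense(1, activation='sigmoid'))(X)

    return Model(inputs=X_input, outputs=X)
```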

I know what batchnorm and dropout do and why they are applied.
So far I have been able to get an intuition for the models, but here I cannot. Could someone provide an explanation (at least the intuition, please)? Thank you in advance.

There’s no rule regarding the order of applying batch norm with respect to dropout. The author of the model might have found better performance via validation / test sets (please see courses 2 and 3 for a refresher on model evaluation).
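
If you want to convince yourself empirically, you can build both orderings and compare them on a dev set. A minimal sketch, assuming a small dense network and toy data just to make it runnable (the layer sizes and data are hypothetical, not from the assignment):

```python
import numpy as np
from tensorflow.keras import layers, models

def make_model(order):
    # Small classifier with BatchNorm/Dropout inserted in the given order.
    model = models.Sequential([layers.Input(shape=(20,)),
                               layers.Dense(64, activation='relu')])
    for name in order:
        model.add(layers.BatchNormalization() if name == 'batchnorm'
                  else layers.Dropout(0.5))
    model.add(layers.Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

# Toy data just to make the comparison runnable; use your real train/dev split.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype('float32')
y = (X[:, 0] > 0).astype('float32')

for order in (['batchnorm', 'dropout'], ['dropout', 'batchnorm']):
    model = make_model(order)
    hist = model.fit(X[:800], y[:800], validation_data=(X[800:], y[800:]),
                     epochs=5, verbose=0)
    print(order, 'dev accuracy:', hist.history['val_accuracy'][-1])
```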

I’ve usually seen BatchNorm being used BEFORE a computation-related layer like Dense / Dropout / Conv. To get a better grip on using BatchNorm, look at state-of-the-art (SOTA) models and observe how BatchNorm is used there.
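
For example, pre-activation ResNet blocks (He et al., 2016, “Identity Mappings in Deep Residual Networks”) place BatchNorm and the activation before the convolution. A rough Keras sketch of that ordering (shapes and filter count are just placeholders):

```python
from tensorflow.keras import layers

def preact_conv_block(x, filters):
    # Pre-activation ordering: BatchNorm and ReLU come BEFORE the convolution,
    # as in pre-activation ResNets.
    h = layers.BatchNormalization()(x)
    h = layers.Activation('relu')(h)
    return layers.Conv2D(filters, kernel_size=3, padding='same')(h)

# Example usage with a placeholder input shape:
inputs = layers.Input(shape=(32, 32, 16))
outputs = preact_conv_block(inputs, filters=16)
```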

Hi

I had the exact same question. Would you recommend any literature on this topic?

Cheers
Victoria
