In this question, it says that batch normalization can't be used with stochastic (single-example) gradient descent. Did the professor mention that in the lectures? It also made me wonder whether there are other similar restrictions, e.g., can momentum only be applied to mini-batch gradient descent, or can RMSprop and Adam not be used with stochastic gradient descent? Could anyone help me systematize which optimizers can be applied to which variant of gradient descent?
Hi, @1157350959.
I don’t think it was explicitly mentioned in the lectures.
Batch Normalization computes the mini-batch mean and variance for each activation, and that doesn’t make much sense if there’s a single example per mini-batch, does it?
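Here's a minimal NumPy sketch to make that concrete (the function name `batch_norm_forward` and the `eps` value are just illustrative, not the course's exact implementation). With a batch of one example, the batch mean equals the example itself and the batch variance is zero, so the normalized activations collapse to all zeros and the output is just `beta`:

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-8):
    """Normalize activations z (shape: batch_size x units) with batch statistics."""
    mu = z.mean(axis=0)                   # per-unit mean over the mini-batch
    var = z.var(axis=0)                   # per-unit variance over the mini-batch
    z_norm = (z - mu) / np.sqrt(var + eps)
    return gamma * z_norm + beta

gamma, beta = np.ones(3), np.zeros(3)

# Mini-batch of 4 examples: the statistics are meaningful.
z_batch = np.random.randn(4, 3)
print(batch_norm_forward(z_batch, gamma, beta))

# "Mini-batch" of 1 example: mean == the example, variance == 0,
# so z_norm is all zeros and the output is just beta -- the activation
# values are wiped out, which is why BN breaks at batch size 1.
z_single = np.random.randn(1, 3)
print(batch_norm_forward(z_single, gamma, beta))  # -> all zeros (== beta)
```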
As for the other optimizers (momentum, RMSprop, Adam), batch size certainly affects their performance, but I don't think there are any hard restrictions like that: their update rules only need a gradient, whatever batch size it was computed from. See the sketch below.
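To illustrate, here's a small sketch (the helper `momentum_step` and the toy quadratic objective are just assumptions for illustration) showing that a gradient-descent-with-momentum update works fine when each step uses the gradient of a single example:

```python
import numpy as np

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    """One gradient-descent-with-momentum update; works for any batch size."""
    v = beta * v + (1 - beta) * grad   # exponentially weighted average of gradients
    w = w - lr * v
    return w, v

# Toy objective: f(w) = 0.5 * ||w||^2, so the true gradient is w itself.
w = np.array([2.0, -3.0])
v = np.zeros_like(w)

# "Single-example" updates: each step uses one noisy gradient, and the
# momentum average still smooths them out -- nothing in the update rule
# requires a mini-batch.
for step in range(100):
    noisy_grad = w + 0.1 * np.random.randn(*w.shape)  # gradient from one example
    w, v = momentum_step(w, noisy_grad, v)

print(w)  # close to the optimum [0, 0]
```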
Good luck with the rest of week 3!