(The post has been removed by Admin)
Hi, @NocturneJay.
Would you mind editing your post so that it doesn’t give away the answer? I’ll try to answer without spoiling it, too.
I think what you’re missing is that a pair of parameters, Gamma and Beta, is learnt for each output. They are indeed represented as vectors in some of the lectures, which is probably what confused you.
You may want to take a look at page 3 of the Batch Normalization paper if it’s still not clear.
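In case it helps, here’s a minimal sketch of the train-time transform described in Algorithm 1 of the paper (the function name batch_norm and the eps default are my own choices; the steps follow the paper):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x has shape (n_units, m): one column per example in the mini-batch
    mu = x.mean(axis=1, keepdims=True)     # per-unit mini-batch mean, shape (n_units, 1)
    var = x.var(axis=1, keepdims=True)     # per-unit mini-batch variance, shape (n_units, 1)
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize each unit to roughly N(0, 1)
    return gamma * x_hat + beta            # scale and shift: one (Gamma, Beta) pair per unit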
Let me know if that helped.
Sorry for leaking the answer, but it seems that I cannot edit my post anymore.
Thanks for your reply, @nramon.
What really confuses me is this: for a layer with, say, 3 hidden units and a batch size of m, what are the shapes of Gamma and Beta, respectively?
From the paper you referred to (page 3, Algorithm 1), I think that after normalization \hat{x} \sim N(0, 1) (where N(0, 1) is the standard normal distribution). So, to make every example in the batch have the same distribution, each component (or scalar) of Gamma, and likewise of Beta, should be the same. Am I right?
No worries, @NocturneJay.
In that case, the hidden layer’s output would have shape (3, m), right? Gamma and Beta would both have shape (3, 1), regardless of m, and would be “stretched” to compute the element-wise product and the addition through broadcasting.
I think you’ll understand it better if you play with the following code:
>>> import numpy as np
>>> gamma = np.random.rand(3, 1)  # one Gamma entry per hidden unit
>>> beta = np.random.rand(3, 1)   # one Beta entry per hidden unit
>>> m = 1                         # batch size of 1
>>> z_norm = np.random.randn(3, m)
>>> z_tilde = gamma * z_norm + beta  # broadcasting stretches (3, 1) across the batch axis
>>> m = 32                        # larger batch, same Gamma and Beta
>>> z_norm = np.random.randn(3, m)
>>> z_tilde = gamma * z_norm + beta  # still works: (3, 1) broadcasts against (3, m)
Yes, after normalization the outputs are distributed as N(0, 1). Gamma and Beta are precisely what allow them to have different distributions, if that’s the optimal thing to do!
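To see this concretely, you can extend the snippet above and check the per-unit statistics of z_tilde (the large m is only there so the sample statistics settle down):

>>> m = 10000
>>> z_norm = np.random.randn(3, m)
>>> z_tilde = gamma * z_norm + beta
>>> np.round(z_tilde.mean(axis=1), 2)  # close to beta's three entries
>>> np.round(z_tilde.std(axis=1), 2)   # close to gamma's three entries

Each hidden unit ends up with its own mean (Beta) and spread (Gamma), which is exactly the point.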
Thank you so much, @nramon. The code really helps.
I think I was stuck on the idea that, after batch norm, the \tilde{z} values should all have the same distribution. But actually, different Gamma and Beta values can give each output its own distribution.
Exactly!
Glad I could help.