(The post has been removed by Admin)

Hi, @NocturneJay.

Would you mind editing your post so that it doesn't give away the answer? I'll try to answer in the same way.

I think what you're missing is that a pair of parameters, Gamma and Beta, is learnt for each output. They are indeed represented as vectors in some of the lectures, which is probably what confused you.

You may want to take a look at page 3 of the Batch Normalization paper if it's still not clear.

Let me know if that helped

Sorry for leaking the answer, but it seems that I cannot edit my post anymore.

Thanks for your reply, @nramon.

What really confuses me is this: for a layer with, let's say, 3 hidden units and a batch size of m, what are the shapes of Gamma and Beta, respectively?

From the paper you referred to (page 3, Algorithm 1), I think that after normalization, x ~ N(0, 1) (where N(0, 1) is the standard normal distribution), so to make the whole batch of x have the same distribution, each component (scalar) of Gamma or Beta should be the same. Am I right?

No worries, @NocturneJay.

In that case, the hidden layer's output would have shape *(3, m)*, right? Gamma and Beta would both have shape *(3, 1)*, regardless of *m*, and would be "stretched" to compute the element-wise product and the addition through broadcasting.

I think you'll understand it better if you play with the following code:

```
>>> import numpy as np
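>>> # one learnt (gamma, beta) pair per hidden unit: shape (3, 1), independent of the batch size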
>>> gamma = np.random.rand(3, 1)
>>> beta = np.random.rand(3, 1)
>>> m = 1
>>> z_norm = np.random.randn(3, m)
>>> z_tilde = gamma * z_norm + beta
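>>> # the same gamma and beta broadcast over a bigger batch, too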
>>> m = 32
>>> z_norm = np.random.randn(3, m)
>>> z_tilde = gamma * z_norm + beta
```

Yes, after normalization the outputs are distributed as *N(0, 1)*. Gamma and Beta precisely allow them to have different distributions if that's the optimal thing to do: the i-th output ends up distributed as *N(Beta_i, Gamma_i²)*.
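
If it helps, here's a quick numerical check along the same lines (just a sketch, with a large batch so that the sample statistics are stable):

```
>>> import numpy as np
>>> gamma = np.random.rand(3, 1)
>>> beta = np.random.rand(3, 1)
>>> m = 100000
>>> z_norm = np.random.randn(3, m)
>>> z_tilde = gamma * z_norm + beta
>>> z_tilde.mean(axis=1)  # approximately beta.ravel()
>>> z_tilde.std(axis=1)   # approximately gamma.ravel()
```

Each row of `z_tilde` has a mean close to the corresponding component of Beta and a standard deviation close to the corresponding component of Gamma.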

Thank you so much, @nramon. The code really helps.

I think I was stuck on the idea that after batch norm, the \tilde{z}'s should all have the same distribution. But actually, different Gammas and Betas can change each output's distribution.

Exactly!

Glad I could help