Hi, I am not sure why the second option should not be selected. I thought each hidden layer does have its own beta and gamma. Could someone help me understand why it is not correct? Thanks!
Typically, each batch norm layer has a separate beta and gamma for each channel, rather than one global value of beta and gamma for the whole layer.
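
To make that concrete, here is a rough sketch in plain Python, assuming a fully connected layer whose input has shape (batch, features). `ToyBatchNorm` is a made-up name for illustration, not any library's class, and it only covers the training-time forward pass (no running statistics).

```python
import math


class ToyBatchNorm:
    """Hypothetical toy batch norm for inputs shaped (batch, features)."""

    def __init__(self, num_features, eps=1e-5):
        # One gamma (scale) and one beta (shift) PER feature/channel,
        # so each is a vector of length num_features, not a single scalar.
        self.gamma = [1.0] * num_features
        self.beta = [0.0] * num_features
        self.eps = eps

    def forward(self, x):
        batch = len(x)
        out = [[0.0] * len(self.gamma) for _ in range(batch)]
        for j in range(len(self.gamma)):
            # Statistics are computed per feature, across the batch.
            col = [row[j] for row in x]
            mean = sum(col) / batch
            var = sum((v - mean) ** 2 for v in col) / batch
            for i in range(batch):
                x_hat = (x[i][j] - mean) / math.sqrt(var + self.eps)
                # Feature j is scaled and shifted by its own gamma[j], beta[j].
                out[i][j] = self.gamma[j] * x_hat + self.beta[j]
        return out


# Two hidden layers of different widths: each layer owns its own
# parameter vectors, sized by that layer's number of features.
bn1 = ToyBatchNorm(4)
bn2 = ToyBatchNorm(3)
print(len(bn1.gamma), len(bn2.gamma))
```

So each layer does have its own beta and gamma, but within a layer they are vectors with one entry per channel rather than a single number.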
