Once again I have a small “complaint” about the way questions are posed, and in this particular case I believe I am right :).
In question 6: “When using batch normalization it is OK to drop the parameter b^{[l]} from the forward propagation, since it will be subtracted out when we compute \tilde{z}^{[l]} = \gamma^{[l]} z_{norm}^{[l]} + \beta^{[l]}” (the \tilde{z} step only applies a new scale and shift). But this is not true, because b^{[l]} is subtracted out in the previous step, when we compute z_{norm} by subtracting the batch mean and dividing by the standard deviation: z_{norm}^{[l]} = \frac{z^{[l]} - \mu}{\sqrt{\sigma^{2} + \epsilon}}.
However, when I chose “False”, it was marked as incorrect. Please change the equation at the end of the question to the z_{norm} step, so it will not confuse other students. (I know b^{[l]} gets dropped, but because of the z_{norm} step, not because of the \tilde{z} step.)
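Concretely, here is a minimal sketch of the cancellation, assuming the standard definitions from the lectures (z^{[l](i)} = W^{[l]} a^{[l-1](i)} + b^{[l]} as the pre-activation, and \bar{a}^{[l-1]} as the batch mean of the previous layer’s activations):

\mu = \frac{1}{m} \sum_{i=1}^{m} z^{[l](i)} = W^{[l]} \bar{a}^{[l-1]} + b^{[l]}

z_{norm}^{[l](i)} = \frac{z^{[l](i)} - \mu}{\sqrt{\sigma^{2} + \epsilon}} = \frac{W^{[l]} \left( a^{[l-1](i)} - \bar{a}^{[l-1]} \right)}{\sqrt{\sigma^{2} + \epsilon}}

So b^{[l]} cancels as soon as the mean is subtracted, before \gamma^{[l]} and \beta^{[l]} ever enter.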
Thanks for bringing this up. The staff have been informed about this.
\beta^{[l]} and \gamma^{[l]} are learnable parameters. If \beta^{[l]} = \mu and \gamma^{[l]} = \sqrt{\sigma^{2} + \epsilon}, then \tilde{z}^{[l](i)} = z^{[l](i)}, so we don’t need to worry about \beta^{[l]} when computing \tilde{z}^{[l](i)}. In the general case, we can’t ignore \beta^{[l]}.
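Plugging those values in shows why (a one-line check of that special case):

\tilde{z}^{[l](i)} = \gamma^{[l]} \frac{z^{[l](i)} - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta^{[l]} = \sqrt{\sigma^{2} + \epsilon} \cdot \frac{z^{[l](i)} - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \mu = z^{[l](i)}

That is, the normalization is exactly undone and \tilde{z}^{[l](i)} reduces to z^{[l](i)}.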
Thank you for your answer. Maybe I did not write it clearly enough, but I was not talking about \beta^{[l]} and \gamma^{[l]}, but about the intercept (bias) b^{[l]} that is used in the calculation of z^{[l]}.
(sorry, I am not sure how to write in math mode here)
Discourse supports LaTeX via MathJax.
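For example, typing

$z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$

between single dollar signs should render inline, and wrapping an equation in double dollar signs ($$ ... $$) should give a displayed equation. (The exact delimiters depend on how the forum’s MathJax plugin is configured, so treat these as the usual defaults.)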