C2 W3 Normalizing Activations in a Network

Q1. How do we initialize the values for Γ and β to find Ž(i)?

  1. If we initialize Γ = sqrt(σ² + ɛ) and β = μ, then Ž(i) = Z(i) on the first step of forward propagation. We would still be updating the parameters Γ and β afterwards, but it doesn’t make sense to me why we would start with that initialization (see the sketch after this list).

  2. If we initialize Γ = 1 and β = 0, then Ž(i) = Z(i)_norm on the first step of forward propagation.
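
For concreteness, here is a small NumPy sketch (my own illustration, not lecture code) of the forward step Ž = Γ · Z_norm + β, verifying what each of the two initializations above produces on the first pass:

```python
import numpy as np

# Minimal sketch: batch-norm forward step for one layer with 4 units and a
# mini-batch of 8 examples, so Z has shape (n_units, m) as in the course.
np.random.seed(0)
Z = np.random.randn(4, 8)
eps = 1e-8

mu = Z.mean(axis=1, keepdims=True)       # per-unit mean, shape (4, 1)
sigma2 = Z.var(axis=1, keepdims=True)    # per-unit variance, shape (4, 1)
Z_norm = (Z - mu) / np.sqrt(sigma2 + eps)

# Case 1: gamma = sqrt(sigma^2 + eps), beta = mu  ->  Z_tilde equals the original Z
gamma1, beta1 = np.sqrt(sigma2 + eps), mu
Z_tilde1 = gamma1 * Z_norm + beta1
print(np.allclose(Z_tilde1, Z))          # True

# Case 2: gamma = 1, beta = 0  ->  Z_tilde equals Z_norm (zero mean, unit variance)
gamma2, beta2 = np.ones((4, 1)), np.zeros((4, 1))
Z_tilde2 = gamma2 * Z_norm + beta2
print(np.allclose(Z_tilde2, Z_norm))     # True
```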

Q2. To update the values of Γ and β, we take derivatives of the Cost Function w.r.t. Γ and β and then update them, just as we do for W and b. This means that Γ and β are updated so as to minimize the Cost Function, which is not the stated purpose of those parameters. My understanding was that the point of Batch Norm is to speed up the convergence of gradient descent. How does updating Γ and β by gradient descent help speed up convergence?
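
To make the update the question refers to concrete, here is a minimal sketch (my own illustration; it assumes the upstream gradient dZ_tilde has already been produced by backprop) of the gradient-descent step for Γ and β:

```python
import numpy as np

# Hypothetical shapes: one layer with 4 units, mini-batch of 8 examples.
n_units, m = 4, 8
np.random.seed(1)
Z_norm = np.random.randn(n_units, m)      # normalized pre-activations
dZ_tilde = np.random.randn(n_units, m)    # upstream gradient dJ/dZ_tilde (assumed given)

gamma = np.ones((n_units, 1))
beta = np.zeros((n_units, 1))
alpha = 0.01                              # learning rate

# Since Z_tilde = gamma * Z_norm + beta (elementwise), the cost gradients are:
dgamma = np.sum(dZ_tilde * Z_norm, axis=1, keepdims=True)
dbeta = np.sum(dZ_tilde, axis=1, keepdims=True)

# Same gradient-descent update rule as for W and b
gamma -= alpha * dgamma
beta -= alpha * dbeta
```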

Q3. What will be the dimensions of Γ and β, if we have m examples rather than 1 example?

That is not how the training of the Batch Norm parameters is done in Keras. Here’s the Keras documentation page about that. Rather than computing gradients w.r.t. the Cost Function, the layer computes the mean and standard deviation of the batches it sees during training and then uses those saved moving averages during inference. There are also parameters you can use to control how the moving averages are updated. Please see the doc page for details.
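
As a rough illustration of that mechanism (my own sketch, not the Keras implementation; the momentum value of 0.99 is just an assumed choice), the moving statistics are updated as exponential moving averages during training and reused at inference:

```python
import numpy as np

momentum = 0.99                            # assumed smoothing factor for the moving averages
moving_mean = np.zeros((4, 1))
moving_var = np.ones((4, 1))

def train_step(Z, moving_mean, moving_var):
    """During training: normalize with the batch statistics, update the moving averages."""
    mu = Z.mean(axis=1, keepdims=True)
    var = Z.var(axis=1, keepdims=True)
    moving_mean = momentum * moving_mean + (1 - momentum) * mu
    moving_var = momentum * moving_var + (1 - momentum) * var
    Z_norm = (Z - mu) / np.sqrt(var + 1e-8)
    return Z_norm, moving_mean, moving_var

def inference_step(Z, moving_mean, moving_var):
    """During inference: use the saved moving averages instead of batch statistics."""
    return (Z - moving_mean) / np.sqrt(moving_var + 1e-8)

# One training batch updates the running statistics; inference then reuses them.
np.random.seed(2)
_, moving_mean, moving_var = train_step(np.random.randn(4, 8), moving_mean, moving_var)
Z_test_norm = inference_step(np.random.randn(4, 3), moving_mean, moving_var)
```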

The shapes of those parameters are independent of the batch size: Γ and β have one entry per unit (feature) of the layer. It just uses broadcasting to do the elementwise operations across the “samples” dimension.
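
A quick NumPy illustration of that broadcasting (my own example, using the course’s (n_units, m) layout, where m is the number of examples):

```python
import numpy as np

n_units, m = 4, 8
Z_norm = np.random.randn(n_units, m)   # m examples in the batch

gamma = np.ones((n_units, 1))          # one scale per unit, regardless of m
beta = np.zeros((n_units, 1))          # one shift per unit, regardless of m

Z_tilde = gamma * Z_norm + beta        # broadcasts across the "samples" dimension
print(Z_tilde.shape)                   # (4, 8) -- gamma and beta stay (4, 1)
```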