While discussing batch norm, Professor Andrew introduces two new parameters, gamma and beta, which let us control the mean and variance of the intermediate inputs (the Z[l] values for any layer "l"). When covering the implementation with Gradient Descent, he notes that these parameters can be updated in the same way as the weights of the neural network. My question is: how do we initialize these values?

Hi, @jaylodha.

TensorFlow initializes gamma to 1 and beta to 0 by default, but you can specify a different initializer for either one (via the `gamma_initializer` and `beta_initializer` arguments of `tf.keras.layers.BatchNormalization`). What works best may be problem specific.
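To see why those defaults make sense, here is a minimal pure-Python sketch of one batch-norm unit. The function name `batch_norm_forward` and the sample values are my own illustration, not from the course; with gamma = 1 and beta = 0, the layer initially outputs the plain normalized activations (an "identity" starting point), and Gradient Descent can then move gamma and beta away from those values if that helps.

```python
def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Normalize a mini-batch of pre-activations for one unit,
    then scale by gamma and shift by beta."""
    mu = sum(z) / len(z)                          # batch mean
    var = sum((x - mu) ** 2 for x in z) / len(z)  # batch variance
    z_norm = [(x - mu) / (var + eps) ** 0.5 for x in z]
    return [gamma * zn + beta for zn in z_norm]

# Common default initialization: gamma = 1, beta = 0,
# so the output starts out as just the normalized z values
# (mean ~0, variance ~1).
gamma, beta = 1.0, 0.0
out = batch_norm_forward([2.0, 4.0, 6.0], gamma, beta)
```

With these defaults the output is symmetric around zero; training then updates gamma and beta (together with the weights) by gradient descent, exactly as described in the lecture.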


Hey, thanks a lot for the reply.
