Glorot Initializer for Bias?

Hi, in the week 3 programming assignment we initialize the bias variables using the Glorot Initializer instead of zeros as we have been doing up until this point.

Is there a reason for this? And are there any benefits to using zeros vs Glorot initialization?

This is an interesting point that I missed until you pointed it out. It turns out that for Symmetry Breaking (which is required as explained on this thread and as we saw in the Initialization exercise in Week 1), it suffices to initialize either the weights or the bias values randomly and set the others to zero. But there is no harm in initializing both the weights and the bias values randomly: you’ve still done the required Symmetry Breaking. Then the question is just whether you get better convergence when you use non-zero values for both, and whether Glorot initialization (which is the same thing as Xavier initialization, named after Xavier Glorot) works better than He or any of the other possibilities. These choices are “hyperparameters” and the only way to know what works best is to try the various combinations.
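
To make the Symmetry Breaking point concrete, here is a minimal NumPy sketch. It is not the assignment's code: the toy dataset, the network size, and the helper names `train_toy_net` and `hidden_unit_spread` are all made up for illustration. It trains a tiny 2-layer network from three starting points and measures how far apart the hidden units end up.

```python
import numpy as np

def train_toy_net(W1, b1, steps=100, lr=0.5):
    """Train a 2-layer net (tanh hidden layer, sigmoid output) with plain
    gradient descent and return the final hidden-layer weights W1."""
    # Deterministic toy data: 2 features, 20 examples, 7 positive labels
    # (deliberately unbalanced, see the note after the code).
    x0 = np.linspace(-1.0, 1.0, 20)
    X = np.stack([x0, x0 ** 2])               # shape (2, 20)
    Y = (x0 > 0.3).astype(float)[None, :]     # shape (1, 20)
    m = X.shape[1]
    # Assumption for this sketch: the output layer starts at zero in every case.
    W2 = np.zeros((1, W1.shape[0]))
    b2 = np.zeros((1, 1))
    for _ in range(steps):
        # Forward pass
        A1 = np.tanh(W1 @ X + b1)
        A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))
        # Backward pass for the cross-entropy loss
        dZ2 = A2 - Y
        dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)
        # Gradient descent updates
        W2 = W2 - lr * (dZ2 @ A1.T) / m
        b2 = b2 - lr * dZ2.mean(axis=1, keepdims=True)
        W1 = W1 - lr * (dZ1 @ X.T) / m
        b1 = b1 - lr * dZ1.mean(axis=1, keepdims=True)
    return W1

def hidden_unit_spread(W1):
    """Largest gap between any two hidden units' weights (0 means identical units)."""
    return float(np.ptp(W1, axis=0).max())

rng = np.random.default_rng(0)
n_h, n_x = 4, 2
zeros_W, zeros_b = np.zeros((n_h, n_x)), np.zeros((n_h, 1))

spread_all_zero = hidden_unit_spread(train_toy_net(zeros_W, zeros_b))
spread_rand_b = hidden_unit_spread(
    train_toy_net(zeros_W, rng.normal(size=(n_h, 1)) * 0.5))
spread_rand_W = hidden_unit_spread(
    train_toy_net(rng.normal(size=(n_h, n_x)) * 0.1, zeros_b))

print("all zeros:             ", spread_all_zero)
print("zero W, random b:      ", spread_rand_b)
print("random W, zero b:      ", spread_rand_W)
```

In this toy setup the all-zeros start leaves every hidden unit identical, while either random choice lets the units drift apart. One caveat specific to this sketch: with all weights at zero, the first update to the output weights is proportional to the average prediction error, so the labels are kept unbalanced here to make sure that first update is non-zero.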

It would be interesting to run the experiment here in this exercise: create another version of the initialization that uses tf.zeros for all the bias values, then compare how quickly training converges with the two styles of initialization. After that, try some of the other initializers besides Glorot and see what effect that has. Here’s the menu of possible initialization functions that TF provides.
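
One possible way to set up that comparison is sketched below. This is only a sketch under assumptions, not the assignment's actual function: the name `initialize_parameters_variant`, the `bias_init` argument, and the layer sizes in the usage lines are hypothetical. It relies only on `tf.zeros`, `tf.Variable`, and `tf.keras.initializers.GlorotNormal`.

```python
import tensorflow as tf

def initialize_parameters_variant(layer_dims, bias_init="glorot", seed=1):
    """Build W/b variables for a fully connected net.

    layer_dims: list of layer sizes, e.g. [n_x, n_h1, n_h2, n_y]
    bias_init:  "glorot" (random biases) or "zeros"
    """
    glorot = tf.keras.initializers.GlorotNormal(seed=seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_out, n_in = layer_dims[l], layer_dims[l - 1]
        # Weights are Glorot-initialized in both variants.
        params[f"W{l}"] = tf.Variable(glorot(shape=(n_out, n_in)))
        # Only the bias initialization differs between the two variants.
        if bias_init == "zeros":
            bias_value = tf.zeros((n_out, 1))
        else:
            bias_value = glorot(shape=(n_out, 1))
        params[f"b{l}"] = tf.Variable(bias_value)
    return params

# Hypothetical layer sizes, chosen only for illustration.
params_glorot = initialize_parameters_variant([12, 8, 4], bias_init="glorot")
params_zeros = initialize_parameters_variant([12, 8, 4], bias_init="zeros")
```

You would then train the same model once with each parameter set, keeping everything else fixed (learning rate, number of epochs, minibatch order), and compare the cost curves. To try a different scheme, swap `GlorotNormal` for another class from `tf.keras.initializers`, such as `HeNormal`.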

Thanks for pointing this out and let us know if you try any such experiments and notice anything interesting. Science! :nerd_face:
