What does the statement below mean? Is the value 2 something we should tune here?
At 5:09 in the lecture video: "If you wish, the variance here, this variance parameter, could be another thing that you could tune with your hyperparameters. So you could have another parameter that multiplies into this formula and tune that multiplier as part of your hyperparameter search."
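This is called He initialization. Its weight variance, 2/n (where n is the number of inputs to the layer), is twice the 1/n variance of the Xavier initialization shown in the lecture, and it works better for ReLU activation functions. Xavier is a good choice for tanh activation functions.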
Most likely, you don’t need to tune either He or Xavier initialization. They work well for most problems with ReLU or tanh-like activation functions.
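
In case it helps to see it concretely, here is a minimal NumPy sketch of the two schemes and of the optional multiplier the lecture quote mentions. The function name `init_weights` and its `scheme` and `multiplier` arguments are made up for illustration (they are not from the course code), and the Xavier variant is the 1/n form described above.

```python
import numpy as np

def init_weights(n_out, n_in, scheme="he", multiplier=1.0, rng=None):
    """Sample an (n_out, n_in) weight matrix with variance multiplier * base / n_in.

    Hypothetical helper for illustration only.
    base is 2 for He initialization and 1 for the 1/n Xavier variant.
    """
    rng = np.random.default_rng() if rng is None else rng
    base = {"he": 2.0, "xavier": 1.0}[scheme]
    # Scale standard-normal samples by the target standard deviation.
    std = np.sqrt(multiplier * base / n_in)
    return rng.standard_normal((n_out, n_in)) * std

rng = np.random.default_rng(0)
W_he = init_weights(256, 512, scheme="he", rng=rng)
W_xavier = init_weights(256, 512, scheme="xavier", rng=rng)

# Empirical variance times n_in should land near the numerator (2 or 1).
print(W_he.var() * 512)
print(W_xavier.var() * 512)
```

Leaving `multiplier=1.0` gives the standard formulas. Treating `multiplier` as one more value in a hyperparameter search is what the quote suggests, though as noted it is rarely the first thing worth tuning.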