Hi, @Prasannab.
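The scaling factor restores the expected values of the activations. Think about it in a probabilistic context: an expected value is an average over many random draws, so any single small sample will only match it approximately. That is why a contrived example with a small sample size probably won’t give you a good intuition.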
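Here is a quick sketch of the idea, assuming the question is about inverted dropout, where each activation is kept with probability `keep_prob` and the survivors are divided by `keep_prob`. The names `inverted_dropout` and `keep_prob`, the value 0.8, and the all-ones activations are my own choices for illustration, not taken from your code.

```python
import numpy as np

def inverted_dropout(a, keep_prob, rng):
    # Hypothetical helper: keep each unit independently with probability keep_prob.
    mask = rng.random(a.shape) < keep_prob
    # Dividing by keep_prob compensates for the zeroed units,
    # so the expected value of the output equals the input activation.
    return a * mask / keep_prob

rng = np.random.default_rng(0)
keep_prob = 0.8  # assumed value, for illustration only

small = np.ones(5)          # contrived, tiny sample
large = np.ones(1_000_000)  # large sample

small_mean = inverted_dropout(small, keep_prob, rng).mean()
large_mean = inverted_dropout(large, keep_prob, rng).mean()

# Same masking without the scaling factor, for comparison.
unscaled_mean = (large * (rng.random(large.shape) < keep_prob)).mean()

print("5 units, scaled:     ", small_mean)     # can land far from 1.0
print("1e6 units, scaled:   ", large_mean)     # close to 1.0
print("1e6 units, unscaled: ", unscaled_mean)  # close to keep_prob
```

With only 5 units the scaled mean can only be a multiple of 0.25, so it will often miss 1.0. With a million units it settles very close to 1.0, while the unscaled version settles near `keep_prob` instead.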
Let me know if this explanation helps.