Should I use the “He” weight initialization technique ONLY with very deep neural networks (10-150 layers), or should I use it even for neural networks that have 2-4 layers? Thanks for the clarification.
It is not limited to deep networks.
As Tom says, this technique is generally applicable. There is no single initialization algorithm that works best in all cases, but He initialization is one of the first things to try, and it works well in many cases even when the network is not deep.
In fact, we saw a concrete example of this in DLS C1 W4 A2. Take a look at how they did the initialization there for the 4 layer network: they actually use a variant of He or Xavier initialization. That is because the simple initialization they had us build in the previous assignment gives really terrible convergence in that particular case. You can try it both ways and see a very concrete example of how the more sophisticated init algorithm can help even in a relatively shallow network.
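For reference, here is a minimal numpy sketch of He initialization. This is illustrative rather than the assignment's exact code: `initialize_he` and `layer_dims` are made-up names, and it assumes the DLS convention that `W[l]` has shape `(n[l], n[l-1])`.

```python
import numpy as np

def initialize_he(layer_dims, seed=0):
    """He initialization: Gaussian weights scaled by sqrt(2 / fan_in).

    layer_dims -- list of layer sizes, e.g. [n_x, n_h1, n_h2, n_y]
    """
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        fan_in = layer_dims[l - 1]
        # Scale by sqrt(2 / fan_in) so activation variance stays roughly
        # stable across layers with ReLU units.
        params[f"W{l}"] = rng.standard_normal(
            (layer_dims[l], fan_in)
        ) * np.sqrt(2.0 / fan_in)
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

# Example: a shallow network (2 hidden layers) still benefits from this.
params = initialize_he([5, 4, 3, 1])
```

The only difference from the simple random init built earlier in the course is the `sqrt(2 / fan_in)` scaling factor, which is what keeps the signal from shrinking or exploding as it passes through the layers.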