I saw He initialization in the programming assignment, but what is it?
… Glorot and Bengio [7] proposed to adopt a properly scaled uniform distribution for initialization. This is called “Xavier” initialization in [14]. Its derivation is based on the assumption that the activations are linear. This assumption is invalid for ReLU and PReLU.
In the following, we derive a theoretically more sound initialization by taking ReLU/PReLU into account. In our experiments, our initialization method allows for extremely deep models (e.g., 30 conv/fc layers) to converge, while the “Xavier” method [7] cannot.
…
My emphasis added above.
ai_curious has given you the full reference, but this was also discussed in this lecture. Prof Ng explains the basis for Xavier Initialization, which is closely related and was the precursor to He Initialization.
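In case a concrete sketch helps, here is a minimal NumPy illustration of the difference between the two scalings. The function and variable names (`initialize_parameters_he`, `layer_dims`, etc.) are just illustrative, not necessarily the exact ones used in the assignment; the key point is the factor of 2 that He et al. derive for ReLU activations.

```python
import numpy as np

def initialize_parameters_he(layer_dims, seed=3):
    """Initialize weights with He scaling: randn * sqrt(2 / n_prev).

    The factor 2 comes from the ReLU derivation in He et al.
    Xavier initialization (as presented in the course) would use
    sqrt(1 / n_prev) instead.
    """
    np.random.seed(seed)
    parameters = {}
    L = len(layer_dims)  # number of layers, including the input layer

    for l in range(1, L):
        n_prev, n_curr = layer_dims[l - 1], layer_dims[l]
        # He initialization: scale a standard normal by sqrt(2 / fan_in)
        parameters["W" + str(l)] = np.random.randn(n_curr, n_prev) * np.sqrt(2.0 / n_prev)
        # Biases are simply initialized to zero
        parameters["b" + str(l)] = np.zeros((n_curr, 1))

    return parameters

# Example: a small network with layer sizes 2 -> 4 -> 1
params = initialize_parameters_he([2, 4, 1])
print(params["W1"].shape, params["W2"].shape)  # (4, 2) (1, 4)
```

The only change for Xavier would be replacing `np.sqrt(2.0 / n_prev)` with `np.sqrt(1.0 / n_prev)`; the point of the quoted passage is that the extra factor of 2 compensates for ReLU zeroing out half of its inputs on average, which is what lets very deep ReLU networks converge.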