I decided to dive deeper into Xavier initialization to try to build a more mathematical intuition for why the formula works. The paper that introduced the formula says that, to compromise between 1/n_in and 1/n_out, it just uses 2/(n_in + n_out). But why does it do this? It's not an average or a harmonic mean; the paper just says "compromise".

This might be a useful read:

If I just look at the equations presented, without other context, equation [12] seems to be the result of adding up equations [10] and [11]. You see, equations [10] and [11] imply two different variance values, so a compromise is to assume a third, common value that “satisfies” both equations, and doing so gives equation [12].
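To make the "adding up" step concrete, here is a short worked version. It assumes equations [10] and [11] are the forward and backward conditions written with n_in and n_out from the question (the paper's own symbols may differ), so treat it as a sketch of the algebra rather than a statement of the authors' reasoning.

```latex
% Assumed forms of [10] and [11]: forward and backward variance conditions
n_{\text{in}} \,\mathrm{Var}[W] = 1, \qquad n_{\text{out}} \,\mathrm{Var}[W] = 1

% Adding the two conditions and solving for a single shared variance gives [12]
(n_{\text{in}} + n_{\text{out}})\,\mathrm{Var}[W] = 2
\quad\Longrightarrow\quad
\mathrm{Var}[W] = \frac{2}{n_{\text{in}} + n_{\text{out}}}

% The same value, read as a mean of the two candidate variances a and b
a = \frac{1}{n_{\text{in}}}, \quad b = \frac{1}{n_{\text{out}}}
\quad\Longrightarrow\quad
\frac{2}{\frac{1}{a} + \frac{1}{b}} = \frac{2}{n_{\text{in}} + n_{\text{out}}}
```

Read this way, the compromise value coincides with the harmonic mean of 1/n_in and 1/n_out, or equivalently with one over the arithmetic mean of n_in and n_out. Whether the authors thought of it in those terms is not something the algebra can tell us.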

Cheers,

Raymond

But when the layers are different sizes, won't just adding up the two equations fail, since n_l and n_l+1 are different? Or did the researchers just find that the formula was good enough even when the sizes didn’t match up?

You really need to ask the authors if you want to know their intention. We are all just making guesses.
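That said, what the formula does when the sizes differ can be checked directly. Below is a minimal NumPy sketch, assuming a purely linear layer with zero-mean, unit-variance inputs and normally distributed weights. The function names are made up for illustration and are not from the paper or any library.

```python
import numpy as np


def xavier_variance(n_in, n_out):
    """Weight variance from the compromise formula 2 / (n_in + n_out)."""
    return 2.0 / (n_in + n_out)


def variance_gains(n_in, n_out):
    """Return (forward, backward) variance multipliers for a linear layer.

    The forward multiplier is n_in * Var[W] and the backward multiplier is
    n_out * Var[W]. Each would be 1 if its own condition held exactly.
    """
    var_w = xavier_variance(n_in, n_out)
    return n_in * var_w, n_out * var_w


def empirical_forward_gain(n_in, n_out, n_samples=2000, seed=0):
    """Estimate Var[output] / Var[input] by sampling (linear layer only)."""
    rng = np.random.default_rng(seed)
    std_w = np.sqrt(xavier_variance(n_in, n_out))
    w = rng.normal(0.0, std_w, size=(n_in, n_out))
    x = rng.normal(0.0, 1.0, size=(n_samples, n_in))
    return (x @ w).var() / x.var()


if __name__ == "__main__":
    for n_in, n_out in [(256, 256), (300, 100), (100, 300)]:
        fwd, bwd = variance_gains(n_in, n_out)
        est = empirical_forward_gain(n_in, n_out)
        print(f"n_in={n_in} n_out={n_out} "
              f"forward={fwd:.2f} backward={bwd:.2f} sampled forward={est:.2f}")
```

With equal sizes both multipliers are 1. With unequal sizes neither condition holds exactly: one multiplier lands above 1 and the other below, and the two always sum to 2. So under these assumptions the formula splits the error between the forward and backward passes instead of satisfying either one.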