Hello @soufg!
From your questions I think you already have a pretty good understanding!
Different initial values are exactly the reason why neurons can behave differently. I have written a post showing that if we initialize neurons to the same values, they will never be able to achieve different results.
That is random. Here is a DLS Course 1 Week 3 lecture about it.
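To make this concrete, here is a minimal sketch in TensorFlow (the toy data, layer sizes, and training settings are made-up values, just for illustration). When every kernel starts from the same constant, all hidden neurons compute the same thing, receive the same gradient, and stay identical copies of each other; a random initializer breaks that symmetry so the neurons can learn different weights.

```python
import numpy as np
import tensorflow as tf

# Toy data, made up just for this sketch.
x = np.random.rand(64, 4).astype("float32")
y = np.random.rand(64, 1).astype("float32")

def hidden_kernel_after_training(initializer):
    """Train a tiny network whose kernels all come from `initializer` and
    return the hidden layer's learned kernel (4 inputs x 3 neurons)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(3, activation="relu", kernel_initializer=initializer),
        tf.keras.layers.Dense(1, kernel_initializer=initializer),
    ])
    model.compile(optimizer="sgd", loss="mse")
    model.fit(x, y, epochs=5, verbose=0)
    return model.layers[0].get_weights()[0]

# Same starting values everywhere: the three columns (neurons) stay identical.
w = hidden_kernel_after_training(tf.keras.initializers.Constant(0.5))
print(np.allclose(w, w[:, :1]))   # True

# Random starting values: the neurons end up with different weights.
w = hidden_kernel_after_training(tf.keras.initializers.GlorotUniform(seed=1))
print(np.allclose(w, w[:, :1]))   # False
```

Note that the same constant has to be used for every layer for the symmetry to hold; if the output layer were initialized randomly, the hidden neurons would already receive different gradients.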
There are 2 videos (first, second) in a row from DLS Course 2 Week 1: the first describes a well-known deep neural network problem and, at the end, mentions that we can address it by weight initialization, which is then discussed in the second video.
I know those videos are from the DLS, but I trust you can take away from them whatever is useful to you.
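In case a quick numerical picture helps, here is a rough NumPy sketch of that problem (the width, depth, and scales are made-up numbers, and I am assuming the problem the first video refers to is the vanishing/exploding behaviour in deep networks). If the random weights are drawn at a poorly chosen scale, the activations either saturate or shrink towards zero as the network gets deeper, and the gradients suffer the same fate; a 1/sqrt(n)-style scaling, the idea behind initializers such as Xavier/Glorot, keeps them in a moderate range.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 256, 50                      # layer width and number of layers (made-up)
x = rng.standard_normal((1, n))

def mean_activation(scale):
    """Push x through `depth` linear + tanh layers whose weights are drawn
    from N(0, scale**2) and report the average size of the final activations."""
    a = x
    for _ in range(depth):
        w = rng.standard_normal((n, n)) * scale
        a = np.tanh(a @ w)
    return np.abs(a).mean()

print(mean_activation(0.5))             # too large: activations saturate near +/-1
print(mean_activation(0.01))            # too small: activations shrink towards 0
print(mean_activation(np.sqrt(1 / n)))  # 1/sqrt(n) scaling: stays moderate
```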
This page has a list of initializers, most of which are random initializers but with different kinds of “randomness”. I am sharing that page to show you some names you can refer to if you would like to do your own research. In particular, just looking at the left-hand side of the page gives us a glance at those names, as below:
There are, for example, 4 random initializers that take the name “Uniform”, which means they generate random numbers with a uniform probability distribution, only over different ranges. Among those, GlorotUniform is the default choice for some TensorFlow layers; however, that does not mean any one initializer is always superior to the rest. Each has its own origin, though it will take some research to dig that information out. On the page, if you click on some of the initializers, you can see more detailed information about them, for example a formula and links to their reference papers.
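If you would like to poke at those “Uniform” initializers yourself, here is a small sketch, assuming the page is the TensorFlow/Keras initializers documentation and that TensorFlow is installed (the kernel shape is a made-up example). It samples from the four of them so you can see the different ranges, and checks which initializer a Dense layer picks up by default.

```python
import tensorflow as tf

shape = (3, 4)  # a made-up (fan_in, fan_out) for a small kernel

# Each of these samples uniformly, but over a different range.
for init in [
    tf.keras.initializers.RandomUniform(),   # fixed range, [-0.05, 0.05] by default
    tf.keras.initializers.GlorotUniform(),   # range scaled by fan_in + fan_out
    tf.keras.initializers.HeUniform(),       # range scaled by fan_in
    tf.keras.initializers.LecunUniform(),    # range scaled by fan_in
]:
    values = init(shape)
    print(f"{type(init).__name__:>14}: "
          f"min={float(tf.reduce_min(values)):+.3f}, "
          f"max={float(tf.reduce_max(values)):+.3f}")

# GlorotUniform is what a Dense layer uses for its kernel when you
# do not specify anything else.
print(type(tf.keras.layers.Dense(4).kernel_initializer).__name__)  # GlorotUniform
```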
Cheers,
Raymond