There is a lot to say here. For starters, here's a pre-existing thread that discusses Symmetry Breaking and is worth a look. One interesting point covered on that thread is that symmetry breaking is not necessary for Logistic Regression, but it is once we move to real Neural Networks with more than just the output layer.
It also turns out that you need to break symmetry at all layers of the neural net. If you look at the formulas for how the gradients are computed, they mirror how forward propagation works, just run in reverse. In forward prop, each neuron in a given layer receives all the outputs of the previous layer as inputs and then applies its own weights to them. Going backwards, the gradients from the subsequent layers apply equally to all the neurons in the current layer, so if the neurons' weights start out the same, they stay the same. Any layer that you don't start with asymmetric weights will just stay symmetric, which defeats the purpose of having multiple neurons in that layer.
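To make that concrete, here is a minimal NumPy sketch of the argument (the toy data, layer sizes, learning rate and iteration count are all made up purely for illustration): a network with one hidden layer in which every weight starts at the same value. However long you train, every row of the hidden layer's weight matrix stays identical to every other row, so the hidden neurons never learn anything different from each other.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 features, 50 examples, random binary labels (made up for illustration)
X = rng.normal(size=(4, 50))
Y = rng.integers(0, 2, size=(1, 50)).astype(float)

# Symmetric initialization: every weight in a layer starts at the same constant
W1, b1 = np.zeros((5, 4)), np.zeros((5, 1))   # hidden layer, 5 neurons
W2, b2 = np.zeros((1, 5)), np.zeros((1, 1))   # output layer

lr, m = 0.5, X.shape[1]
for _ in range(1000):
    # Forward prop
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    # Backprop (sigmoid output with cross-entropy loss)
    dZ2 = A2 - Y
    dW2, db2 = dZ2 @ A1.T / m, dZ2.mean(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    dW1, db1 = dZ1 @ X.T / m, dZ1.mean(axis=1, keepdims=True)
    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(W1)                        # every row is identical
print(np.allclose(W1, W1[0]))    # -> True: the hidden neurons never diverged
```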
Of course, all of this is an experimental science. If you still have doubts and don't want to actually do the calculus, it's easy enough just to try the experiment you are suggesting: e.g., define a 3-layer network and initialize the middle layer symmetrically, but randomly initialize the other two layers. Then run the training and watch what happens: print out the weight matrix of the initially symmetric layer after training and see what it looks like.
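For reference, here is one way that experiment might look as code, reusing the style of the sketch above (again, the data, sizes and hyperparameters are arbitrary choices, not anything prescribed): three weight layers, with only the middle one initialized to a constant, and the middle layer's weights printed after training so you can inspect them.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = rng.normal(size=(4, 50))                     # same kind of toy data as before
Y = rng.integers(0, 2, size=(1, 50)).astype(float)

# Three weight layers: only the middle one starts out symmetric
W1 = rng.normal(size=(5, 4)) * 0.01              # random
W2 = np.zeros((5, 5))                            # symmetric (every weight the same)
W3 = rng.normal(size=(1, 5)) * 0.01              # random
b1, b2, b3 = np.zeros((5, 1)), np.zeros((5, 1)), np.zeros((1, 1))

lr, m = 0.5, X.shape[1]
for _ in range(1000):
    # Forward prop through all three layers
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    A3 = sigmoid(W3 @ A2 + b3)

    # Backprop (compute all gradients before updating any weights)
    dZ3 = A3 - Y
    dZ2 = (W3.T @ dZ3) * A2 * (1 - A2)
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)

    W3 -= lr * (dZ3 @ A2.T / m); b3 -= lr * dZ3.mean(axis=1, keepdims=True)
    W2 -= lr * (dZ2 @ A1.T / m); b2 -= lr * dZ2.mean(axis=1, keepdims=True)
    W1 -= lr * (dZ1 @ X.T / m);  b1 -= lr * dZ1.mean(axis=1, keepdims=True)

print(W2)   # the initially symmetric middle layer: see what it looks like after training
```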
The other high-level point here (which was also mentioned on the thread above) is that it's not just setting all the weights to zero that causes the problem: setting them all to any fixed value does. There's nothing magically bad about zero; it's the symmetry that's bad.
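As a quick check of that last point, try swapping the `np.zeros((5, 4))` in the first sketch for, say, `np.full((5, 4), 0.5)`: the hidden layer's rows still finish training identical to one another, because what matters is that they started out equal, not which value they started at.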