Initial Parameter Values in Neural Networks (Deep Learning Specialization, Course 1 Week 3)

Yes, you are correct that you can “break symmetry” by making the W values constant and the b values random. My guess is that the common practice of randomizing W instead exists because it gives better convergence in most cases. You can run some experiments to see whether there is any measurable difference. Here’s a thread from a while back that discusses Symmetry Breaking in more detail.
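If you want to try that experiment, here is a minimal sketch of one way to set it up. Everything in it is my own assumption rather than course code: the toy dataset, the 4-unit tanh hidden layer, the constant value 0.5, the learning rate, and the helper names `train` and `hidden_units_identical`. It trains the same small network twice, once with constant W and zero b, and once with constant W and random b, then checks whether the hidden units are still copies of each other.

```python
import numpy as np

def train(W1, b1, W2, b2, X, Y, steps=100, lr=0.5):
    """Plain gradient descent on a 2-layer net (tanh hidden layer, sigmoid output)."""
    m = X.shape[1]
    for _ in range(steps):
        # Forward pass
        A1 = np.tanh(W1 @ X + b1)
        A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))
        # Backward pass for the binary cross-entropy loss
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.sum(axis=1, keepdims=True) / m
        # Parameter update
        W1, b1 = W1 - lr * dW1, b1 - lr * db1
        W2, b2 = W2 - lr * dW2, b2 - lr * db2
    return W1, b1, W2, b2

def hidden_units_identical(W1):
    """True if every hidden unit (row of W1) has the same incoming weights."""
    return bool(np.allclose(W1, W1[0]))

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))                    # toy inputs (assumption)
Y = (X[0] * X[1] > 0).astype(float).reshape(1, -1)   # XOR-like toy labels (assumption)

n_h = 4  # hidden layer size, chosen arbitrarily

# Case A: constant W, zero b. Nothing distinguishes the hidden units.
W1_a, _, _, _ = train(np.full((n_h, 2), 0.5), np.zeros((n_h, 1)),
                      np.full((1, n_h), 0.5), np.zeros((1, 1)), X, Y)

# Case B: constant W, random b. Only the biases differ at the start.
W1_b, _, _, _ = train(np.full((n_h, 2), 0.5), rng.standard_normal((n_h, 1)),
                      np.full((1, n_h), 0.5), np.zeros((1, 1)), X, Y)

print("constant W, zero b   -> hidden units identical:", hidden_units_identical(W1_a))
print("constant W, random b -> hidden units identical:", hidden_units_identical(W1_b))
```

In case A every hidden unit receives the same gradient, so the rows of W1 should stay identical; in case B the different biases give each unit a different gradient, so the rows should drift apart. This only shows that symmetry gets broken. To compare convergence you would also need to track the cost for each initialization.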

Note that there are a number of different random initialization algorithms. They show us a very simple one in Week 3 and Week 4 of Course 1, but it turns out those straightforward algorithms do not always work very well. Prof Ng will show us some more sophisticated initialization algorithms and discuss these issues in more detail in Course 2, so stay tuned for that. I point this out to give some background on my comment that there may be a reason for not using the bias values for symmetry breaking. Initialization affects how quickly and how reliably training converges, and there is no single “silver bullet” solution that works best in all cases.
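To give a rough feel for why the choice of algorithm matters, here is a small sketch that compares three weight scalings on a deep ReLU network. The scheme names and formulas are assumptions on my part: "small" is a fixed 0.01 multiplier, while "xavier" and "he" are two widely used scalings based on the size of the previous layer. I am not claiming these are exactly the ones the course uses. The network shape, the zero biases, and the helper names `init_weights` and `final_activation_std` are also made up for the illustration.

```python
import numpy as np

def init_weights(layer_dims, scheme, rng):
    """Return one random weight matrix per layer, scaled according to `scheme`."""
    weights = []
    for n_prev, n_curr in zip(layer_dims[:-1], layer_dims[1:]):
        if scheme == "small":
            scale = 0.01                    # fixed small multiplier
        elif scheme == "xavier":
            scale = np.sqrt(1.0 / n_prev)   # scale by 1/sqrt(fan-in)
        elif scheme == "he":
            scale = np.sqrt(2.0 / n_prev)   # scale by sqrt(2/fan-in)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        weights.append(rng.standard_normal((n_curr, n_prev)) * scale)
    return weights

def final_activation_std(weights, X):
    """Forward-propagate X through ReLU layers (biases left at zero)."""
    A = X
    for W in weights:
        A = np.maximum(0.0, W @ A)
    return float(A.std())

rng = np.random.default_rng(1)
layer_dims = [100] * 11                 # 10 layers of width 100 (arbitrary choice)
X = rng.standard_normal((100, 256))     # random inputs, for illustration only

stds = {}
for scheme in ("small", "xavier", "he"):
    stds[scheme] = final_activation_std(init_weights(layer_dims, scheme, rng), X)
    print(f"{scheme:>6}: std of last-layer activations = {stds[scheme]:.3e}")
```

With the fixed 0.01 multiplier the activations shrink at every layer, so by the last layer they are close to zero and the gradients will be tiny as well. The fan-in based scalings are designed to keep the signal at a usable magnitude as depth grows. This looks only at the forward pass at initialization and says nothing about which scheme trains best on a real problem.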