We’ve seen that initializing both the weights and the biases to zero gives every neuron in a layer the same value. Can we instead initialize the biases to non-zero values and keep the weights at zero? I think that since the biases are non-zero, the neurons in a layer are no longer symmetric.

Hi @rae and welcome to Discourse. The pre-activation of a layer is written as z = Wx + b. If only b is non-zero, you would still get a symmetric response in each layer. The reason is that the output of each layer will be constant: possibly non-zero, but still the same within a layer. This would result in uniform weight updates and sub-optimal training.

Hi @yanivh, by symmetric response and constant output, do you mean symmetric and constant across all observations, or across all neurons in the layer?

For example, if W is a zero matrix and b is [1,2]^T, then the two neurons in the layer give z = [1,2]^T for every observation, and this is not symmetric among the neurons.
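
A quick NumPy check of this example might look like the sketch below. The input matrix `X` (3 features, 4 observations) is made up purely for illustration; only W = 0 and b = [1,2]^T come from the example.

```python
import numpy as np

# Hypothetical inputs: 3 features, 4 observations (one observation per column).
X = np.array([[ 0.5, -1.0,  2.0, 3.0],
              [ 1.5,  0.0, -2.0, 1.0],
              [-0.5,  4.0,  1.0, 0.0]])

W = np.zeros((2, 3))            # zero weights for a layer with 2 neurons
b = np.array([[1.0], [2.0]])    # distinct non-zero biases, shape (2, 1)

Z = W @ X + b                   # b is broadcast across the 4 observations
print(Z)                        # every column is [1, 2]: constant across observations

# The two neurons (rows) differ from each other, so they are not symmetric.
print(np.allclose(Z[0], Z[1]))  # False
```

So Z is constant across observations, but not across the neurons of the layer.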

@rae, you are correct. In your example, symmetry is broken within a layer. In fact, you can think of b as the weights associated with a constant input x=1 (with z = Wx + b, it could have been stored as an extra first column in W, and for x you would add 1 as the first element).
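
To make the "b is just a weight on a constant input" view concrete, here is a minimal NumPy sketch. The shapes, the random values, and the names `W_aug` and `X_aug` are assumptions for illustration. With the z = Wx + b convention, b ends up as the first column of the augmented matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))      # hypothetical weights: 2 neurons, 3 inputs
b = np.array([[1.0], [2.0]])     # biases, shape (2, 1)
X = rng.normal(size=(3, 4))      # hypothetical inputs, 4 observations as columns

# Standard form: z = Wx + b
Z_standard = W @ X + b

# Augmented form: fold b into W and prepend a constant 1 to every input.
W_aug = np.hstack([b, W])                          # shape (2, 4), b is the first column
X_aug = np.vstack([np.ones((1, X.shape[1])), X])   # shape (4, 4), first row is all ones
Z_augmented = W_aug @ X_aug

print(np.allclose(Z_standard, Z_augmented))        # True
```
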
So I take back what I wrote before: initializing b to distinct non-zero values has a similar effect to W itself being non-zero. I wouldn’t practice DL this way, but theoretically it should work. You can try this yourself in one of the assignments in the course.
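
If you want to try it outside the assignments, here is a rough sketch of such an experiment. It is not the course code: the tiny 2-3-1 network, the AND dataset, the learning rate, and the helper name `train_w1` are all assumptions made for illustration. All weights start at zero and only the hidden-layer biases change between runs.

```python
import numpy as np

def train_w1(b1_init, steps=50, lr=0.5):
    """Train a tiny 2-3-1 network whose weights all start at zero.

    Returns the hidden-layer weight matrix W1 after gradient descent.
    """
    # Toy data (an assumption for this sketch): the logical AND of two inputs.
    X = np.array([[0.0, 0.0, 1.0, 1.0],
                  [0.0, 1.0, 0.0, 1.0]])
    Y = np.array([[0.0, 0.0, 0.0, 1.0]])
    n = X.shape[1]

    W1 = np.zeros((3, 2))
    b1 = np.array(b1_init, dtype=float).reshape(3, 1)
    W2 = np.zeros((1, 3))
    b2 = np.zeros((1, 1))

    for _ in range(steps):
        # Forward pass: tanh hidden layer, sigmoid output.
        A1 = np.tanh(W1 @ X + b1)
        A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))

        # Backward pass for the cross-entropy loss.
        dZ2 = A2 - Y
        dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)
        dW2 = dZ2 @ A1.T / n
        db2 = dZ2.mean(axis=1, keepdims=True)
        dW1 = dZ1 @ X.T / n
        db1 = dZ1.mean(axis=1, keepdims=True)

        # Gradient descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    return W1

W1_zero_bias = train_w1([0.0, 0.0, 0.0])        # everything starts at zero
W1_same_bias = train_w1([1.0, 1.0, 1.0])        # non-zero but identical biases
W1_distinct_bias = train_w1([0.5, -1.0, 1.5])   # non-zero and distinct biases

# Each row of W1 holds the incoming weights of one hidden neuron.
print(W1_zero_bias)
print(W1_same_bias)
print(W1_distinct_bias)
```

In this sketch, the hidden neurons only end up with different weights in the last run. With identical biases the rows of W1 stay equal to each other, so it is the biases being different from one another, not just non-zero, that breaks the symmetry.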