In the first assignment of the first week, I didn’t understand the explanation of why it is not good to initialize the weights to zero. What I didn’t understand is this: “you get the same loss value for both, so none of the weights get adjusted and you are stuck with the same old value of the weights.” But we adjust the weights using the derivative of the loss with respect to them, not the value of the loss itself.
But the point is the derivatives turn out to be zero in that case. Remember that the gradients are the derivatives of the cost w.r.t. the parameters. If the cost (average of the loss) is not changing, then that means the derivatives are zero, right? As with everything, it always goes back to the math. Here’s a thread which discusses Symmetry Breaking in more detail. It also explains why Symmetry Breaking was not required in the Logistic Regression case.
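To make that concrete, here is a small NumPy sketch of my own (not the assignment code). It assumes a 2-layer network with a tanh hidden layer and a sigmoid output, and the names `gradients`, `n_h`, `X` and `Y` are just made up for the illustration. With all-zero initialization the hidden activations are all zero, so the gradients for both weight matrices come out exactly zero; only the output bias gets a nonzero gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(X, Y, W1, b1, W2, b2):
    """Forward and backward pass for a 2-layer net (tanh hidden, sigmoid output)."""
    m = X.shape[1]
    # Forward propagation
    A1 = np.tanh(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    # Backward propagation for the cross-entropy cost
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))        # 2 features, 5 made-up examples
Y = np.array([[0, 1, 1, 0, 1]])    # made-up labels
n_h = 3                            # hidden units (arbitrary choice)

# Case 1: everything initialized to zero
dW1, db1, dW2, db2 = gradients(
    X, Y,
    np.zeros((n_h, 2)), np.zeros((n_h, 1)),
    np.zeros((1, n_h)), np.zeros((1, 1)),
)
print("zero init: dW1 =\n", dW1)
print("zero init: dW2 =", dW2)
print("zero init: db2 =", db2)

# Case 2: small random weights break the symmetry
dW1_r, db1_r, dW2_r, db2_r = gradients(
    X, Y,
    rng.normal(size=(n_h, 2)) * 0.01, np.zeros((n_h, 1)),
    rng.normal(size=(1, n_h)) * 0.01, np.zeros((1, 1)),
)
print("random init: dW1 =\n", dW1_r)
```

In the zero case the update step leaves W1 and W2 at zero, so the next iteration hits the same situation again. With the random start, the rows of dW1 differ from each other, so the hidden units can learn different things.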
PS: that thread was linked from the FAQ Thread, which is also worth a look just on general principles.
Thanks for the reply!