But the point is that the derivatives turn out to be zero in that case. Remember that the gradients are the derivatives of the *cost* w.r.t. the parameters, and gradient descent moves each parameter by the learning rate times its gradient. If the cost (the average of the loss) is not changing from one iteration to the next, that means the parameters are not moving, which means the derivatives are zero, right? As with everything, it always goes back to the math. Here’s a thread which discusses Symmetry Breaking in more detail. It also explains why Symmetry Breaking was not required in the Logistic Regression case.
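
If you want to see the zero derivatives for yourself, here is a minimal sketch in plain Python. It is not the course code: the toy data, the function names and the architecture (one tanh hidden layer feeding a single sigmoid output, with every parameter initialized to zero) are all assumptions made up for illustration. It computes the weight gradients of the average cross-entropy cost for that network, and then does the same for Logistic Regression on the same data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data, made up for illustration: 4 examples, 2 features each, binary labels.
X = [[0.5, -1.2], [1.5, 0.3], [-0.7, 0.8], [2.0, 1.0]]
Y = [1, 0, 0, 0]

def two_layer_weight_grads(W1, b1, W2, b2):
    """Weight gradients of the average cross-entropy cost for an assumed
    network: tanh hidden layer followed by one sigmoid output unit."""
    m = len(X)
    n_h, n_x = len(W1), len(W1[0])
    dW1 = [[0.0] * n_x for _ in range(n_h)]
    dW2 = [0.0] * n_h
    for x, y in zip(X, Y):
        # Forward pass for one example.
        a1 = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
              for j in range(n_h)]
        a2 = sigmoid(sum(W2[j] * a1[j] for j in range(n_h)) + b2)
        # Backward pass: dz2 is dLoss/dz2 for sigmoid + cross-entropy.
        dz2 = a2 - y
        for j in range(n_h):
            dW2[j] += dz2 * a1[j] / m
            dz1 = W2[j] * dz2 * (1.0 - a1[j] ** 2)  # tanh'(z) = 1 - a^2
            for i in range(n_x):
                dW1[j][i] += dz1 * x[i] / m
    return dW1, dW2

def logistic_weight_grads(w, b):
    """Weight gradient of the same cost for plain Logistic Regression."""
    m = len(X)
    dw = [0.0] * len(w)
    for x, y in zip(X, Y):
        a = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for i in range(len(w)):
            dw[i] += (a - y) * x[i] / m
    return dw

# Everything initialized to zero in both models.
dW1, dW2 = two_layer_weight_grads([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0],
                                  [0.0, 0.0], 0.0)
dw = logistic_weight_grads([0.0, 0.0], 0.0)

print("two-layer dW1:", dW1)
print("two-layer dW2:", dW2)
print("logistic dw:  ", dw)
```

In this sketch every entry of `dW1` and `dW2` comes out as zero: `dW2` is multiplied by the hidden activations, which are tanh(0) = 0, and `dW1` is multiplied by `W2`, which is zero. The Logistic Regression gradient `dw` is multiplied directly by the inputs `x`, so it is nonzero on the first step even when starting from zeros. Only the weight gradients are checked here; the bias gradients are left out to keep it short.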

PS that thread was linked from the FAQ Thread, which is also worth a look just on General Principles.