Clarification on Zero Initialization in Neural Network Linear Regression

It looks like you already found that thread, but I’ll add the link for anyone else who lands on this one. It works through the math that hackyon mentions, showing that zero initialization does not prevent learning in the logistic regression case. You can do the analogous derivation for linear regression; the cost function is different, of course, but the argument is the same. Once we get to real neural networks, though, we’ll need symmetry breaking, as hackyon mentioned, and that thread shows that too.
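
For anyone who wants the linear regression version spelled out, here is a sketch of that analogous derivation (my notation, which may differ from the linked thread’s). With cost

$$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( w^\top x^{(i)} + b - y^{(i)} \right)^2,$$

the gradient at the zero initialization $w = 0$, $b = 0$ is

$$\left.\frac{\partial J}{\partial w_j}\right|_{w=0,\, b=0} = \frac{1}{m} \sum_{i=1}^{m} \left( 0 - y^{(i)} \right) x_j^{(i)} = -\frac{1}{m} \sum_{i=1}^{m} y^{(i)} x_j^{(i)},$$

which is nonzero for any feature correlated with the labels, so gradient descent moves off zero on the very first step. Each $w_j$ also gets its own gradient, so there is no symmetry to break in the first place.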
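
Hidden layers are where zero initialization goes wrong. Here is a minimal NumPy sketch (my own toy example, not code from the linked thread) showing both cases: a zero-initialized logistic regression that trains fine, and a zero-initialized one-hidden-layer network whose hidden units receive identical gradients at every step and therefore never differentiate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification problem.
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5

# --- Logistic regression, zero-initialized: learns fine ---
w, b = np.zeros(3), 0.0
for _ in range(1000):
    p = sigmoid(X @ w + b)
    w -= lr * X.T @ (p - y) / len(y)   # gradient is nonzero even at step 1
    b -= lr * np.mean(p - y)
acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"zero-init logistic regression accuracy: {acc:.2f}")

# --- One hidden layer (4 sigmoid units), all weights zero ---
W1, b1 = np.zeros((3, 4)), np.zeros(4)
W2, b2 = np.zeros((4, 1)), 0.0
for _ in range(1000):
    h = sigmoid(X @ W1 + b1)            # every hidden column is identical
    p = sigmoid(h @ W2 + b2).ravel()
    dz2 = (p - y)[:, None] / len(y)     # output-layer error
    dz1 = (dz2 @ W2.T) * h * (1 - h)    # identical for every hidden unit
    W2 -= lr * h.T @ dz2
    b2 -= lr * dz2.sum()
    W1 -= lr * X.T @ dz1
    b1 -= lr * dz1.sum(axis=0)

# The weights do move, but the hidden units remain exact copies of each other.
print("W1 columns all identical:", np.allclose(W1, W1[:, [0]]))
print("W2 rows all identical:   ", np.allclose(W2, W2[0]))
```

The columns of W1 stay exact copies of each other, so the four hidden units collapse into one effective unit; that is the symmetry that random initialization is there to break.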
