I do not understand why gradient descent does not work when the weight matrix is initialized with zeros. I understand from the first assignment that the layer's output is z = 0, which is then passed to the last (sigmoid) layer, so the activation is sigmoid(0), which is equal to 0.5.
And I also understand that the loss function will then output the same value regardless of the training example's true label.
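Here is a minimal numpy sketch of what I mean (the data X and Y are made up; w and b follow the assignment's zero initialization):

```python
import numpy as np

# Made-up data: 2 features, 3 examples
X = np.array([[1.0, 2.0, -1.0],
              [0.5, -0.3, 2.0]])   # shape (n_x, m)
Y = np.array([[1, 0, 1]])          # shape (1, m), the true labels

w = np.zeros((X.shape[0], 1))      # zero-initialized weights
b = 0.0

Z = np.dot(w.T, X) + b             # all zeros
A = 1 / (1 + np.exp(-Z))           # sigmoid(0) = 0.5 everywhere

# Cross-entropy loss per example: -log(0.5) ~= 0.693 whatever the label is
loss = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))
print(A)      # [[0.5 0.5 0.5]]
print(loss)   # [[0.6931 0.6931 0.6931]]
```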
What I do not understand is this: the general form of the gradient descent update is W = W - learning_rate * dw, and in the case of the sigmoid output, dw = 1/m * np.dot(X, (A - Y).T).
The vector A should then contain the value 0.5 everywhere, and Y contains the labels (ones and zeros), so in general A - Y, and therefore dw, should not be all zeros.
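Computing dw with the assignment's formula on the same made-up data seems to confirm this:

```python
import numpy as np

# Same made-up data as above; A is fixed at 0.5 after zero initialization
X = np.array([[1.0, 2.0, -1.0],
              [0.5, -0.3, 2.0]])
Y = np.array([[1, 0, 1]])
A = np.full((1, 3), 0.5)

m = X.shape[1]
dw = (1 / m) * np.dot(X, (A - Y).T)   # entries of (A - Y) are +/-0.5
print(dw)   # [[ 0.3333...] [-0.4666...]] -- clearly not all zeros
```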
What am I missing?