Zeros initialization for weight matrices

I do not understand why gradient descent does not work when the weight matrix is initialized with zeros. I understand from the first assignment that the layer's output value is z = 0, which is then passed to the last (sigmoid) layer, so the activation is sigmoid(0), which equals 0.5.

I also understand that the loss function will then output the same value regardless of the label of the training example.
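To make my point concrete, here is a minimal sketch of the forward pass as I understand it (the values in X are made up, just a toy example using the assignment's shapes, where X is (n_x, m) and w is (n_x, 1)); with w and b initialized to zeros, every activation comes out as 0.5:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# toy data: n_x = 3 features, m = 4 examples (hypothetical values)
X = np.array([[1.0,  2.0, -1.0,  0.5],
              [0.0,  1.0,  3.0, -2.0],
              [2.0, -1.0,  1.0,  1.0]])

w = np.zeros((3, 1))     # zero-initialized weights
b = 0.0                  # zero-initialized bias

Z = np.dot(w.T, X) + b   # all zeros, shape (1, m)
A = sigmoid(Z)           # all 0.5, regardless of the inputs
print(A)                 # [[0.5 0.5 0.5 0.5]]
```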

What I do not understand is this: the general form of the gradient descent update is W = W - learning_rate * dw, and in the case of the sigmoid output the gradient is dw = 1/m * np.dot(X, (A - Y).T).

The vector A should then be filled with the value 0.5, and Y is a vector of ones and zeros, so in general dw should not be all zeros.
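Here is the part that confuses me, sketched in code (same toy X as above, plus some hypothetical labels in Y): since A is all 0.5 and Y contains zeros and ones, dw comes out nonzero, so the very first update should already move the weights:

```python
import numpy as np

X = np.array([[1.0,  2.0, -1.0,  0.5],
              [0.0,  1.0,  3.0, -2.0],
              [2.0, -1.0,  1.0,  1.0]])   # same toy X as above
Y = np.array([[1, 0, 1, 0]])              # hypothetical labels, shape (1, m)
m = X.shape[1]

A = 0.5 * np.ones((1, m))                 # activations from the zero-initialized forward pass
dw = 1 / m * np.dot(X, (A - Y).T)         # gradient w.r.t. w, shape (n_x, 1)
print(dw)                                 # not all zeros, so...

w = np.zeros((3, 1))
w = w - 0.01 * dw                         # ...this first update does change w
```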

What am I missing?

Here is a thread that discusses in detail why “symmetry breaking” is required for neural networks but not for Logistic Regression. Please have a look and then ask any follow-up questions here (that thread is a reference thread, so it’s “closed”).
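To make the symmetry argument concrete, here is a minimal sketch (toy data, not the assignment's code) of one forward/backward pass through a one-hidden-layer network with everything initialized to zeros. The output error dZ2 is nonzero, exactly as in your Logistic Regression reasoning, but dW2 vanishes because A1 is all zeros, and dW1 vanishes because W2 is all zeros, so neither weight matrix ever moves and every hidden unit stays identical to the others:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(1)
X = np.random.randn(3, 4)           # 3 features, 4 examples (toy data)
Y = np.array([[1, 0, 1, 0]])        # hypothetical labels
m = X.shape[1]
n_h = 5                             # hidden units

# zero initialization for every parameter
W1 = np.zeros((n_h, 3)); b1 = np.zeros((n_h, 1))
W2 = np.zeros((1, n_h)); b2 = np.zeros((1, 1))

# forward pass
Z1 = np.dot(W1, X) + b1             # all zeros
A1 = np.tanh(Z1)                    # all zeros
Z2 = np.dot(W2, A1) + b2            # all zeros
A2 = sigmoid(Z2)                    # all 0.5, same as in the Logistic Regression case

# backward pass
dZ2 = A2 - Y                              # NOT zero, just like in the question
dW2 = 1 / m * np.dot(dZ2, A1.T)           # all zeros, because A1 is all zeros
dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)   # all zeros, because W2 is all zeros
dW1 = 1 / m * np.dot(dZ1, X.T)            # all zeros

print(dW2)  # W2 never moves
print(dW1)  # W1 never moves; every hidden unit stays identical to the others
```

With a single sigmoid unit there is only one “unit”, so there is no symmetry to break, which is why your dw calculation for Logistic Regression is indeed nonzero and training does work there.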