If the neurons in layer 1 are all performing logistic regressions on the same input data in parallel, why do they not come up with exactly the same weights and biases (for example, if all the weights and biases were initialized to zero)? In the example presented, there were already weights, so I see the neural layers as a means of updating weights and biases based on new data. This is analogous to Bayesian statistics, where probabilities are updated by Bayes' formula, but in Bayesian statistics one has to be concerned about the starting point (the Bayesian “prior”). This looks like the starting-point issue we learned about in gradient descent, on steroids: all the neurons in layer 1 are attempting to do a logistic regression in parallel, and unless they each have a unique starting point they should all produce the same weights and biases. From the equations, it looks like the neurons in the same layer are strictly parallel and independent of each other, so there is no mechanism for them to specialize other than the initialization values. Is this correct? Are the neurons in layer 1 randomly initialized?

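Yes, all of the weights are randomly initialized. That is what breaks the symmetry: if every neuron in a layer started from the same values, each one would receive the same gradient on every update, so they would all stay identical.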
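
To make the symmetry argument concrete, here is a minimal NumPy sketch (not the course's code). It trains a tiny network with one hidden layer twice, once from all-zero weights and once from random weights, then checks whether the layer 1 neurons ended up with the same weights. The network size, the toy data, the learning rate, and the helper names `train_layer1` and `columns_identical` are all assumptions made up for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_layer1(W1, W2, X, y, lr=0.5, steps=200):
    """Gradient descent on a tiny 1-hidden-layer sigmoid network; returns the trained W1."""
    W1, W2 = W1.copy(), W2.copy()
    b1, b2 = np.zeros(W1.shape[1]), 0.0
    n = X.shape[0]
    for _ in range(steps):
        A1 = sigmoid(X @ W1 + b1)            # hidden activations, shape (n, hidden)
        A2 = sigmoid(A1 @ W2 + b2)           # output probability, shape (n, 1)
        dZ2 = (A2 - y) / n                   # gradient of the mean log-loss at the output
        dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)   # backpropagated to the hidden layer
        W2 -= lr * (A1.T @ dZ2)
        b2 -= lr * dZ2.sum()
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    return W1

def columns_identical(W):
    """True if every column (one column per hidden neuron) matches the first one."""
    return bool(np.allclose(W, W[:, [0]]))

# Toy data (an assumption for this sketch): 2 features, XOR-like labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = (X[:, :1] * X[:, 1:] > 0).astype(float)  # shape (64, 1)

# Case 1: every weight starts at zero, so all 3 hidden neurons start identical.
zero_W1 = train_layer1(np.zeros((2, 3)), np.zeros((3, 1)), X, y)

# Case 2: small random starting weights.
rand_W1 = train_layer1(rng.normal(scale=0.5, size=(2, 3)),
                       rng.normal(scale=0.5, size=(3, 1)), X, y)

same_after_zero_init = columns_identical(zero_W1)
same_after_random_init = columns_identical(rand_W1)
print("identical neurons after zero init:  ", same_after_zero_init)
print("identical neurons after random init:", same_after_random_init)
```

With the zero start, the three hidden neurons should still be copies of each other after training, because each one sees the same inputs and receives the same gradient at every step. With the random start they should end up with different weights. The biases start at zero in both runs here, which is a common choice once the weights are random.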