Shallow NN vs. Logistic Regression

In week 2 of the course, Andrew shows a shallow NN that has 1 hidden layer of 4 units.
As I understand it, each unit performs the computation of a single logistic regression.
I would like to understand how the information from the 4 units is then passed on to the output layer:

  1. If each of the 4 units produces the optimal parameters w and b, why would they differ?
  2. If they differ, how does the output layer then use the 4 different sets of parameters from the 4 units?

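Because each unit captures a different pattern in the data. The units are not fitted separately: they start from different random initial weights and are trained together, so each one ends up with its own w and b.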
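
A rough way to see why the units end up different is to train the same tiny network twice, once with all 4 hidden units starting from identical parameters and once from random ones. This is only a sketch in plain Python, not the course's code: the XOR-style data, the starting values, the learning rate and the helper names (`sigmoid`, `train`) are all made up for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(W1, b1, w2, b2, data, lr=0.5, steps=200):
    """Batch gradient descent on log loss: 2 inputs -> len(W1) sigmoid units -> 1 sigmoid output."""
    n_h = len(W1)
    m = len(data)
    for _ in range(steps):
        gW1 = [[0.0, 0.0] for _ in range(n_h)]
        gb1 = [0.0] * n_h
        gw2 = [0.0] * n_h
        gb2 = 0.0
        for x, y in data:
            # Forward pass: every hidden unit sees the same input x.
            a = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(n_h)]
            y_hat = sigmoid(sum(w2[j] * a[j] for j in range(n_h)) + b2)
            # Backward pass for the log loss.
            dz2 = y_hat - y
            gb2 += dz2
            for j in range(n_h):
                gw2[j] += dz2 * a[j]
                dz1 = dz2 * w2[j] * a[j] * (1.0 - a[j])
                gW1[j][0] += dz1 * x[0]
                gW1[j][1] += dz1 * x[1]
                gb1[j] += dz1
        # Gradient step, averaged over the m examples.
        for j in range(n_h):
            W1[j][0] -= lr * gW1[j][0] / m
            W1[j][1] -= lr * gW1[j][1] / m
            b1[j] -= lr * gb1[j] / m
            w2[j] -= lr * gw2[j] / m
        b2 -= lr * gb2 / m
    return W1, b1, w2, b2

# Toy XOR-style data (made up for illustration).
data = [((0.0, 0.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), 0)]

# Case 1: all 4 hidden units start with identical parameters.
same = train([[0.5, -0.5] for _ in range(4)], [0.0] * 4, [0.5] * 4, 0.0, data)

# Case 2: each hidden unit starts from its own random weights.
random.seed(0)
W1_init = [[random.gauss(0.0, 0.5) for _ in range(2)] for _ in range(4)]
w2_init = [random.gauss(0.0, 0.5) for _ in range(4)]
rand = train(W1_init, [0.0] * 4, w2_init, 0.0, data)

units_identical = all(row == same[0][0] for row in same[0])
units_differ = any(row != rand[0][0] for row in rand[0])
print("identical start -> units still identical:", units_identical)
print("random start    -> units differ:", units_differ)
```

With an identical start every unit receives exactly the same gradient at every step, so the 4 units stay copies of each other. A random start gives each unit a different gradient, which is what lets them pick up different patterns.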

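The output layer does not use the hidden units' parameters directly. It takes the 4 activations those units produce as its inputs and combines them with its own w and b. Vectorization is what lets all of this be computed in one step. You will learn more about it as you proceed with the course.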
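
To make the flow from the 4 units to the output concrete, here is a minimal forward pass in plain Python. It is a sketch under some assumptions: sigmoid is used in the hidden units to match the "logistic regression" picture in the question, and the input, the weights and the names (`forward`, `W1`, `b1`, `w2`, `b2`) are invented for illustration rather than taken from the course notebooks.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, w2, b2):
    """Forward pass: inputs x -> len(W1) hidden units -> 1 output."""
    # Each hidden unit j applies its own weights W1[j] and bias b1[j]
    # to the same input x and produces one activation a[j].
    a = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
         for j in range(len(W1))]
    # The output unit never sees W1 or b1. Its inputs are the activations a,
    # which it combines using its own weights w2 and bias b2.
    y_hat = sigmoid(sum(wj * aj for wj, aj in zip(w2, a)) + b2)
    return a, y_hat

# Hypothetical values: 2 input features, 4 hidden units.
x = [1.0, 2.0]
W1 = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2], [0.0, 0.1]]
b1 = [0.0, 0.1, -0.1, 0.2]
w2 = [0.3, -0.6, 0.8, 0.1]
b2 = 0.05

a, y_hat = forward(x, W1, b1, w2, b2)
print("hidden activations:", a)
print("prediction:", y_hat)
```

The loops are written out here for readability. The vectorized form replaces them with matrix products so that all units (and all training examples) are computed at once.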