In week 2 of the course, Andrew shows a shallow NN which has 1 hidden layer of 4 units.
As I understand it, each unit performs the computation of a single logistic regression.
I would like to understand how the outputs of the 4 units are then passed to and processed by the output layer. Specifically:
- If each of the 4 units produces the optimal parameters w and b, why would they differ?
- If they differ, how does the output layer then use the 4 different sets of parameters from the 4 units?
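
To make the question concrete, here is a minimal sketch of the forward pass as I currently picture it. It is only an illustration under my own assumptions, not code from the lecture: the names `W1`, `b1`, `W2`, `b2`, the 3-feature input, the sigmoid activations, and the small random initialization are all hypothetical choices. In this sketch each hidden unit owns one row of `W1` and one entry of `b1`, and the output unit receives the 4 activations (not the 4 units' parameters) and combines them with its own weights `W2` and bias `b2`.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass for one example x of shape (n_x, 1)."""
    # Hidden layer: row i of W1 and entry i of b1 belong to hidden unit i.
    a1 = sigmoid(W1 @ x + b1)    # shape (4, 1): one activation per hidden unit
    # Output layer: one more logistic unit whose inputs are the 4 activations.
    a2 = sigmoid(W2 @ a1 + b2)   # shape (1, 1): the network's prediction
    return a1, a2

# Assumed sizes: 3 input features, 4 hidden units, 1 output unit.
n_x, n_h = 3, 4
rng = np.random.default_rng(0)

# Assumed small random initialization, so the hidden units start out different.
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.01
b2 = np.zeros((1, 1))

x = rng.standard_normal((n_x, 1))   # a made-up input example
a1, a2 = forward(x, W1, b1, W2, b2)
print(a1.shape, a2.shape)
```

If all rows of `W1` started out identical (with identical biases), the 4 hidden activations would also be identical for every input, which is part of what prompts the first question above.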