Activation Functions, Weights, and Biases of Each Layer

Hi, regarding Week 3: are the activation functions, weights, and biases of a given layer the same for all of the nodes/neurons in that layer, such that the number of nodes/neurons doesn’t matter for the values of the matrix A that is input into the next layer? Also, are the inputs to the next layer’s nodes the same for all of those nodes? Thanks

I’m not sure I understand the question, but I’ll give it a try:

Yes, the way the layers of a neural network work is that each neuron in a given layer gets all the outputs of the previous layer. Prof Ng shows this in the diagrams when he explains how this works in Week 3. So far, so good.

But it is not true that the number of neurons in a given layer has no effect on the output of that layer: it determines the shape of the activation output A. If layer 1 has 20 neurons, then the shape of A^{[1]} will be 20 x m, where m is the number of samples. And it is critically important to understand that the outputs of all 20 of those neurons are different.
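
Here’s a minimal NumPy sketch of one layer’s forward step, with made-up sizes (not the assignment’s values), just to show where that 20 x m shape comes from:

```python
import numpy as np

# Hypothetical sizes, purely for illustration:
n_x, n_1, m = 5, 20, 100                 # 5 input features, 20 neurons in layer 1, 100 samples

X  = np.random.randn(n_x, m)             # inputs, shape (5, 100)
W1 = np.random.randn(n_1, n_x) * 0.01    # layer 1 weights, shape (20, 5)
b1 = np.zeros((n_1, 1))                  # layer 1 biases, shape (20, 1)

Z1 = np.dot(W1, X) + b1                  # linear step, shape (20, 100)
A1 = np.tanh(Z1)                         # activation output A^[1], shape (20, 100)

print(A1.shape)                          # (20, 100), i.e. 20 x m
```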

The key point is that even though each neuron in a given layer gets the same inputs as every other neuron, it has its own unique weights and bias values, so each neuron generates a different output from the same input. The reason is that we start with different values when we randomly initialize the weights. This is called Symmetry Breaking, and Prof Ng discusses it in the lectures. Here’s a thread which goes into a bit more detail and shows some of the math. So the neurons start out with different weights and then learn different things through training using back propagation.
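
As a quick illustration of why that matters, here’s a small sketch (again with hypothetical sizes) comparing zero initialization with random initialization:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(5, 100)

# If every neuron starts with identical (e.g. zero) weights, every row of A1 is identical,
# and back propagation keeps them identical -- the neurons can never learn different things.
W1_sym = np.zeros((20, 5))
A1_sym = np.tanh(np.dot(W1_sym, X))
print(np.allclose(A1_sym, A1_sym[0]))    # True: all 20 neurons compute the same output

# Random initialization "breaks the symmetry": each neuron starts with its own weights,
# so each one produces a different output from the same inputs.
W1_rand = np.random.randn(20, 5) * 0.01
A1_rand = np.tanh(np.dot(W1_rand, X))
print(np.allclose(A1_rand, A1_rand[0]))  # False: the 20 rows differ
```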


One thing that I missed in my previous response:

It is also true that there is only one activation function at each layer: the same function is applied to every neuron in that layer. Note that there is no requirement that all the hidden layers use the same activation function, but that is the way Prof Ng always seems to do it.
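
Just to illustrate that point, here’s a rough sketch (made-up sizes, not from the assignment) of a two-hidden-layer forward pass where each layer applies one activation element-wise to all of its neurons, and the two layers happen to use different functions:

```python
import numpy as np

X  = np.random.randn(4, 50)                        # 4 features, 50 samples
W1, b1 = np.random.randn(8, 4) * 0.01, np.zeros((8, 1))
W2, b2 = np.random.randn(3, 8) * 0.01, np.zeros((3, 1))

A1 = np.maximum(0, np.dot(W1, X) + b1)             # layer 1: ReLU applied to all 8 neurons
A2 = np.tanh(np.dot(W2, A1) + b2)                  # layer 2: tanh applied to all 3 neurons
```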

The activation at the output layer is not really a choice: if you are doing a binary (yes/no) classification it is always sigmoid. If you are doing a multi-class classification (cat, dog, horse, cow, elephant, zebra, …), then the output activation is softmax, which we will learn about in Course 2 of this series.
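
If it helps to see the difference concretely, here are toy implementations of the two output activations (my own sketch, not code from the course):

```python
import numpy as np

def sigmoid(z):
    # Binary classification output: one value per sample, interpreted as P(yes).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Multi-class output: one probability per class per sample; each column sums to 1.
    e = np.exp(z - np.max(z, axis=0, keepdims=True))   # shift for numerical stability
    return e / np.sum(e, axis=0, keepdims=True)

Z_binary = np.array([[0.5, -1.2, 2.0]])     # shape (1, 3): 1 output neuron, 3 samples
print(sigmoid(Z_binary))

Z_multi = np.random.randn(6, 3)             # shape (6, 3): 6 classes (cat, dog, ...), 3 samples
print(softmax(Z_multi).sum(axis=0))         # each column sums to 1
```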