In a neural network, we can use different activation functions for different layers. Do we choose the cost function based only on the output layer? And how does the cost function work for the hidden layers?
Hello @Huanhuan_Zhao
Welcome to the community.
Yes, the cost function is evaluated only at the output layer. There is no cost function for the hidden layers.
I see. But how do the w and b in the hidden layers get adjusted then?
That is done by backpropagation. It's essentially the chain rule from calculus, which we use to find the derivative of J w.r.t. the w and b of every neuron in the hidden layers.
Once we have \frac{\partial J}{\partial w} and \frac{\partial J}{\partial b} for every neuron in the hidden layers, we can apply the standard update equations w = w - \alpha \frac{\partial J}{\partial w} and b = b - \alpha \frac{\partial J}{\partial b}.
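To make the chain rule concrete, here is a minimal sketch in plain Python. It assumes a toy network that is not from the course: one sigmoid hidden neuron feeding one linear output neuron, with a squared-error cost on a single example. All names and the starting values are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward pass: one sigmoid hidden neuron, then one linear output neuron."""
    w1, b1, w2, b2 = params
    a1 = sigmoid(w1 * x + b1)   # hidden-layer activation
    y_hat = w2 * a1 + b2        # output-layer prediction (no activation)
    return a1, y_hat

def cost(params, x, y):
    """Squared-error cost J, evaluated only on the output-layer prediction."""
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2

def gradients(params, x, y):
    """Backpropagation: apply the chain rule from the output back to the hidden layer."""
    w1, b1, w2, b2 = params
    a1, y_hat = forward(params, x)
    dJ_dyhat = y_hat - y                 # dJ/dy_hat
    dJ_dw2 = dJ_dyhat * a1               # output-layer weight
    dJ_db2 = dJ_dyhat                    # output-layer bias
    dJ_da1 = dJ_dyhat * w2               # pass the error back to the hidden activation
    dJ_dz1 = dJ_da1 * a1 * (1.0 - a1)    # through the sigmoid derivative
    dJ_dw1 = dJ_dz1 * x                  # hidden-layer weight
    dJ_db1 = dJ_dz1                      # hidden-layer bias
    return [dJ_dw1, dJ_db1, dJ_dw2, dJ_db2]

def gradient_step(params, x, y, alpha=0.1):
    """One update of the form p = p - alpha * dJ/dp for every parameter."""
    grads = gradients(params, x, y)
    return [p - alpha * g for p, g in zip(params, grads)]

# Hypothetical starting values: [w1, b1, w2, b2]
params = [0.5, -0.2, 1.5, 0.1]
x, y = 2.0, 1.0
new_params = gradient_step(params, x, y)
print(cost(params, x, y), cost(new_params, x, y))
```

Note that the hidden neuron never gets its own cost. Its gradients come entirely from J at the output, passed backwards through w2 and the sigmoid derivative.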
Hey @Huanhuan_Zhao,
Welcome to the community. As Shanup pointed out, the cost function is indeed evaluated at the output layer.
However, saying that we choose the cost function only according to the output layer is not entirely accurate. We choose the cost function according to the form of our predicted labels and true labels, and the output layer is pretty much decided in the same manner.
For instance, consider a regression task, in which a single unbounded numerical value is to be predicted. In this case, we will choose a dense layer with a single neuron (you may or may not use an activation function) as the output layer, and it will predict a single numerical value. So both the true and the predicted labels are single numerical values, and we can use loss functions like tf.keras.losses.MeanAbsoluteError and tf.keras.losses.MeanSquaredError.
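As a rough illustration of what these two regression losses measure, here is a hand-written sketch in plain Python. It is not the Keras implementation, just the averaging formulas applied to a made-up batch of three single-value labels.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical batch: one numerical label and one prediction per example
y_true = [3.0, -1.0, 2.0]
y_pred = [2.5, 0.0, 2.0]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(mse, mae)
```

Both losses only need the true value and the predicted value to be numbers of the same shape, which is why they pair naturally with a single-neuron output layer.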
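Similarly, consider a classification task. Going along the same lines, we will use a dense layer with one unit per class (with linear or softmax activation), and in this case, we can use loss functions like tf.keras.losses.CategoricalCrossentropy (for one-hot true labels) and tf.keras.losses.SparseCategoricalCrossentropy (for integer true labels). If the output layer uses a linear activation, it produces logits rather than probabilities, so the loss should be created with from_logits=True.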
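Here is a small hand-written sketch in plain Python of how the label format and the output activation fit together for a single example. It is not the Keras code, and the logits are made-up values for a 3-class problem. The softmax step is what a softmax output layer would do, or what the loss does internally when it is given logits.

```python
import math

def softmax(logits):
    """Turn raw output-layer scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs):
    """Cross-entropy when the true label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(class_index, probs):
    """Cross-entropy when the true label is an integer class index."""
    return -math.log(probs[class_index])

# Hypothetical output of a linear (no activation) output layer with 3 units
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)

# The same true class (class 0), written in the two label formats
loss_one_hot = categorical_crossentropy([1, 0, 0], probs)
loss_sparse = sparse_categorical_crossentropy(0, probs)
print(loss_one_hot, loss_sparse)
```

The two losses give the same number here. The only difference is how the true label is encoded, which is why the choice between them depends on the form of your labels.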
I hope this helps.
Cheers,
Elemento
Thank you Shanup. After taking the Neural Network and Deep Learning courses, I understand your answer (the backpropagation) now.
Glad to hear that @Huanhuan_Zhao