Do we choose cost function only according to the output layer

Huanhuan_Zhao · August 7, 2022, 8:32pm

In a neural network, we can define different types of activation functions for different layers. Do we choose the cost function only according to the output layer? How does the cost function work for the hidden layers?

shanup · August 7, 2022, 8:44pm

Hello @Huanhuan_Zhao

Welcome to the community.

Yes, the cost function is evaluated only at the output layer. There is no cost function for the hidden layers.

Huanhuan_Zhao · August 7, 2022, 9:04pm

I see. But how do the w and b in the hidden layers get adjusted then?

shanup · August 7, 2022, 9:08pm

@Huanhuan_Zhao

We call it backpropagation. It's essentially the chain rule from calculus, which we use to find the derivative of J with respect to the w and b of every neuron in the hidden layers.

Once we have $\frac{\partial J}{\partial w}$ and $\frac{\partial J}{\partial b}$ for every neuron in the hidden layers, we can apply the standard update equation $w = w - \alpha \frac{\partial J}{\partial w}$, and likewise $b = b - \alpha \frac{\partial J}{\partial b}$.
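To make this concrete, here is a minimal sketch in plain Python. It assumes a toy network that is not from this thread: one input, one tanh hidden neuron, one linear output neuron, and a squared-error cost on a single example. All names and numbers (`forward`, `gradients`, `gradient_step`, the starting parameters) are made up for illustration. Note that the cost is only computed from the output, yet the chain rule still yields gradients for the hidden-layer `w1` and `b1`.

```python
import math


def forward(params, x):
    """Forward pass: one tanh hidden neuron, then one linear output neuron."""
    w1, b1, w2, b2 = params
    a1 = math.tanh(w1 * x + b1)  # hidden-layer activation
    y_hat = w2 * a1 + b2         # output-layer prediction
    return a1, y_hat


def cost(params, x, y):
    """Squared-error cost J, evaluated only on the output y_hat."""
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2


def gradients(params, x, y):
    """Backpropagation: apply the chain rule from the output back to the hidden layer."""
    w1, b1, w2, b2 = params
    a1, y_hat = forward(params, x)
    dJ_dyhat = y_hat - y                      # dJ/dy_hat
    dJ_dw2 = dJ_dyhat * a1                    # output-layer weight
    dJ_db2 = dJ_dyhat                         # output-layer bias
    dJ_dz1 = dJ_dyhat * w2 * (1.0 - a1 ** 2)  # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    dJ_dw1 = dJ_dz1 * x                       # hidden-layer weight
    dJ_db1 = dJ_dz1                           # hidden-layer bias
    return [dJ_dw1, dJ_db1, dJ_dw2, dJ_db2]


def gradient_step(params, x, y, alpha=0.1):
    """Standard update: p = p - alpha * dJ/dp for every parameter."""
    grads = gradients(params, x, y)
    return [p - alpha * g for p, g in zip(params, grads)]


# Hypothetical starting parameters [w1, b1, w2, b2] and one training example.
params = [0.5, 0.0, -0.3, 0.1]
x, y = 1.5, 2.0

cost_before = cost(params, x, y)
for _ in range(50):
    params = gradient_step(params, x, y)
cost_after = cost(params, x, y)

print(f"cost before: {cost_before:.4f}, cost after: {cost_after:.6f}")
```

In a real framework you would not write these derivatives by hand; the point is only that the hidden layer's parameters are updated using gradients of the one cost defined at the output.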

Elemento · August 8, 2022, 11:06am

Hey @Huanhuan_Zhao,
Welcome to the community. As Shanup pointed out, the cost function is indeed evaluated at the output layer.

However, it is not quite accurate to say that we choose the cost function according to the output layer. We choose the cost function according to the form of the predicted and true labels, and the output layer is decided in much the same way.

For instance, consider a regression task in which a single unbounded numerical value is to be predicted. In this case, we will choose a dense layer with a single neuron as the output layer (you may or may not use an activation function), and it will predict a single numerical value. Both the true and the predicted labels are then single numerical values, so we can use loss functions like tf.keras.losses.MeanAbsoluteError and tf.keras.losses.MeanSquaredError.

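Similarly, consider a classification task. Along the same lines, we will use a dense layer with the appropriate number of units (one per class) and a linear or softmax activation. In this case, we can use loss functions like tf.keras.losses.CategoricalCrossentropy (for one-hot labels) and tf.keras.losses.SparseCategoricalCrossentropy (for integer class labels). If the output layer uses a linear activation, pass from_logits=True to the loss so that the softmax is applied inside the loss.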
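As a rough illustration of how the label format drives the choice of loss, here is a small sketch in plain Python. These hand-written functions are simplified stand-ins, not the Keras implementations, and the sample values are made up. The `from_logits` flag is named after the Keras loss argument and shows why a linear output layer and a softmax output layer can lead to the same cross-entropy value.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Regression loss: true and predicted labels are plain numbers."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def softmax(logits):
    """Turn raw output-layer scores (logits) into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, outputs, from_logits=False):
    """Classification loss for one example with an integer class label.

    outputs are probabilities (softmax output layer) unless from_logits=True,
    in which case they are raw scores (linear output layer).
    """
    probs = softmax(outputs) if from_logits else outputs
    return -math.log(probs[label])


# Regression: a single-neuron output predicts one number per example.
mse = mean_squared_error([3.0, -1.0], [2.5, 0.0])

# Classification with 3 classes; the true class index is 0.
logits = [2.0, 0.5, -1.0]
loss_from_logits = sparse_categorical_crossentropy(0, logits, from_logits=True)
loss_from_probs = sparse_categorical_crossentropy(0, softmax(logits))

print(mse, loss_from_logits, loss_from_probs)
```

The two cross-entropy values agree here because applying softmax in the output layer or inside the loss is the same computation, just placed differently.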

I hope this helps.

Cheers,
Elemento

Huanhuan_Zhao · August 30, 2022, 12:49pm

Thank you, Shanup. After taking the Neural Networks and Deep Learning course, I understand your answer (the backpropagation) now.

shanup · August 30, 2022, 1:00pm

Glad to hear that @Huanhuan_Zhao
