Derivative of ReLU in output layer

Yes, that’s basically right. The only caveat is that you have to be a bit more precise about matching the way Prof Ng expresses things. There is the “loss” function L, which gives a vector of values, one loss per sample. Then there is the “cost” function J, which is the average of the loss values across the samples in the training set. The way Prof Ng decomposes things, he only uses J at the very final step, where he computes the gradients of the weight and bias values. Everywhere else, he is computing “Chain Rule” factors. Notice yet again what the notation dAL means: it is the derivative of L, not of J, so there is no summation and no factor of \frac {1}{m}.

L(Y, A^{[L]}) = \displaystyle \frac {1}{2} (A^{[L]} - Y)^2

\displaystyle \frac {\partial L}{\partial A^{[L]}} = (A^{[L]} - Y)
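To make that concrete, here is a small numpy sketch (not course code, just an illustration of the two formulas above, with made-up toy values): dAL is computed elementwise per sample, with no summation and no \frac {1}{m}, and a finite-difference check confirms it matches the derivative of the per-sample loss.

```python
import numpy as np

# Per-sample squared-error loss: L(Y, A^[L]) = 1/2 * (A^[L] - Y)^2
def loss(AL, Y):
    return 0.5 * (AL - Y) ** 2

# Chain Rule factor dAL = dL/dA^[L]: elementwise, no sum, no 1/m
def dAL(AL, Y):
    return AL - Y

# Toy values: 1 output unit, 4 samples (shapes chosen just for illustration)
AL = np.array([[0.2, 0.7, 0.9, 0.4]])
Y  = np.array([[0.0, 1.0, 1.0, 0.0]])

# Finite-difference check of the per-sample derivative
eps = 1e-6
numeric = (loss(AL + eps, Y) - loss(AL - eps, Y)) / (2 * eps)
print(np.allclose(numeric, dAL(AL, Y)))  # True
```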

If you wanted to compute the partial derivative of J, it would be:

\displaystyle \frac {\partial J}{\partial A^{[L]}} = \frac {1}{m} \sum_{i = 1}^m (A_i^{[L]} - Y_i)

But that is not what we need to plug in, given the way Prof Ng has structured all the layers of functions here.
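For contrast, here is a sketch of where the \frac {1}{m} does show up in the layered structure: only when the parameter gradients dW and db are computed. The dimensions and values are made up, and ReLU is used at the output layer as in the thread title; this just illustrates the usual formulas in the course notation, it is not code from the assignments.

```python
import numpy as np

np.random.seed(0)
m = 4                                  # number of samples (toy value)
A_prev = np.random.randn(3, m)         # activations from layer L-1 (3 units, illustrative)
W = np.random.randn(1, 3)
b = np.zeros((1, 1))
Y = np.random.randn(1, m)

# Forward step at the output layer with ReLU
ZL = W @ A_prev + b
AL = np.maximum(0, ZL)

# Backward step: the Chain Rule factors carry no 1/m ...
dAL = AL - Y                           # derivative of L, not of J
dZL = dAL * (ZL > 0)                   # dA^[L] * g'(Z^[L]) for ReLU

# ... the averaging that turns L into J appears only in the parameter gradients
dW = (1.0 / m) * dZL @ A_prev.T
db = (1.0 / m) * np.sum(dZL, axis=1, keepdims=True)
```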

I am taking advantage of Discourse’s support for formatting LaTeX expressions here. That was explained on the DLS FAQ Thread. Of course that assumes you are familiar with LaTeX, which is a typesetting language Leslie Lamport built on top of Prof Donald Knuth’s TeX and which is widely used for formatting mathematical expressions. If that is new to you, just google “LaTeX” and you’ll find plenty of useful info.
