After the Forward and Backpropagation video in Week 4, I was wondering why the derivative of the loss function for Logistic Regression is:
da^{[L]} = - \frac{y}{a^{[L]}} + \frac{(1-y)}{(1-a^{[L]})}
and not, as I think was shown in the other videos, this:
da^{[L]} = a^{[L]}-y
I read through those extra articles in the notes and think I understand why, but I just wanted some clarification to make sure I fully understand.
Is it because the first equation above is the more generic derivative of the Log Loss function?
J = -\left(y \log{(a^{[L]})} + (1-y) \log{(1-a^{[L]})}\right)
And the second equation is what we get when we use the Sigmoid activation function, where:
a^{[L]} = \frac{1}{1+e^{-z^{[L]}}}
Does that mean that if we use a different activation function for the final layer in a classification problem that uses the Log Loss function as the cost, then we would need a different calculation for the derivative? (Or I guess we could just plug the new activation function into the generic equation if we cannot find or derive a simplified version as we have for the Sigmoid.)
Those are two different things:
The first formula you show is \displaystyle \frac {\partial L}{\partial A}, which Prof Ng writes as dA^{[L]} in his notation.
The second formula is \displaystyle \frac {\partial L}{\partial Z} which is dZ^{[L]}. It is the first formula times the derivative of sigmoid, so it’s one step of applying the Chain Rule.
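Written out, that chain rule step looks like this, using g'(z^{[L]}) = a^{[L]}(1 - a^{[L]}) for the derivative of sigmoid:
dZ^{[L]} = dA^{[L]} \cdot g'(z^{[L]}) = \left(- \frac{y}{a^{[L]}} + \frac{1-y}{1-a^{[L]}}\right) a^{[L]}(1-a^{[L]}) = -y(1-a^{[L]}) + (1-y)a^{[L]} = a^{[L]} - y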
Please see this thread for more information and the derivations.
You’re right that if you choose a different output layer activation or a different loss function, then you have to go back to the generic formulas and derive it all again. The reason it works out so nicely for us here is that the cross entropy loss function and sigmoid are a natural pair. The same is true once we get to multiclass classifications and use softmax plus cross entropy loss there.
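If you want to convince yourself numerically, here is a minimal numpy sketch for the sigmoid case (toy values and variable names of my own choosing, just mirroring the dA/dZ notation; this is not course code):

```python
import numpy as np

# Minimal check that dA^{[L]} * sigmoid'(z^{[L]}) equals a^{[L]} - y
# for the cross-entropy loss + sigmoid pairing (toy values, assumed setup).
np.random.seed(0)
z = np.random.randn(5)             # pre-activations z^{[L]} of the output layer
y = np.random.randint(0, 2, 5)     # binary labels

a = 1.0 / (1.0 + np.exp(-z))       # sigmoid activation a^{[L]}
dA = -y / a + (1 - y) / (1 - a)    # generic dL/dA for the log loss
dZ = dA * a * (1 - a)              # chain rule: multiply by sigmoid'(z) = a(1 - a)

print(np.allclose(dZ, a - y))      # True: matches the simplified formula
```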
Great, thank you for the clarification. It makes sense; I forgot about the Chain Rule in this case.