Cost function problem

There is one further point worth making here:

Even if you don’t fix your cost function to handle this case, it does no harm to the actual back propagation process. It may not be a priori obvious, but if you take a careful look at the back propagation logic, you will see that the actual scalar J value is not used anywhere. All we need are the gradients (derivatives) of J w.r.t. the various parameters. Because of the nice way that the derivative of sigmoid and the derivative of cross entropy loss work together, the vector derivative at the output layer ends up being:

dZ^{[L]} = \displaystyle \frac {\partial L}{\partial Z^{[L]}} = A^{[L]} - Y

so you can see that nothing bad happens to the derivatives if any of the A^{[L]} values are exactly 0 or 1.
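To see that concretely, here is a small NumPy sketch (the array names and values are just made up for illustration) comparing the raw cross entropy derivative w.r.t. A^{[L]} with the combined sigmoid + cross entropy gradient when some activations are exactly 0 or 1:

```python
import numpy as np

# Toy output activations, some of which are exactly 0 or 1
A_L = np.array([[0.0, 1.0, 0.3, 0.8]])
Y   = np.array([[1.0, 0.0, 1.0, 0.0]])

# Raw derivative of cross entropy w.r.t. A^[L]: divides by 0 when A is exactly 0 or 1
with np.errstate(divide="ignore", invalid="ignore"):
    dA_L = -(np.divide(Y, A_L) - np.divide(1 - Y, 1 - A_L))
print(dA_L)   # [[-inf  inf  -3.333...  5.]] -- blows up at the endpoints

# Combined sigmoid + cross entropy gradient: always finite
dZ_L = A_L - Y
print(dZ_L)   # [[-1.   1.  -0.7  0.8]] -- perfectly well behaved
```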

In reality, the J value itself is really only useful as an inexpensive proxy for how well training is converging. You could also check that by computing the training accuracy periodically, but the code to do that is a bit more complicated (e.g. if you’re using regularization, you have to disable it when making the predictions to evaluate accuracy) and computationally more expensive.
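If you do want the reported J values to stay finite so that the cost curve is readable, one common trick (shown here only as a sketch, not the course's official code; the function name and epsilon value are just illustrative) is to clip the activations away from exactly 0 and 1 before taking the logs:

```python
import numpy as np

def compute_cost(A_L, Y, eps=1e-8):
    """Cross entropy cost with clipping so that log(0) never occurs.

    A_L -- output layer activations, shape (1, m)
    Y   -- true labels (0 or 1), shape (1, m)
    """
    m = Y.shape[1]
    A_safe = np.clip(A_L, eps, 1 - eps)   # keep values strictly inside (0, 1)
    return -np.sum(Y * np.log(A_safe) + (1 - Y) * np.log(1 - A_safe)) / m
```

Note that the clipping only affects the reported cost; the dZ^{[L]} gradient above is computed separately, so training itself is unchanged.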
