I have a question. When we define dA[L] manually using the log loss, we write dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)), where AL is the output of the last layer (y_hat). Wouldn't it be better to add a small epsilon (1e-10, for example) to AL to avoid division by zero when AL is exactly 1, for better numerical stability? Or am I missing something?
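For concreteness, here is a minimal sketch of what I mean (the helper name and the eps value are just illustrative, not from the course code):

import numpy as np

def init_backprop(AL, Y, eps=1e-10):
    # Hypothetical helper: compute dAL for the log (cross-entropy) loss,
    # adding a small eps to each denominator so that AL values that have
    # rounded to exactly 0 or 1 do not cause a division by zero.
    dAL = -(np.divide(Y, AL + eps) - np.divide(1 - Y, 1 - AL + eps))
    return dAL

# Example: one AL entry has rounded to exactly 1.0
AL = np.array([[0.3, 1.0, 0.999999]])
Y  = np.array([[0.0, 1.0, 1.0]])
print(init_backprop(AL, Y))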
In theory you are correct.
In practice, AL will not reach exactly 0 or 1 except in extreme cases. With trained weight values, this is unlikely to occur.
So this is not an issue you need to worry about very much.
When I tried to implement dropout regularization using the data given in the lab for Week 1 of Course 2, I got a divide-by-zero warning. When I debugged both my code and the lab, I saw that AL really does contain 1s. I don't know how this was handled in the lab's imported functions, but when I added epsilon I got the same results, so I'm not sure whether that was luck or whether it was the correct fix.
[[9.99992040e-01 1.00000000e+00 9.99999988e-01 9.99999979e-01
9.99999983e-01 9.99521204e-01 9.65090199e-01 9.77719975e-01
9.99999351e-01 9.99999987e-01 1.00000000e+00 1.00000000e+00
9.99999998e-01 9.99999992e-01 9.98904808e-01 1.00000000e+00
This is a sample I saw in the lab.
You are not incorrect.
Sorry, but I cannot explore this dataset at the moment. Perhaps another mentor will be able to run some tests and reply here.
Yes, you can run into problems if the sigmoid values round to exactly 0 or exactly 1. Of course mathematically, they would never exactly equal 0 or 1, but in floating point you can run out of resolution.
Here’s a thread that discusses the strategy you mentioned of perturbing the values slightly to make sure you avoid the exactly-0 or exactly-1 cases.
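As one possible illustration of that perturbation idea (this is a sketch, not the exact code used in the lab), you can clip the sigmoid outputs away from 0 and 1 before they are used in the loss or its gradient:

import numpy as np

def stabilize_probs(AL, eps=1e-10):
    # Pull predicted probabilities away from exactly 0 and exactly 1,
    # so that log(AL), log(1 - AL), and the divisions in dAL stay finite.
    return np.clip(AL, eps, 1 - eps)

AL = np.array([[9.99992040e-01, 1.00000000e+00, 0.0]])
print(stabilize_probs(AL))  # the 1.0 becomes 1 - 1e-10, the 0.0 becomes 1e-10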
I get it now: it’s all about floating-point approximation, which can round the values to exactly 1 or 0 in the computer’s representation.
Thank you very much for this clarification.