I am implementing logistic regression with gradient descent on a dataset I found online, and the cost function starts returning NaN values after a certain number of iterations. How can this be fixed, given that I only have a foundational knowledge of linear and logistic regression?
The cost will become NaN if your \hat{y} value rounds to exactly 0 or 1. That is because the cross-entropy cost takes \log(\hat{y}) and \log(1 - \hat{y}), so a saturated prediction produces \log(0) = -\infty, and the resulting 0 \cdot \log(0) term evaluates to NaN in floating point. Of course \hat{y} is the output of the sigmoid function, so it can never be exactly 0 or 1 if we are doing pure math over \mathbb{R}. But in floating point everything is an approximation, and we can end up with exactly 0 or 1. You have several approaches to deal with that:
- The first approach is to understand in more detail what is happening. E.g. instrument your code to track how close the \hat{y} values are getting to 0 or 1 (see the first sketch after this list). In 64-bit floating point, sigmoid(z) evaluates to exactly 1.0 once z exceeds roughly 37. Maybe you need to use a smaller learning rate or a smaller iteration count. Of course it also matters how accurate your predictions are.
- You can also put a defense mechanism into your cost logic to protect against the rounding to 0 or 1 (one common way is shown in the second sketch after this list). Here’s a thread which discusses that in more detail.
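For the instrumentation idea in the first bullet, here is a minimal sketch of what that could look like, assuming a NumPy implementation where `X`, `w`, `b`, and the loop index `i` are placeholder names for whatever your code actually uses:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Inside the gradient descent loop, after computing the predictions:
z = X @ w + b        # linear scores, shape (m,)
y_hat = sigmoid(z)   # predictions, strictly inside (0, 1) in exact math

# Track how close the predictions are getting to saturation.
print(f"iter {i}: max |z| = {np.max(np.abs(z)):.2f}, "
      f"min y_hat = {y_hat.min():.3e}, min 1 - y_hat = {(1.0 - y_hat).min():.3e}")

# Flag the iteration at which a prediction first rounds to exactly 0 or 1,
# because the next cost computation will then involve log(0).
if np.any(y_hat == 0.0) or np.any(y_hat == 1.0):
    print(f"iter {i}: predictions have saturated -- the cost is about to go inf/NaN")
```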
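And for the second bullet, one common form of that defense is to clip \hat{y} away from exactly 0 and 1 before taking the logs. This is just a sketch of the general technique, not necessarily the exact fix discussed in the linked thread; the demo at the end shows how an unprotected cost hits 0 \cdot \log(0) and goes NaN:

```python
import numpy as np

def compute_cost(y, y_hat, eps=1e-15):
    """Binary cross-entropy with a guard against log(0)."""
    # Keep the predictions strictly inside (0, 1) so both logs stay finite.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

# Demo: a prediction that has saturated to exactly 1.0 on a positive example.
y = np.array([1.0])
y_hat = np.array([1.0])

unprotected = -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
print(unprotected)             # nan, from the 0 * log(0) term
print(compute_cost(y, y_hat))  # ~1e-15, finite
```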
Note that if you read the thread I linked all the way through, you’ll see that the problem in that case was not “saturation” but that some of the labels in the training data were not 0 or 1 (the value 2 was included). So there can also be just plain bugs that cause this NaN issue, and it is worth checking for that as well.
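To rule out that class of bug, a quick sanity check on the labels is cheap. For example (again a sketch, assuming the training labels live in a NumPy array `y`):

```python
import numpy as np

# Binary cross-entropy assumes every label is exactly 0 or 1.
print("label values found:", np.unique(y))
assert np.all(np.isin(y, [0, 1])), "labels must be 0 or 1 for logistic regression"
```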
NaN values in the cost during gradient descent typically come from numerical instability or improper input handling. A few things to check:

- Normalize or standardize your features to prevent scale-related issues, and use a lower learning rate to avoid overly large gradient updates.
- Inspect the dataset for missing, infinite, or invalid values, and handle them before training.
- Add regularization (L1 or L2) to keep the weights from growing excessively large; gradient clipping can also mitigate exploding gradients.
- Use numerically stable functions, such as a reformulated sigmoid, to avoid overflow in the exponential calculations (a sketch follows below).
- Debug by logging intermediate values of the cost, the gradients, and the weights to pinpoint where things first blow up.
- Consider leveraging the robust optimizers in libraries like TensorFlow or PyTorch to minimize errors in a manual implementation.

Together these steps should resolve the instability.
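As a concrete example of the “reformulated sigmoid” point above, here is one common reformulation; it is a generic sketch, not code from any particular course or library. It only ever calls exp on a non-positive argument, so it avoids overflow, but the result can still round to exactly 0 or 1 for very large |z|, so it is best combined with the clipping shown earlier:

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that never exponentiates a large positive number.

    For z >= 0, use 1 / (1 + exp(-z)); for z < 0, use the algebraically
    equivalent exp(z) / (1 + exp(z)). Both branches call exp only on a
    non-positive argument, so exp cannot overflow.
    """
    z = np.asarray(z, dtype=np.float64)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # [0.  0.5 1. ]
```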
Thank you.