Banana Classification - Help ):

Hi, I’m working on classifying whether bananas taste good or bad using the final code from week 4 of course 1. I’m trying to implement everything on my own to strengthen my intuition, but I keep encountering the same problem.
RuntimeWarning: divide by zero encountered in log
    cost = (-1/m) * (np.dot(Y, np.log(AL).T) + np.dot((1-Y), np.log(1-AL).T))
I know that you can’t calculate the log of 0, but the strange thing is that my code works perfectly fine when I switch from a large training set:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

to

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=42)

I have checked the training set for NaN values and I’m pretty sure there isn’t a single one.

I did trace the problem to iteration 5 when using test_size=0.2, but I’m not entirely sure why it happens there.

I was just wondering whether an angel who knows this better than I do could skim through my code and see whether there are any obvious flaws that might contribute to this error?

Here is a link to my Colab:
Colab

and here is a link to the dataset I’m using:
Kaggle

Thx (:

Hello @Leonard_Aleksander_H,

A standard way (used in many ML packages) is to replace log(x) with log(x + ep), where ep is a small number like 1e-7, so we never take log(0).
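
For illustration, here is a minimal sketch of that idea applied to the cost line above (compute_cost_stable is just a placeholder name; AL, Y and their shapes are assumed to be as in the course code):

import numpy as np

def compute_cost_stable(AL, Y, ep=1e-7):
    # cross-entropy cost with a small ep so log never receives exactly 0
    m = Y.shape[1]
    cost = (-1 / m) * (np.dot(Y, np.log(AL + ep).T)
                       + np.dot(1 - Y, np.log(1 - AL + ep).T))
    return np.squeeze(cost)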

A possible way to trace the problem with your original, larger training set is to add a check like this in your compute_loss function:

if (AL < 1e-300).sum() > 0 or (1 - AL < 1e-300).sum() > 0:
    # stop training here so you can inspect AL at this iteration
    raise RuntimeError("AL is too close to 0 or 1")

Then you can catch the moment the problem appears and start your investigation from there. You may replace 1e-300 with a threshold closer to the value that actually triggers the divide-by-zero warning.
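
If you want to see which training examples are involved, a small extension of that check (my own sketch; it assumes AL has shape (1, m) as in the course code) could report them before stopping:

bad = np.argwhere((AL < 1e-300) | (1 - AL < 1e-300))
if bad.size > 0:
    # the second column holds the example indices, since AL has shape (1, m)
    print("Saturated predictions at example indices:", bad[:, 1])
    raise RuntimeError("AL too close to 0 or 1; inspect these examples")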

Cheers,
Raymond

3 Likes

Thank you so much for the help!

The issue was that some AL values were so close to 1 (something like 0.99999999…) that floating point rounds them to exactly 1.0, so computing log(1 - AL) becomes log(0).
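
For example, a quick NumPy check (illustrative value, not taken from my actual run):

import numpy as np

a = np.float64(0.99999999999999999)  # more 9s than float64 can distinguish from 1.0
print(a == 1.0)                      # True: the value rounds to exactly 1.0
print(np.log(1 - a))                 # -inf, plus the divide-by-zero warning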

I implemented your workaround of adding a small number, ep, and it worked perfectly! My only issue now is that the cost went way up and became a bit unstable: from around 100 before to 500-600 now. I tried lowering ep, but that gave me an even higher cost. Do you know why this happens?

Best,
Leonard

I have a query about the data frame: how can size, softness, and harvest time have negative values in your classification data?

Can you please explain this data distribution?

Regards
DP

@Leonard_Aleksander_H I tried to make a post to you yesterday, but it seemed you had deleted it before I could finish responding.

So just FYI, you’ll want to reconfigure your data load -- as an external user, the directory you provided for the CSV file doesn’t load for me.

To be completely honest, I’m not sure of the right way to structure this on Colab, but… so you know.

He cannot delete your response as his trust level is regular.

1 Like

I also found that strange. To be completely honest, I picked a random dataset on Kaggle without much thought. I think the bananas in the dataset might be compared to a default banana, which could explain the negative values.

1 Like

You’ll have to download the dataset from the link I provided and edit that one line of code so it works for your Google Colab.

1 Like

Okay, I got the reason after referring to the Kaggle link you provided: the values are negative because they are distribution values rather than literal values for size, harvest time, and softness.
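
For context, here is a minimal sketch of how standardizing raw measurements produces such negative values (my assumption about how the features were prepared, not something confirmed from the dataset page):

import numpy as np

raw_size_cm = np.array([15.0, 18.0, 22.0, 25.0])  # hypothetical raw banana sizes
standardized = (raw_size_cm - raw_size_cm.mean()) / raw_size_cm.std()
print(standardized)  # below-average sizes come out negative, above-average positive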

1 Like

I think there should be a way you can store/link to the file within Colab… But I don’t know what it is.

You should look into it.

1 Like

Any other changes besides adding ep? And did this more fluctuating result occur with 80% of your data as training data, or just 20%?

Did you try a dataset split of 5600/2400 for the train and test sets, since you are using only two sets of data for your classification?
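
For example, something like this (a sketch assuming the full dataset has 5600 + 2400 = 8000 rows, with X and y as in your notebook):

from sklearn.model_selection import train_test_split

# explicit 5600 / 2400 train / test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=5600, test_size=2400, random_state=42)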

@Leonard_Aleksander_H See instructions here:

It’s a better experience for showing your project to others/the end user, as they don’t have to download anything (which could seem suspicious).

Thanks