Banana Classification - Help ):

Hi, I’m working on classifying whether bananas taste good or bad using the final code from week 4 of course 1. I’m trying to implement everything on my own to strengthen my intuition, but I keep encountering the same problem.
RuntimeWarning: divide by zero encountered in log
    cost = (-1/m) * (np.dot(Y, np.log(AL).T) + np.dot((1-Y), np.log(1-AL).T))
I know that you can’t calculate the log of 0, but the strange thing is that my code works perfectly fine when I switch from a large training set:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

to

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=42)

I have checked the training set for NaN values and I’m pretty sure there isn’t a single one.

I did trace the problem to iteration 5 when using test_size=0.2, but I’m not entirely sure why it happens there.

I was just wondering whether an angel who knows this better than I do could skim through my code and see whether there are any obvious flaws that might contribute to this error?

Here is a link to my Colab:
Colab

and here is a link to the dataset I’m using:
Kaggle

Thx (:

Hello @Leonard_Aleksander_H,

A standard way (used in many ML packages) is to replace log(x) with log(x + ep), where ep is a small number like 1e-7, so we never take log(0).
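
For illustration, here is a minimal sketch of that idea applied to the cost line above (compute_cost_stable is just a placeholder name; AL, Y and their shapes are assumed to be as in the course code):

import numpy as np

def compute_cost_stable(AL, Y, ep=1e-7):
    # cross-entropy cost with a small ep so log never receives exactly 0
    m = Y.shape[1]
    cost = (-1 / m) * (np.dot(Y, np.log(AL + ep).T)
                       + np.dot(1 - Y, np.log(1 - AL + ep).T))
    return np.squeeze(cost)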

A possible way to trace the problem with your original, larger training set is to add a check like this in your compute_loss function:

if (AL < 1e-300).sum() > 0 or (1 - AL < 1e-300).sum() > 0:
    # stop training here so you can inspect AL at this iteration
    raise RuntimeError("AL is too close to 0 or 1")

Then you can catch the moment the problem appears and start your investigation from there. You may replace 1e-300 with a threshold closer to the value that actually triggers the divide-by-zero warning.
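
If you want to see which training examples are involved, a small extension of that check (my own sketch; it assumes AL has shape (1, m) as in the course code) could report them before stopping:

bad = np.argwhere((AL < 1e-300) | (1 - AL < 1e-300))
if bad.size > 0:
    # the second column holds the example indices, since AL has shape (1, m)
    print("Saturated predictions at example indices:", bad[:, 1])
    raise RuntimeError("AL too close to 0 or 1; inspect these examples")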

Cheers,
Raymond

3 Likes

Thank you so much for the help!

The issue was that some AL values were so close to 1 (something like 0.99999999…) that floating point rounds them to exactly 1.0, so computing log(1 - AL) becomes log(0).
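
For example, a quick NumPy check (illustrative value, not taken from my actual run):

import numpy as np

a = np.float64(0.99999999999999999)  # more 9s than float64 can distinguish from 1.0
print(a == 1.0)                      # True: the value rounds to exactly 1.0
print(np.log(1 - a))                 # -inf, plus the divide-by-zero warning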

I implemented your workaround of adding a small number, ep, and it worked perfectly! My only issue now is that the cost went way up and became a bit unstable: from around 100 before to 500-600 now. I tried lowering ep, but that gave me an even higher cost. Do you know why this happens?

Best,
Leonard

I have a query about the data frame: how can size, softness, and harvest time have negative values in your classification data?

Can you please explain this data distribution?

Regards
DP

@Leonard_Aleksander_H I tried to make a post to you yesterday, but it seemed you had deleted it before I could finish responding.

So just FYI, you’ll want to reconfigure your data load -- as an external user, the directory you provided for the CSV file doesn’t load for me.

To be completely honest, I’m not sure of the right way to structure this on Colab, but… so you know.

He cannot delete your response as his trust level is regular.

1 Like

I also found that strange. To be completely honest, I picked a random dataset on Kaggle without much thought. I think the bananas in the dataset might be compared to a default banana, which could explain the negative values.

1 Like

You’ll have to download the dataset from the link I provided and edit that one line of code so it works for your Google Colab.

1 Like

Okay, I got the reason after referring to the Kaggle link you provided: the values are negative because they are distribution values rather than literal values for size, harvest time, and softness.
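
For context, here is a minimal sketch of how standardizing raw measurements produces such negative values (my assumption about how the features were prepared, not something confirmed from the dataset page):

import numpy as np

raw_size_cm = np.array([15.0, 18.0, 22.0, 25.0])  # hypothetical raw banana sizes
standardized = (raw_size_cm - raw_size_cm.mean()) / raw_size_cm.std()
print(standardized)  # below-average sizes come out negative, above-average positive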

1 Like

I think there should be a way you can store/link to the file within Colab… But I don’t know what it is.

You should look into it.

1 Like

Any other changes besides adding ep? And did this more fluctuating result occur with 80% of your data as training data, or just 20%?

Did you try a dataset split of 5600/2400 for the train and test sets, since you are using only two sets of data for your classification?
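
For example, something like this (a sketch assuming the full dataset has 5600 + 2400 = 8000 rows, with X and y as in your notebook):

from sklearn.model_selection import train_test_split

# explicit 5600 / 2400 train / test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=5600, test_size=2400, random_state=42)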

@Leonard_Aleksander_H See instructions here:

It’s a better experience for showing your project to others/the end user, as they don’t have to download anything (which could seem suspicious).

Thanks