Is it possible for the loss value to be in the range of 1660?

Hello everyone,

AI newbie here. I am trying to calculate the weighted loss for all 14 classes (the dataset provided in week 1), so I have defined a function that accepts however many classes (diseases) are present in the dataset. After running it, I got the value 1660.926470376899.

Just wondering if this could be the case? I have double-checked my code and everything seems to be correct. Any thoughts?

Many thanks,
Beeta


I would surely like to know which disease you are talking about :slightly_smiling_face:

And how did you classify the 14 classes in your model?

Or are you talking about the week 1 assignment?


Thank you so much for your response. As mentioned before, I am a novice in AI, so there is a good chance I have made a mistake. The dataset I use is the same one provided for the week 1 lab (AI for Medical Diagnosis). It has 14 classes (diseases) and contains 1000 samples (images); however, it is imbalanced. This is not part of the assignment; I just did it out of curiosity. Here are the steps I followed:

1- create a function for calculating the weighted loss

def weighted_binary_crossentropy(y_true, y_pred, w_p, w_n):

2- load the dataset

train_df = pd.read_csv("../input/arxiv-org-abs-1705-02315/Files/home/jovyan/work/data/nih/train-small.csv")

3- remove the columns 'Image' and 'PatientId'
After removing them, my df contains only the 14 disease columns and 1000 rows; all values are either 0 or 1

4- calculate y_true (below is the output)

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

5- calculate y_pred

Shape of y_pred: (1000, 14)
Example of y_pred:
[[0.106 0.02  0.033 ... 0.021 0.01  0.038]
 [0.106 0.02  0.033 ... 0.021 0.01  0.038]
 [0.106 0.02  0.033 ... 0.021 0.01  0.038]
 ...
 [0.106 0.02  0.033 ... 0.021 0.01  0.038]
 [0.106 0.02  0.033 ... 0.021 0.01  0.038]
 [0.106 0.02  0.033 ... 0.021 0.01  0.038]]

6- calculate w_p

array([0.894, 0.98 , 0.967, 0.984, 0.872, 0.987, 0.986, 0.998, 0.825,
       0.955, 0.946, 0.979, 0.99 , 0.962])

7- calculate w_n

array([0.106, 0.02 , 0.033, 0.016, 0.128, 0.013, 0.014, 0.002, 0.175,
       0.045, 0.054, 0.021, 0.01 , 0.038])

8- calculate the weighted loss
1660.926470376899
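
As a rough scale check (just my own back-of-the-envelope arithmetic, assuming the number above really is a plain sum over every image and every class):

total_loss = 1660.926470376899
n_images, n_classes = 1000, 14
print(total_loss / n_images)                 # ~1.66 per image
print(total_loss / (n_images * n_classes))   # ~0.12 per individual label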

I'm not sure if sharing the whole code is allowed. If it is, I am happy to share it here. Thanks so much for your time and your professional help.


Sharing non-assignment code is allowed. It's good that you are working with the dataset in different ways. A loss of 1660 surely tells you something needs a closer look. Can you elaborate on how you applied the algorithm in your code? Also, you applied binary cross-entropy; I suppose that is for whether the disease is present or not?

Regards
DP

Hi Deepti,

Yes, you are right, I applied binary cross-entropy. I have been thinking about my code and I think I've figured out what is wrong. I think I should have calculated the weighted loss for each class (column) separately, not one loss for all of them. Could that be the reason I got this 1660? This dataset is imbalanced, e.g. 2 samples for disease A and 100 samples for disease B. I'd be grateful if you could please take a look at my code here and provide some guidance. Many thanks!

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set()

from keras.preprocessing.image import ImageDataGenerator   # not used in the snippet below

# Weighted binary cross-entropy loss function for multiple classes.
# Note: this sums over all 1000 images and all 14 classes into one scalar (no averaging).
def weighted_binary_crossentropy(y_true, y_pred, w_p, w_n):
    loss = 0
    num_classes = len(w_p)
    for i in range(num_classes):
        # w_p[i] weights the positive (disease present) term, w_n[i] the negative term
        loss_pos = -1 * np.sum(w_p[i] * y_true[:, i] * np.log(y_pred[:, i]))
        loss_neg = -1 * np.sum(w_n[i] * (1 - y_true[:, i]) * np.log(1 - y_pred[:, i]))
        loss += loss_pos + loss_neg
    return loss
train_df = pd.read_csv("../input/arxiv-org-abs-1705-02315/Files/home/jovyan/work/data/nih/train-small.csv")
# Keep only the 14 disease columns (all values are 0 or 1)
columns_to_remove = ['Image', 'PatientId']
new_train_df = train_df.drop(columns=columns_to_remove, inplace=False)
new_train_df.head()
y_true = new_train_df.values
y_true
y_pred = np.zeros_like(y_true, dtype=float)  # placeholder; replaced below with tiled class prevalences
y_pred
# Calculate the proportions of positive labels (1s) for each class
proportions = new_train_df.mean(axis=0)  # Calculate the mean along each column

# Use these proportions as the predicted probabilities
predicted_probabilities = proportions.values

# print("Predicted Probabilities:")
# for i, prob in enumerate(predicted_probabilities):
    # print(f"Class {i}: {prob:.4f}")
    
# Repeat the predicted probabilities for each row to match the shape of y_true
y_pred = np.tile(predicted_probabilities, (len(new_train_df), 1))

print("Shape of y_pred:", y_pred.shape)
print("Example of y_pred:")
print(y_pred)
# Weight for the positive term = fraction of negative labels in each class
w_p = np.sum(y_true == 0, axis=0) / y_true.shape[0]
w_p
# Weight for the negative term = fraction of positive labels in each class
w_n = np.sum(y_true == 1, axis=0) / y_true.shape[0]
w_n
weighted_binary_crossentropy(y_true, y_pred, w_p, w_n)
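
To make the per-class idea concrete, this is roughly the change I have in mind (just a sketch of my own; it keeps the same weights and the same sums, but returns one value per disease instead of one grand total, and dividing by the number of images would optionally bring each value onto a per-image scale):

# Sketch: same computation as above, but one loss value per class
def weighted_binary_crossentropy_per_class(y_true, y_pred, w_p, w_n):
    num_classes = len(w_p)
    losses = np.zeros(num_classes)
    for i in range(num_classes):
        loss_pos = -np.sum(w_p[i] * y_true[:, i] * np.log(y_pred[:, i]))
        loss_neg = -np.sum(w_n[i] * (1 - y_true[:, i]) * np.log(1 - y_pred[:, i]))
        losses[i] = loss_pos + loss_neg
    return losses

per_class_losses = weighted_binary_crossentropy_per_class(y_true, y_pred, w_p, w_n)
print(per_class_losses)                    # one value per disease
print(per_class_losses / y_true.shape[0])  # optional per-image scale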

Attaching this screenshot below just to show how imbalanced the dataset is:


What was the result of this update?

array([223.29863812,  77.07162397, 109.92715643,  65.35800326,
       244.73938416,  55.89043949,  59.11972805,  12.40835376,
       279.41488542, 135.24820087, 151.93832091,  79.86054073,
        45.69068317, 120.96051204])
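
As a quick sanity check (just adding up the numbers above with numpy), these per-class values sum back to the earlier single total:

per_class = np.array([223.29863812,  77.07162397, 109.92715643,  65.35800326,
                      244.73938416,  55.89043949,  59.11972805,  12.40835376,
                      279.41488542, 135.24820087, 151.93832091,  79.86054073,
                       45.69068317, 120.96051204])
print(per_class.sum())   # ~1660.93, matching the 1660.926470376899 from before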


I think calculating losses individually makes more sense. Is that correct?


Honestly, Beeta, if you ask me, the week 1 assignment has a similar application of weighted loss.

The only difference is in the probability values you are applying as predictions.

What you could do is use the dataset from week 1 as a validation dataset, find more similar data, and apply that data to the model already created in week 1.

Regards
DP
