I have trained a neural network and I wanted to plot a confusion matrix, but the accuracy implied by the confusion matrix didn’t match the accuracy I calculated during training. I suspect I used the wrong method to load the “true labels”. Are there any better ways to debug this problem? (I used the image_dataset_from_directory function from Keras.)
I used this code to load the ‘true labels’ of my dataset:
import tensorflow as tf
y_true = tf.concat(list(train_dataset.map(lambda s, lab: lab)), axis=0)  # concatenate the label tensors from every batch; I’m not sure if it is correct
Were the true labels not provided with the dataset directly?
From the confusion matrix, your fraction of correct predictions is really low (summing the values on the main diagonal and dividing by the total, it’s less than 20%). It’s as if your network isn’t learning anything.
But that’s inconsistent with the training accuracy graph, which shows a training accuracy of about 65%.
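For reference, the check above is just the diagonal sum divided by the total count. A minimal sketch, assuming your confusion matrix is already available as a NumPy array called cm (the numbers below are made up purely for illustration):

import numpy as np

cm = np.array([[12,  9, 10],
               [11,  8,  9],
               [10, 11,  7]])  # hypothetical 3-class matrix, rows = true classes, columns = predicted classes

accuracy_from_cm = np.trace(cm) / cm.sum()  # correct predictions / total predictions
print(f"accuracy implied by the confusion matrix: {accuracy_from_cm:.2%}")

If this number is far below the accuracy reported during training, then either the matrix or the labels used to build it are suspect.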
I am aware of the inconsistency, so I suspect something went wrong when I read the labels. I used the image_dataset_from_directory method from Keras and I’m not really sure how it arranges the samples. Every time I run model.predict() the predicted labels are different. Is there any way I can counter this issue?
It is an in-house dataset that I made myself. I used the Keras function to load my images, and it should automatically assign each image to its correct class. But since I use batch training, every time I inspect the labels they show different values. Are there better strategies for me to extract the labels and build a confusion matrix? I’m open to any suggestions!
If you can share the full code and data (zipped into one file) with me, I can take a look for you.
I suggest sharing the code on GitHub (you may set it to a private repo and add me to it - my handle is rmwkwok). No preference on the data, but if you share it via Google Drive, my handle there is also rmwkwok.
I will be available for the next 20 minutes; otherwise, I can look at it again in 2 or 3 hours. It will be more efficient for both of us if everything is ready by then.
Btw, @Chiang_Yuhan, if there is any reason you don’t want to share your data, you can tell me, and perhaps you could replace it with some public multiclass dataset and rerun your notebook on that?
Let me know if you have any difficulties uploading the data. We can make it happen together.
Because I cannot see it. If you have indeed uploaded it, please share some screenshots here so that I can help figure out what happened; there must be some misunderstanding.
Because your own observation precisely explains it!
The first time you iterated over train_dataset to generate y_pred, it ordered the samples one way; the second time, when you generated y_true, it ordered them a different way, so y_pred and y_true do not line up.
Anyway, here come two questions:
Why is the ordering random?
How can you make sure y_pred and y_true match?
I suppose you, again, know how to find out the answers?
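In case it helps when you verify your answers, here is a minimal sketch of one way to keep them paired. It assumes a dataset built with tf.keras.utils.image_dataset_from_directory using integer labels and a model with a softmax output; model and train_dataset refer to your own objects, and every other name below is only illustrative.

import numpy as np
import tensorflow as tf

# One option: iterate over the (possibly shuffled) dataset exactly once and
# collect labels and predictions from the same batches, so they stay paired.
y_true_batches = []
y_pred_batches = []
for images, labels in train_dataset:
    y_true_batches.append(labels.numpy())
    probs = model.predict(images, verbose=0)          # shape: (batch_size, num_classes)
    y_pred_batches.append(np.argmax(probs, axis=-1))  # predicted class index per image

y_true = np.concatenate(y_true_batches)
y_pred = np.concatenate(y_pred_batches)

cm = tf.math.confusion_matrix(y_true, y_pred).numpy()
print(cm)

# Another option: rebuild the dataset with shuffle=False so that every pass
# yields the samples in the same order, e.g.
# eval_dataset = tf.keras.utils.image_dataset_from_directory(
#     "path/to/your/images", shuffle=False, image_size=(256, 256), batch_size=32)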