Confusion Matrix Accuracy Problem

I have trained a neural network and wanted to plot a confusion matrix. I found that the confusion matrix doesn't agree with the accuracy I calculated. I suspect I used the wrong method to load the “true labels”. Is there a better way to debug this? (I loaded the images with the image_dataset_from_directory method from Keras.)

I used this code to load the ‘true labels’ of my dataset:

y_true = tf.concat(list(train_dataset.map(lambda s,lab: lab)), axis=0) #I’m not sure if it is correct

I used this code to extract the predicted labels:

y_pred = model.predict(train_dataset)
y_pred = tf.argmax(y_pred, axis=1)
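`model.predict` returns one row of class probabilities per sample, and `argmax` over axis 1 turns each row into an integer class index. The true labels have to be in the same form before the two are compared. Below is a minimal NumPy sketch, assuming the dataset was loaded with `label_mode='int'` (the default for `image_dataset_from_directory`) or `label_mode='categorical'` (one-hot rows); the helper `to_class_indices` and the probabilities are made up for illustration.

```python
import numpy as np

def to_class_indices(labels):
    """Hypothetical helper: integer class indices from int labels or one-hot rows."""
    labels = np.asarray(labels)
    if labels.ndim == 2:  # one-hot ('categorical') labels, shape (N, num_classes)
        return labels.argmax(axis=1)
    return labels.astype(int)  # already integer ('int') labels, shape (N,)

# Made-up softmax output for 3 samples and 3 classes, standing in for model.predict(...)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])
y_pred = probs.argmax(axis=1)  # highest-probability class per row

y_true_int = to_class_indices([0, 2, 2])
y_true_onehot = to_class_indices([[1, 0, 0], [0, 0, 1], [0, 0, 1]])
print(y_pred, y_true_int, y_true_onehot)
```

Both label formats end up as the same integer vector, so a format mismatch can be ruled out before looking for other causes.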




Were the true labels not provided with the dataset directly?

Judging from the confusion matrix, your predictions are very poor: the values on the main diagonal add up to less than 20%. It’s as if your network isn’t learning anything.
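The diagonal argument can be checked directly: for one pair of y_true and y_pred arrays, the confusion matrix's trace divided by its total is exactly the accuracy. Here is a NumPy sketch with made-up labels; `tf.math.confusion_matrix` or `sklearn.metrics.confusion_matrix` would produce the same matrix (rows are true classes, columns are predicted classes).

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows = true class, columns = predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count every (true, predicted) pair
    return cm

# Made-up labels for 6 samples and 3 classes
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 1])
cm = confusion_matrix(y_true, y_pred, num_classes=3)

# Correct predictions sit on the main diagonal, so this is the accuracy
acc_from_cm = np.trace(cm) / cm.sum()
acc_direct = np.mean(y_true == y_pred)
print(cm)
print(acc_from_cm, acc_direct)
```

So if the diagonal gives a very different number from the accuracy reported during training, the likeliest explanation is that y_true and y_pred are not lined up sample by sample.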

But that’s inconsistent with the training accuracy graph, which says the training accuracy is about 65%.


I am aware of the inconsistency, so I suspect something went wrong when I read the labels. I used the image_dataset_from_directory method from Keras, and I’m not really sure how it orders the data. Every time I run model.predict(), the predicted labels are different. Is there any way I can fix this?


Can you inspect the raw data set somehow, like with a text editor for the labels?

I don’t know where you got the data set, but often they will be provided with a description of the encoding, so you know how to read the data.


It is an in-house dataset that I made myself. I used the Keras function to load my images, and it should automatically assign each image to its correct class. But since I use batch training, every time I inspect the labels they show different values. Are there better strategies for extracting the labels and building a confusion matrix? I’m open to any suggestions!
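With the default `labels="inferred"`, `image_dataset_from_directory` treats each subdirectory as one class and numbers the classes in alphanumerical order of the subdirectory names (unless `class_names` is passed explicitly). The pure-Python sketch below only mimics that mapping so it can be checked on a tiny fake folder layout; `inferred_labels` is a hypothetical helper, and unlike Keras it does not filter by image file extension.

```python
import os
import tempfile

def inferred_labels(directory):
    """Hypothetical helper: one class per subdirectory, indexed in sorted name order."""
    class_names = sorted(
        d for d in os.listdir(directory) if os.path.isdir(os.path.join(directory, d))
    )
    labels = {}
    for index, name in enumerate(class_names):
        for fname in sorted(os.listdir(os.path.join(directory, name))):
            labels[os.path.join(name, fname)] = index  # relative path -> class index
    return class_names, labels

# Tiny fake layout: <root>/<class_name>/<image file>
root = tempfile.mkdtemp()
for cls, files in {"dog": ["d1.jpg"], "cat": ["c1.jpg", "c2.jpg"]}.items():
    os.makedirs(os.path.join(root, cls))
    for f in files:
        open(os.path.join(root, cls, f), "w").close()  # empty placeholder files

class_names, labels = inferred_labels(root)
print(class_names, labels)
```

On the real dataset, the `class_names` attribute of the object returned by `image_dataset_from_directory` shows which class each integer label refers to.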


I’m also not very familiar with the Dataset object in Keras.


Dear @Chiang_Yuhan,

Please go to https://keras.io/, where you will find the documentation along with examples:

https://keras.io/examples/

Hi @Chiang_Yuhan,

If you can share the full code and data (zipped into one file) with me, I can take a look for you.

I suggest sharing the code on Git (you can make it a private repo and add me to it; my handle is rmwkwok). I have no preference for how you share the data, but if you use Google Drive, my handle there is also rmwkwok.

Raymond

Just added! Many thanks!
It’s in Jupyter notebook format; here is the link:


I also tried making a copy of the original train dataset to check the labels, and found that whenever I run this line:

train_label = np.concatenate([y for x, y in train_dataset], axis=0)

the labels come out quite different each time.


@Chiang_Yuhan

Where did you share the data?


@Chiang_Yuhan

I will be available for the next 20 minutes; otherwise, I can look at it again in 2 or 3 hours. It will be more efficient for both of us if everything is ready :wink:

I will wait for the data.


Btw, @Chiang_Yuhan, if there is any reason you don’t want to share your data, just tell me; perhaps you could replace it with a public multiclass dataset and rerun your notebook on that?


Sorry for the late reply, and thank you for your quick response!

I will share the data shortly!


I just uploaded a sample of the data as a zip file; that should be enough for you to check the code.

Thank you again for helping me out, sir. It means a lot to me!

Best,
Yuhan


Hello Yuhan @Chiang_Yuhan,

Where did you upload it? I don’t see it in your Git repo, and I didn’t receive any notification from Google Drive.

Raymond


Hey Yuhan @Chiang_Yuhan,

I still can’t find the data. We might be in different time zones, so I will check your messages again tomorrow, my time.

Raymond


Hello @Chiang_Yuhan,

Let me know if you have any difficulties uploading the data. We can make it happen together.

I still cannot see it, but if you have indeed uploaded it, please post some screenshots here so that I can help figure out what happened. There must be some misunderstanding.

Raymond


Hello @rmwkwok

I forgot to hit commit :wink:. Is a zip file OK for you?

Thanks again for the huge help,

Yuhan


Hey Yuhan @Chiang_Yuhan ,

Did you already know the reason for the problem :stuck_out_tongue_winking_eye:?

Your own observation explains it precisely!

The first time you iterated over train_dataset to generate y_pred, it delivered the samples in one order; the second time, when you generated y_true, it delivered them in a different order. So y_pred and y_true do not match sample by sample.
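The reordering can be reproduced without any images. As far as I know, `image_dataset_from_directory` shuffles by default (`shuffle=True`) using `tf.data.Dataset.shuffle`, whose `reshuffle_each_iteration` defaults to `True`, so every pass over the dataset yields a new order even when a `seed` is given. A toy sketch, where the numbers 0..19 stand in for the images:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for a shuffled image dataset: the "samples" are the numbers 0..19
ds = tf.data.Dataset.from_tensor_slices(np.arange(20))
# A fixed seed does not stop the reshuffle between passes
ds = ds.shuffle(buffer_size=20, seed=42, reshuffle_each_iteration=True).batch(5)

first_pass = np.concatenate([batch.numpy() for batch in ds])
second_pass = np.concatenate([batch.numpy() for batch in ds])

print(first_pass)
print(second_pass)  # same elements, almost surely in a different order
```

Calling `model.predict(train_dataset)` is one pass and building y_true is another, which is exactly the situation above.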

Anyway, here are two questions:

  1. Why is the ordering random?
  2. How to make sure y_pred and y_true match?
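For the second question, one option is to read each batch only once and take both the predictions and the labels from that same batch. The sketch below assumes a toy shuffled dataset in which the feature equals the label, so a perfect "model" is easy to write and any misalignment would show up as lost accuracy; `fake_model` is a hypothetical stand-in for a real Keras model call such as `model(x, training=False)`.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 4

# Toy shuffled dataset whose feature x *is* the label (stand-in for the image data)
labels = np.arange(40) % NUM_CLASSES
ds = (tf.data.Dataset.from_tensor_slices((labels.astype("float32"), labels))
      .shuffle(40, reshuffle_each_iteration=True)
      .batch(8))

def fake_model(x):
    """Hypothetical stand-in for model(x, training=False): one-hot 'probabilities'."""
    return tf.one_hot(tf.cast(x, tf.int32), depth=NUM_CLASSES)

# Single pass: predictions and labels come from the same batch, so they stay aligned
y_true, y_pred = [], []
for x, y in ds:
    y_true.append(y.numpy())
    y_pred.append(tf.argmax(fake_model(x), axis=1).numpy())
y_true = np.concatenate(y_true)
y_pred = np.concatenate(y_pred)

cm = tf.math.confusion_matrix(y_true, y_pred, num_classes=NUM_CLASSES).numpy()
print(cm)
print(np.trace(cm) / cm.sum())  # every prediction lines up with its own label
```

Another option is to load a separate evaluation copy with `shuffle=False` in `image_dataset_from_directory`; then `model.predict` and the label extraction both read the samples in the same fixed order.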

I suppose, once again, you know how to find the answers?

Raymond
