Need Help: Adding Confusion Matrix to Mobilenet V2 Assignment

Greetings, fellow coders. I am trying to add a confusion matrix to the output of my MobileNet-based classifier but haven’t been able to figure it out. Any help?

My code:

{moderator edit - solution code removed}

Hi, Ankit.

It looks like the only changes you have made so far are to convert the output from 1 unit (binary classification) to multi-class and to add softmax as the output activation. There is some confusion about the loss functions: you import categorical_crossentropy, reference binary_crossentropy, and then appear to actually use sparse_categorical_crossentropy, which is a slightly different (but related) thing. Note that you’ve also changed from from_logits = True mode to from_logits = False, which is the default.
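As an aside, here is a small sketch (with made-up logits and labels, not your assignment code) showing how those losses relate: the two categorical losses compute the same numbers when fed the matching label format, and from_logits only controls whether the softmax lives in the model or in the loss:

```python
import tensorflow as tf

# Toy values only -- 3 classes, 2 samples (not the assignment's data).
logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])
probs = tf.nn.softmax(logits)              # what a softmax output layer produces

labels_idx = tf.constant([0, 1])           # integer class indices
labels_onehot = tf.one_hot(labels_idx, depth=3)

# sparse_categorical_crossentropy takes integer labels ...
loss_sparse = tf.keras.losses.sparse_categorical_crossentropy(labels_idx, probs)
# ... while categorical_crossentropy takes one-hot labels; same numbers.
loss_cat = tf.keras.losses.categorical_crossentropy(labels_onehot, probs)

# from_logits=True means the loss applies softmax internally, so you
# pass raw logits instead of probabilities and get the same result.
loss_logits = tf.keras.losses.sparse_categorical_crossentropy(
    labels_idx, logits, from_logits=True)

print(loss_sparse.numpy(), loss_cat.numpy(), loss_logits.numpy())
```

So the choice between the two categorical losses is just about the label format you feed in.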

So where is the “confusion layer”?

The other bigger issue here is that I am uncomfortable with the fact that you have basically published your solution to the original problem in the exercise. I don’t think it is a good idea to do that. So if you want to talk about extending it, we need to figure out a way to talk about that without revealing your source code for the actual solution. I will edit your post to remove the source code.

Sorry if that’s not very much help, but we have to work within the rules here. Please take another shot at this, but let’s start by discussing what a “confusion layer” is and why you want to add one for your purposes here, and then talk in more general terms about how to approach that. Note for starters that Prof Ng has not discussed “confusion layers” in Courses 1 through 4 here. I have not finished Course 5, so I don’t know if he discusses that there. So this question is beyond the scope of DLS Course 4, meaning that we need to start from something closer to “first principles”. You can’t just assume the rest of us know what you are talking about. :nerd_face:

Hi Paul! Apologies for posting the code, and thanks a lot for removing it. Coming back to my trouble:
I am doing multiclass classification and therefore the loss is defined as sparse_categorical_crossentropy instead of binary_crossentropy. The latter is in the code but not being used.

Coming back to the first principles as you suggested, I want to see how well the model can classify the respective classes. Some of my classes are visually more similar to each other, and I want to find out if these classes are causing the drop in the total training and validation accuracy. From my limited Google search, I found out that a confusion matrix is the way to go, but I haven’t been able to integrate anything into my code. I am a biologist who’s new to programming, so any help would be much appreciated!
Eager to hear some thoughts on this matter.

I have never looked at confusion layers before, so this will be a learning experience for both of us. :nerd_face: Well, I assume you found some articles about confusion layers. I’ll do my own google search in a second, but what do the articles say about how a confusion layer functions? They must have given some hints on this: e.g. does it act as a preprocessing layer, modifying the input data in some way like augmentation? Or is it an internal layer in the network that performs like another hidden layer? Or is it a post-processing layer that you apply to the output of your softmax layer? And what are the hyperparameters that you use to configure your confusion layer?

Or to put the question at a higher level: from the reading you have done so far, what is the point of a confusion layer and why do you think it will be useful in your application?

1 Like

From my current understanding, there are no confusion layers. The confusion matrix interprets the training output: just as we plot training and validation accuracy, we can plot/print the confusion matrix, which shows per-class metrics such as precision and recall.
Relevant links: 1, 2

Ok, thanks for the links. So a “confusion matrix” is just a technique for evaluating the output of your model compared to the labels. It just gives you a convenient way to look at the metrics like precision and recall that are used to evaluate the performance of your model in a unified fashion.

So what you need to do is to convert the output of your model (softmax values) into the appropriate form to compute the confusion matrix. That is, you convert from the probability form of the softmax output to actual class predictions. That’s pretty easy:

predictions = tf.math.argmax(activations, axis = -1)

That just computes the index that has the largest softmax output value for each input sample. It’s the multiclass equivalent of saying:

predictions = (activations > 0.5)

in the case of a binary classifier. Then there’s a TF function to compute the confusion matrix. Here’s the docpage. You feed both the predictions and the labels into it as 1D arrays of index values (not “one hot” vectors).
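One detail worth a sketch: if your pipeline produces one-hot labels (as it would for categorical_crossentropy), you can convert them to index form with the same argmax trick. Toy values, not assignment code:

```python
import tensorflow as tf

# One-hot labels (as used with categorical_crossentropy) ...
onehot_labels = tf.constant([[1., 0., 0.],
                             [0., 0., 1.],
                             [0., 1., 0.]])

# ... converted to the 1D index form that tf.math.confusion_matrix expects.
label_indices = tf.math.argmax(onehot_labels, axis=-1)
print(label_indices.numpy())   # [0 2 1]
```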

So that will give you the confusion matrix. Now the question is how to interpret the results and whether it actually tells you anything “actionable”.

Let’s try a little toy example to see what it looks like:

labels = tf.constant([0, 2, 1, 1, 2, 3, 3, 3, 2, 2, 0, 1])
preds  = tf.constant([0, 0, 0, 1, 2, 1, 1, 3, 3, 2, 0, 0])

confusion = tf.math.confusion_matrix(labels, preds, num_classes = 4)
print(f"confusion matrix:\n {confusion}")

Running that gives this:

confusion matrix:
[[2 0 0 0]
 [2 1 0 0]
 [1 0 2 1]
 [0 2 0 1]]

Reading the documentation, the columns represent the predictions and the rows represent the true values. So the numbers on the main diagonal are the number of correct predictions for each class and anything off diagonal represents incorrect predictions.

So for label 0, we can see that every 0 sample was correctly predicted, but there were a total of 3 “false positives” for 0. We can see that samples of type 1 and 2 are quite likely to be falsely predicted as 0.

For label 2, we can see that those are likely to be “false negatives” and can get predicted as either 0 or 3.

And so forth for labels 1 and 3 …

So that gives you a quick visual way to figure out which of the labels are the most problematic for your model. Then you can do further analysis to figure out if the inputs are incorrectly labelled or if perhaps you need more data for those difficult classes. This sort of topic (what to do when your model doesn’t predict as well as you require) was addressed more in Course 3. A lot of people skip that one, because there are no programming assignments, but there are lots of interesting ideas taught there that may help in this sort of situation.
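If you want actual numbers rather than eyeballing the matrix, per-class precision and recall fall straight out of it: divide the diagonal by the column sums and row sums respectively. A sketch using the toy matrix above:

```python
import tensorflow as tf

# The toy confusion matrix from above (rows = true labels, cols = predictions).
confusion = tf.constant([[2, 0, 0, 0],
                         [2, 1, 0, 0],
                         [1, 0, 2, 1],
                         [0, 2, 0, 1]])

diag = tf.linalg.diag_part(confusion)        # correct predictions per class
col_sums = tf.reduce_sum(confusion, axis=0)  # how often each class was predicted
row_sums = tf.reduce_sum(confusion, axis=1)  # how many true samples of each class

# Note: these divisions produce NaN for a class that never occurs
# (row sum 0) or is never predicted (column sum 0).
precision = diag / col_sums   # of all predictions of class k, fraction correct
recall = diag / row_sums      # of all true samples of class k, fraction found

print("precision:", precision.numpy())
print("recall:   ", recall.numpy())
```

For this toy data, class 0 has recall 1.0 but precision only 0.4, which matches the “false positives” observation above.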

Do the activations need to be defined separately? I got the error:
NameError: name 'activations' is not defined

I didn’t mean that is necessarily the variable name to be used in your particular case. I was just writing some sample code to demonstrate what a confusion matrix looks like.

The “activations” are whatever the softmax output is for your model. How do you compute that?

I see. I’m trying to figure out a way to get the activations of the last layer.

Let’s see what we can learn by looking at the existing code in the notebook. At first they just plot the accuracy, which doesn’t show you how to get the actual activation output. Although note that their network is set up differently: It does not include the output layer activation (sigmoid in that case) and uses from_logits = True mode on the loss function. Ok, but not much help there. Sorry.

I created my own test block in the MobileNet exercise to process a single image. Here’s how I did it:

# Try running predict on a single image

image_var = tf.Variable(augmented_image)
print(f"image_var shape {tf.shape(image_var)}")

model2.trainable = False

pred_logit = model2(image_var)
pred = tf.math.sigmoid(pred_logit)

Of course the thing to note there is that I was using the original model as they defined it here, meaning that the outputs are logits as opposed to activations, so I had to manually apply sigmoid. You wouldn’t need to do that extra step, since you included softmax in the output layer of the model.

The other way to approach this is to use the “predict” method of the Keras Model class, but the documentation says to reserve that for dealing with large batches. Just directly invoking the model, as I showed in the previous example, is fine if the amount of data used as input is relatively small.
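Putting the pieces together, here is a sketch of the whole loop over a validation set. The model and dataset below are hypothetical stand-ins (an untrained dense classifier and random data), not your MobileNet model, but the accumulation pattern is the same:

```python
import tensorflow as tf

num_classes = 4

# Hypothetical stand-in for a trained softmax classifier; in your case
# this would be the fine-tuned MobileNet model with its softmax output.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(num_classes, activation="softmax")
])

# Hypothetical stand-in for a batched validation set of (x, label) pairs.
features = tf.random.normal((32, 8))
labels = tf.random.uniform((32,), maxval=num_classes, dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(16)

# Accumulate the confusion matrix batch by batch.
confusion = tf.zeros((num_classes, num_classes), dtype=tf.int32)
for batch_x, batch_y in dataset:
    probs = model(batch_x, training=False)   # softmax activations
    preds = tf.math.argmax(probs, axis=-1)   # convert to class indices
    confusion += tf.math.confusion_matrix(batch_y, preds,
                                          num_classes=num_classes)

print(confusion.numpy())
```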

Note that the Emojify assignment (in Course 5 Week 2) includes a confusion matrix function and plot. Perhaps come back to this topic after you complete that assignment.
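In the meantime, if you just want a picture, a plain matplotlib heatmap is enough. A sketch using the toy matrix from above (the output file name is arbitrary):

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend for scripted runs
import matplotlib.pyplot as plt
import numpy as np

# The toy confusion matrix from earlier in the thread.
confusion = np.array([[2, 0, 0, 0],
                      [2, 1, 0, 0],
                      [1, 0, 2, 1],
                      [0, 2, 0, 1]])

fig, ax = plt.subplots()
im = ax.imshow(confusion, cmap="Blues")
ax.set_xlabel("predicted label")
ax.set_ylabel("true label")
ax.set_xticks(range(confusion.shape[1]))
ax.set_yticks(range(confusion.shape[0]))

# Annotate each cell with its count.
for i in range(confusion.shape[0]):
    for j in range(confusion.shape[1]):
        ax.text(j, i, str(confusion[i, j]), ha="center", va="center")

fig.colorbar(im)
fig.savefig("confusion_matrix.png")
```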

1 Like

Hey, friend. Have you solved your problem? How did you do it?