Accuracy of Logistic regression for a multiclass problem

Hello Everyone,

I am trying to calculate accuracy by counting the predictions that match the given class labels. I am using in-house Python code. However, my problem is a multi-class problem. I was thinking of calculating the accuracy for each class and then averaging all the per-class accuracies. I would like to ask whether that is a correct approach.
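Roughly, what I have in mind is something like this (a toy sketch with made-up labels, not my actual code):

```python
import numpy as np

# Toy labels and predictions, just to illustrate the two ways of averaging.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 2])

# Overall accuracy: count the predictions that match the given class.
overall = np.mean(y_pred == y_true)

# Per-class accuracy, then the plain average across classes.
per_class = [np.mean(y_pred[y_true == k] == k) for k in np.unique(y_true)]
macro = np.mean(per_class)

print(overall, per_class, macro)
```

I can see that the two numbers differ when the classes are imbalanced, which is why I am unsure which one to report.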
Thanks in advance.

The question is how you would calculate the accuracy for each class using Logistic Regression, since it is (at least in its pure form) a binary classifier (yes/no). There are at least a couple of approaches I can think of:

“One vs All” is one way. Suppose you have 8 classes that you are trying to recognize. You would train 8 different Logistic Regression models: one to recognize class 0 versus everything else (not class 0), one to recognize class 1 versus not class 1, and so forth. Once you’ve trained all 8 models on your data, you make a prediction by running all 8 models on a given input and picking the class whose model produces the highest “yes” value. Of course, the problem with that approach is that it is costly: you pay 8 times the training cost, since you’re literally running the full training 8 times on your training dataset. Prediction is 8 times more expensive as well, although prediction cost is trivial compared to training cost.
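Here is a minimal sketch of that scheme, just for illustration (I’m using scikit-learn’s LogisticRegression and synthetic data purely for convenience; nothing about the 8-class setup or the variable names is special):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

n_classes = 8

# Synthetic data just so the sketch runs end to end.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10,
    n_classes=n_classes, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "One vs All": train one binary model per class (class k vs. everything else).
models = []
for k in range(n_classes):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, (y_train == k).astype(int))
    models.append(clf)

# Prediction: run all 8 models on each input and pick the class whose
# model produces the highest "yes" probability.
scores = np.column_stack([m.predict_proba(X_test)[:, 1] for m in models])
y_pred = np.argmax(scores, axis=1)

print("accuracy:", np.mean(y_pred == y_test))
```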

The other approach would not be “pure” Logistic Regression, but you could define your function to take the same number of inputs and just produce 8 outputs. So instead of w being a vector, it would be an 8 x n_x matrix. Then when you compute:

Z = W \cdot X + b

You’ll get 8 outputs for each input sample. Then you would use softmax with 8 classes as the activation function, instead of sigmoid. If you haven’t learned about softmax yet, it is specifically designed to be the multiclass generalization of sigmoid. For any given input, it produces 8 outputs that you can treat as a probability distribution giving the probability of each class being the correct prediction for that input. You could think of that solution as a trivial neural network with just one output layer. You would train it using the multiclass cross entropy loss function, which is essentially the generalization of the loss we used for binary classifiers. If you’re going to that much trouble, you could also consider adding more layers and creating a real neural network.
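If it helps, here is a minimal numpy sketch of that approach, where softmax(z)_i = e^{z_i} / \sum_j e^{z_j}. All the dimensions, the learning rate, and the random data are just illustrative (with random labels the accuracy will only be at chance level; the point is the mechanics):

```python
import numpy as np

rng = np.random.default_rng(0)

n_x, n_classes, m = 20, 8, 1000   # features, classes, samples (illustrative)

# Toy data: X has shape (n_x, m); y holds integer labels 0..7.
X = rng.normal(size=(n_x, m))
y = rng.integers(0, n_classes, size=m)
Y = np.eye(n_classes)[y].T        # one-hot labels, shape (n_classes, m)

# W is an (n_classes x n_x) matrix instead of a vector; b has one entry per class.
W = np.zeros((n_classes, n_x))
b = np.zeros((n_classes, 1))

def softmax(Z):
    # Subtract the column max for numerical stability; each column sums to 1.
    e = np.exp(Z - Z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

learning_rate = 0.1
for _ in range(500):
    Z = W @ X + b                 # Z = W·X + b, shape (n_classes, m)
    A = softmax(Z)                # per-column probability distribution
    loss = -np.mean(np.sum(Y * np.log(A), axis=0))   # multiclass cross entropy
    dZ = (A - Y) / m              # gradient of the loss w.r.t. Z
    W -= learning_rate * (dZ @ X.T)
    b -= learning_rate * dZ.sum(axis=1, keepdims=True)

y_pred = np.argmax(W @ X + b, axis=0)
print("final loss:", loss, "training accuracy:", np.mean(y_pred == y))
```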

I have not taken MLS, so I’m not sure where (or if) they cover softmax. But it’s covered in DLS Course 2.

Update: Just to be clear, softmax is general and can handle any number of output classes. I just gave the example of an 8 class case in the description I wrote above. It works with whatever number of classes your data defines.

Very helpful. Thank you!

The classic “handwritten digits” multi-classification example is in MLS Course 2 Week 2, as a graded practice lab.