While discussing the F1 score for binary classification, he mentioned that F1 can be used to pick the best model. What about a multiclass classification problem where we have two models to compare?
For multiclass, you calculate the F1 score for each class in a similar way to binary classification, treating that class as the positive class and all the other classes as negative.
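As a rough illustration of that one-vs-rest idea, here is a minimal sketch in plain Python (no libraries). The function name `per_class_f1` and the example labels are hypothetical. Each class in turn is treated as "positive," and its precision, recall and F1 are computed exactly as in the binary case. A class with no true or predicted samples gets 0 here, which is an assumed convention.

```python
def per_class_f1(y_true, y_pred):
    """Hypothetical helper: one-vs-rest F1 score for every class label."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        # Class c is "positive"; every other class counts as "negative".
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Same precision/recall/F1 formulas as binary classification.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[c] = 2 * precision * recall / (precision + recall)
        else:
            scores[c] = 0.0  # assumed convention when the class is never hit
    return scores


# Usage with made-up labels for a three-class problem.
y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat"]
print(per_class_f1(y_true, y_pred))
```

With these labels, "cat" scores 0.5, "dog" about 0.8 and "bird" about 0.67. So you get one F1 per class rather than a single number.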
Instead of typing out the explanation, let me share a URL where it is explained nicely.
So the individual F1 scores tell us how the model performs on each class. I assume we would look at these when we can accept some classes performing less well because they are infrequent. Otherwise, we would prefer a single F1 score for the model (e.g. a macro average, where every class has equal importance). Is that right? In what other cases would a single F1 score be used for multiclass classification?
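To make the "single F1 score" options concrete, here is a minimal plain-Python sketch of the three common ways to collapse per-class F1 into one number. The names macro, weighted and micro match the `average` options of scikit-learn's `f1_score`, which is the usual way to do this in practice. The function `f1_summary` and the two toy models below are hypothetical and only for illustration. It uses the identity F1 = 2TP / (2TP + FP + FN), which is equivalent to the precision/recall form.

```python
def f1_summary(y_true, y_pred):
    """Hypothetical helper: macro, weighted and micro F1 for single-label multiclass data."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class, support = {}, {}
    tp_all = fp_all = fn_all = 0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
        support[c] = sum(t == c for t in y_true)  # number of true samples of class c
    # Macro: plain mean, so every class counts equally regardless of frequency.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: mean weighted by how often each class occurs in y_true.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    # Micro: pool TP/FP/FN over all classes (equals accuracy for single-label data).
    micro_denom = 2 * tp_all + fp_all + fn_all
    micro = 2 * tp_all / micro_denom if micro_denom else 0.0
    return {"macro": macro, "weighted": weighted, "micro": micro}


# Two hypothetical models on an imbalanced set: 8 samples of "a", 2 of "b".
y_true = ["a"] * 8 + ["b"] * 2
model_a = ["a"] * 10                 # always predicts the majority class
model_b = ["a"] * 6 + ["b"] * 4      # misses two "a" but finds both "b"
print("A:", f1_summary(y_true, model_a))
print("B:", f1_summary(y_true, model_b))
```

In this toy case both models have a micro F1 of 0.8, which equals their accuracy. Macro F1, however, is about 0.44 for model A and about 0.76 for model B, because macro averaging gives the rare class "b" the same weight as "a". Weighted F1 lands in between (about 0.71 vs. 0.82). Which single number to compare models on depends on whether every class or every sample should count equally.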