The model predicts the class labels (for example, for sentiment analysis: positive, negative, or neutral). But I need the probability of each class, similar to what I would get from a BERT model. With a fine-tuned BERT model this is easy: apply a softmax to the logits from the last layer, which gives a probability for each class. However, BERT does not perform well for my scenario, while a generative model does. What I don’t know is how to get per-class probabilities from a generative model, since it outputs a probability distribution over the vocabulary, not over the classes. Basically, for classification, how can I provide a probability with each class prediction, so that users can set their own classification or confidence thresholds? Any ideas?
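To make the question concrete, here is a minimal sketch of one possible approach, not a definitive solution: take the model's next-token scores at the position where the label is generated, keep only the scores for the label tokens, and renormalize them with a softmax so they sum to 1 over the classes. It assumes each label maps to a single token and that those scores can be read from the model. The names `class_probabilities`, `predict_with_threshold`, and the values in `example_scores` are hypothetical and made up for illustration.

```python
import math

def class_probabilities(label_scores):
    """Softmax restricted to the label tokens.

    label_scores maps each class label to the model's score (logit or
    log-probability) for that label's token at the answer position.
    Assumption: each label corresponds to exactly one token.
    """
    # Subtract the max score for numerical stability.
    top = max(label_scores.values())
    exps = {label: math.exp(score - top) for label, score in label_scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}

def predict_with_threshold(probs, threshold=0.5):
    """Return the most likely label, or None if it is below the threshold."""
    label = max(probs, key=probs.get)
    return label if probs[label] >= threshold else None

# Made-up scores standing in for real model output (illustration only).
example_scores = {"positive": 2.0, "negative": 0.5, "neutral": 1.0}
probs = class_probabilities(example_scores)
print(probs)
print(predict_with_threshold(probs, threshold=0.5))
```

The same function works if the inputs are log-probabilities rather than raw logits, because a softmax over log-probabilities of a subset is the same as dividing each probability by the subset's total. If labels span several tokens, the per-token log-probabilities would need to be summed per label first, which this sketch does not cover.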