I am using generative models like ChatGPT, T5, Flan, Llama, etc. for classification. I have three classes, so it is a 3-class classification problem. The class labels are: Not vivid, Moderately vivid, Highly vivid. The model predicts the class labels, but I need the probability of each class, similar to a BERT model, i.e., if I fine-tune a BERT model, it is easy to get the probability of each class: we add a softmax layer on top of the last layer, which turns the logits for each class into probabilities. But the performance of BERT is not good for my scenario, while generative models like T5 or Flan perform well. However, I don't know how to get the probabilities for each class using these generative models, which output a probability distribution over the vocabulary, not over the classes.
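For context, this is what I mean by getting probabilities from BERT (a minimal sketch; the model name and input text are placeholders):

```python
# Minimal sketch: class probabilities from a fine-tuned BERT-style classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # Not vivid / Moderately vivid / Highly vivid
)

inputs = tokenizer("The scene was described in rich detail.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits       # shape (1, 3): raw scores per class
probs = torch.softmax(logits, dim=-1)     # softmax turns logits into probabilities
```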
What if you tell ChatGPT that you are considering 3 classes and you want the probability over them, in a prompting manner!
Thank you for the reply. That is an interesting idea, but how is the model supposed to predict a probability? In this case, shouldn't we give the model a formula or something?
Try doing something similar to what I said; I haven't tried it either, to be honest! Plus, the model won't calculate probabilities; it will just tell you what probabilities it has found for those classes from its inner workings, if that can be done at all!
I’m assuming it is a text classification task. A solution might be to extract the embeddings from your inputs using one of these language models. For example, you might use only the encoder of T5 to extract the embeddings. Once you have these embeddings, you can add a set of fully connected layers using an activation function for the hidden layers and a softmax function for the output layer.
The softmax function here is crucial since it will turn the logits into class probabilities.
Keep in mind that you should also choose the loss function (e.g., cross-entropy), as well as other hyperparameters such as the learning rate, batch size, and the number of neurons in each layer. You can split your dataset into training and validation sets and monitor the model’s performance on the validation set.
If the performance is not satisfactory after training, you might need to check your data for issues, add regularization techniques (e.g., dropout, L2 regularization), modify the architecture of your neural network, consider different activation functions, and watch for numerical under/overflow.
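A minimal sketch of this setup, assuming PyTorch and Hugging Face transformers (the hidden layer size and mean pooling are illustrative choices, not requirements):

```python
# Sketch of the suggested approach: the T5 encoder as a feature extractor
# plus a small fully connected classification head.
import torch
import torch.nn as nn
from transformers import T5EncoderModel

class T5Classifier(nn.Module):
    def __init__(self, model_name="t5-base", num_classes=3, hidden=256):
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        self.head = nn.Sequential(
            nn.Linear(self.encoder.config.d_model, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),  # outputs logits for the 3 classes
        )

    def forward(self, input_ids, attention_mask):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Mean-pool over non-padding tokens to get one vector per text.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (states * mask).sum(dim=1) / mask.sum(dim=1)
        return self.head(pooled)
```

Note that `nn.CrossEntropyLoss` applies log-softmax to the logits internally, so the explicit softmax is only needed at inference time to read off the class probabilities.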
Thank you so much for the reply. Yes, it is a classification task, and I am prompting the model to predict the class label. Do you suggest training the last layer, since you mention a loss function? Training is not possible because I do not have a lot of training data, and I am using few-shot prompting.
Yes, I'm suggesting a transfer learning approach where you freeze the language model weights and update only the added (last) layers to obtain class probabilities for text classification. By doing this, you leverage the pre-trained knowledge encoded in the language models, which have already been trained on extensive corpora. This transfer learning strategy is particularly effective in your case, where you have a small dataset.
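A sketch of the freezing step and one training iteration, reusing the `T5Classifier` from the previous snippet (the texts, labels, and hyperparameters are all hypothetical):

```python
# Freeze the encoder and train only the added head.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5Classifier()                    # the class from the previous snippet
for p in model.encoder.parameters():
    p.requires_grad = False               # keep the pre-trained weights fixed

optimizer = torch.optim.AdamW(model.head.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()   # expects raw logits

batch = tokenizer(["first example text", "second example text"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([0, 2])             # hypothetical gold labels

logits = model(batch["input_ids"], batch["attention_mask"])
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()

probs = torch.softmax(logits, dim=-1)     # class probabilities at inference
```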
Moreover, since you mentioned prompting the model: an easier, but less effective, solution in this case would be in-context learning, in which you prompt your model with examples of texts and their corresponding classes. The language model then "learns" to classify new text based on the examples given in the prompt. In your case, the prompt would be something like this (a runnable sketch follows the template):
### Instructions
The text classes can be one of the following three categories: Not vivid, Moderately vivid, Highly vivid.
Below are some example classifications. The format is the text followed by its classification tag.
- text 1 goes here.
<cls>Not vivid</cls>*
- text 2 goes here.
<cls>Highly vivid</cls>
…
Classify the new texts mentioned below. Additionally, provide probabilities for each class.
- new text to classify goes here
…
*CLS is short for classification
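A sketch of running such a prompt through Flan-T5 with Hugging Face transformers (the prompt body is abbreviated; fill in your real few-shot examples):

```python
# Few-shot in-context classification with Flan-T5.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

prompt = (
    "The text classes can be one of the following three categories: "
    "Not vivid, Moderately vivid, Highly vivid.\n"
    "Below are some example classifications. The format is the text "
    "followed by its classification tag.\n"
    "- text 1 goes here.\n<cls>Not vivid</cls>\n"
    "...\n"  # remaining few-shot examples go here
    "Classify the new text below:\n- new text to classify goes here"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```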
Thank you for the reply. I am actually doing few-shot in-context learning, and I am using the Flan model, which is an instruction-tuned version of T5. I will prompt the model to predict the probability of each class besides predicting the class label and check what happens.
I have noticed that the model is unable to output a probability, because the few examples contain the text and its corresponding label. The model also outputs only the label, which I think makes sense, because it is learning from the few examples, and those contain class labels, not probabilities.
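For what it's worth, since the model does output a distribution over the vocabulary, one common workaround is label scoring: score each candidate class label under the model and renormalize those scores. A minimal sketch with Flan-T5, where the prompt is a placeholder and the resulting numbers are relative likelihoods rather than calibrated probabilities:

```python
# Score each class label under the model, then renormalize over the 3 classes.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

prompt = "your few-shot prompt plus the new text to classify goes here"
classes = ["Not vivid", "Moderately vivid", "Highly vivid"]

enc = tokenizer(prompt, return_tensors="pt")
scores = []
with torch.no_grad():
    for label in classes:
        target = tokenizer(label, return_tensors="pt").input_ids
        # Teacher-forcing the label as the decoder target: the loss is the
        # average negative log-likelihood of the label's tokens.
        out = model(**enc, labels=target)
        scores.append(-out.loss * target.shape[1])  # total log-likelihood

probs = torch.softmax(torch.stack(scores), dim=0)   # renormalize over classes
print(dict(zip(classes, probs.tolist())))
```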
While prompting LLMs can sometimes yield probabilities, these may not be reliable, as the models are not specifically trained for this task and the numbers may just be LLM guesses. For guaranteed probabilities in your workflow, consider tools like the Lamini SDK's classifier. It provides probabilities through a separate learning process or in-context learning, and they provide code snippets that meet your needs.
FYI
Thank you, your response is very helpful. I think that's the only option that I have; however, using APIs is not free, and that is the main problem for me.
I encourage you to check out free, open-source LLMs (Lamini provides a free option) for testing your workflow. You may not need powerful LLMs with more than 20b parameters for your task. Even “basic” LLMs can provide valuable insights and save you money initially. Definitely give them a try first!
For all the open-source LLMs the problem is the same. It seems just some APIs provide services (functions) that we can leverage. I hope I can figure out how to use the free version of the Lamini API to use their library.