(Rerank video, 5m11s is the reference.) I understand that training the model on application-specific positive and negative examples is the standard way to improve rerank performance for a particular application. But how do I train my model to better understand my particular application data when the model is GPT-4, which is too big for me to train myself and is proprietary? I believe GPT-4 can be fine-tuned through its API, so is fine-tuning the best I can do here?
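For concreteness, here is a minimal sketch of what application-specific positive/negative training data for a reranker could look like. The field names (`query`, `passage`, `label`) are illustrative, not any particular fine-tuning API's schema:

```python
import json

def make_rerank_examples(query, positives, negatives):
    """Build labeled (query, passage) pairs for reranker training.
    Label 1 = relevant (positive), 0 = not relevant (negative)."""
    examples = []
    for passage in positives:
        examples.append({"query": query, "passage": passage, "label": 1})
    for passage in negatives:
        examples.append({"query": query, "passage": passage, "label": 0})
    return examples

examples = make_rerank_examples(
    "How do I reset my router?",
    positives=["Hold the reset button for 10 seconds to factory-reset."],
    negatives=["Routers forward packets between networks."],
)
# One JSONL line per example is a common serialization for fine-tuning data.
print("\n".join(json.dumps(e) for e in examples))
```

The same pairs could feed either true fine-tuning (if the provider supports it) or the prompting approaches below.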
For even lower cost, could we instead use many-shot prompt engineering to improve rerank performance? Unlike true training, no model weights would be modified; we would just place a smaller set of our positive and negative examples directly in the prompt, hoping for a similar effect through in-context learning.
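A hedged sketch of what that many-shot prompt could look like: each demonstration pairs a query and passage with the desired relevance verdict, and the new pair to judge goes last. The prompt wording is an assumption, not a known-good template:

```python
def build_manyshot_prompt(labeled_examples, query, passage):
    """Assemble a many-shot rerank prompt from labeled demonstrations.
    labeled_examples: list of dicts with 'query', 'passage', 'label' keys."""
    lines = ["Decide whether the passage is relevant to the query. Answer Yes or No."]
    for ex in labeled_examples:
        lines.append(f"Query: {ex['query']}")
        lines.append(f"Passage: {ex['passage']}")
        lines.append(f"Relevant: {'Yes' if ex['label'] else 'No'}")
    # The pair we actually want scored comes last, with the verdict left blank.
    lines.append(f"Query: {query}")
    lines.append(f"Passage: {passage}")
    lines.append("Relevant:")
    return "\n".join(lines)

demos = [
    {"query": "router reset", "passage": "Hold the reset button 10 seconds.", "label": 1},
    {"query": "router reset", "passage": "Routers forward packets.", "label": 0},
]
prompt = build_manyshot_prompt(demos, "router reset", "Unplug the router, wait, plug it back in.")
```

The model's Yes/No completion (or its token probability) then serves as the relevance score for ranking.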
Finally, can the language model itself help us find good positive and negative examples?
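One way this could work, sketched under the assumption that we have some callable `llm` that answers a Yes/No relevance question (stubbed here rather than a real API call):

```python
def mine_examples(llm, query, candidate_passages):
    """Ask a model to sort candidate passages into positive and negative
    examples for a query. `llm` is any callable taking a prompt string
    and returning a Yes/No answer; a real deployment would call an API."""
    positives, negatives = [], []
    for passage in candidate_passages:
        prompt = (f"Query: {query}\nPassage: {passage}\n"
                  "Is this passage relevant to the query? Answer Yes or No.")
        verdict = llm(prompt).strip().lower()
        (positives if verdict.startswith("yes") else negatives).append(passage)
    return positives, negatives

# Stub model for illustration: answers Yes when 'router' appears in the prompt.
stub = lambda p: "Yes" if "router" in p.lower() else "No"
pos, neg = mine_examples(stub, "how do I fix my connection",
                         ["Power-cycle the router.", "Preheat the oven."])
# → pos == ["Power-cycle the router."], neg == ["Preheat the oven."]
```

The mined pairs can then seed the fine-tuning data or the many-shot demonstrations described above, ideally with a human spot-check before use.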