Evaluation dataset size for fine-tuning and RAG

If I fine-tune a model or use RAG, I would need to evaluate it on prompts specific to the fine-tuning data or to the information retrieved through RAG. A public evaluation dataset would not have this specificity, so I would need to put together my own evaluation dataset.

How big should this dataset be?

There’s no numerical answer.

It needs to be enough data that you get good enough results for your needs.
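The point that there is no single number can be made concrete. If each evaluation prompt is scored pass/fail, the uncertainty in the measured accuracy shrinks with the square root of the number of examples, so the right size depends on how precise "good enough" has to be. Below is a minimal Python sketch. It assumes independent pass/fail items and uses the normal (Wald) approximation, which is rough for small n or for accuracies near 0 or 1. The function names are hypothetical.

```python
import math


def accuracy_margin(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error (half-width of a normal-approximation
    confidence interval) for an accuracy p measured on n pass/fail items."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)


def examples_needed(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest n whose approximate margin of error is at most `margin`.
    p = 0.5 is the worst case (largest variance) when accuracy is unknown."""
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


if __name__ == "__main__":
    # How the uncertainty around an observed 80% accuracy shrinks with more items.
    for n in (25, 100, 400):
        print(f"n={n:4d}: accuracy 0.80 +/- {accuracy_margin(0.8, n):.3f}")
    print("worst case for +/-0.05:", examples_needed(0.05), "examples")
```

Under this approximation, quadrupling the number of examples halves the margin of error. The 385-example figure for a margin of 5 percentage points is the worst case at 50% accuracy. It shows how precision scales, not a recommended size.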


Thanks for the response @TMosh! If I'm implementing RAG for a client, would you recommend that I ask the client for a set of sample questions with their expected answers? Or should I let users try the solution and add a scoring system to track the questions that were not answered correctly?
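The first option in the question (sample questions with client-supplied answers) can be automated as a small harness. It runs each question through the pipeline and collects the ones that fail. Below is a minimal Python sketch. It assumes the RAG pipeline can be called as a function that takes a question and returns an answer string. `EvalItem`, `token_recall`, `evaluate`, the 0.6 threshold, and the stub pipeline are all hypothetical. Token recall is only a crude stand-in for grading a free-text answer, and human review or a stronger automated judge may be needed in practice.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalItem:
    question: str
    reference_answer: str  # answer supplied by the client


def _tokens(text: str) -> set[str]:
    # Lowercase alphanumeric word tokens; a deliberately crude normalization.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_recall(reference: str, response: str) -> float:
    """Fraction of reference-answer tokens that also appear in the response."""
    ref = _tokens(reference)
    if not ref:
        return 1.0
    return len(ref & _tokens(response)) / len(ref)


def evaluate(items: list[EvalItem], answer_fn: Callable[[str], str],
             threshold: float = 0.6) -> tuple[float, list[EvalItem]]:
    """Run every question through answer_fn; return (pass rate, failed items)."""
    failures = []
    for item in items:
        response = answer_fn(item.question)
        if token_recall(item.reference_answer, response) < threshold:
            failures.append(item)
    pass_rate = 1 - len(failures) / len(items) if items else 0.0
    return pass_rate, failures


if __name__ == "__main__":
    # Hypothetical client-supplied pairs and a stub in place of the real RAG pipeline.
    items = [
        EvalItem("What is the refund window?", "30 days from purchase"),
        EvalItem("Who approves travel expenses?", "the department manager"),
    ]

    def fake_rag(question: str) -> str:
        return "Refunds are accepted within 30 days from purchase."

    rate, failed = evaluate(items, fake_rag)
    print(f"pass rate: {rate:.0%}")
    for item in failed:
        print("FAILED:", item.question)
```

The second option can feed the same structure. Questions that users flag as badly answered could be appended to the item list, so the fixed set grows from real failures.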

Sorry, that’s not my area of expertise. Hopefully another member of the community will provide some guidance.

The selection of an evaluation dataset depends on what kind of RAG application you are building. That in turn depends on your primary stakeholders' demands: their do's and don'ts, the features they need, and the investment they are willing to make.

More data is not always best when time and money are constraints, although it can lead to a better outcome. Balancing these criteria can help you decide the smallest dataset that still covers all the prompt features you care about, without taking too long for your RAG application to run through.
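One way to act on the coverage point is to tag every evaluation item with the prompt features it exercises. You can then check for gaps and pick a small subset that still hits each feature a minimum number of times. Below is a minimal Python sketch. It assumes hand-assigned feature tags, and the tag names, `coverage_gaps`, `pick_small_subset`, and the per-feature minimum are hypothetical. The greedy selection is a heuristic and is not guaranteed to find the smallest possible subset.

```python
from collections import Counter


def coverage_gaps(item_features: list[set[str]], required: set[str],
                  min_per_feature: int = 5) -> dict[str, int]:
    """Map each required feature with fewer than min_per_feature items to its current count."""
    counts = Counter(f for feats in item_features for f in feats)
    return {f: counts[f] for f in sorted(required) if counts[f] < min_per_feature}


def pick_small_subset(item_features: list[set[str]], required: set[str],
                      k: int) -> list[int]:
    """Greedily pick item indices until every required feature is covered k times
    (or no remaining item helps). Heuristic: not guaranteed to be minimal."""
    need = {f: k for f in required}
    remaining = set(range(len(item_features)))
    chosen: list[int] = []
    while any(n > 0 for n in need.values()):
        best, best_gain = None, 0
        for i in sorted(remaining):
            # Count how many still-needed features this item would cover.
            gain = sum(1 for f in item_features[i] if need.get(f, 0) > 0)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break  # the remaining items cannot close the gap
        chosen.append(best)
        remaining.remove(best)
        for f in item_features[best]:
            if need.get(f, 0) > 0:
                need[f] -= 1
    return chosen


if __name__ == "__main__":
    # Hypothetical feature tags for five evaluation prompts.
    items = [{"pricing"}, {"pricing", "refunds"}, {"refunds"}, {"shipping"}, {"pricing"}]
    required = {"pricing", "refunds", "shipping"}
    print("gaps at 2 per feature:", coverage_gaps(items, required, 2))
    print("subset covering each feature once:", pick_small_subset(items, required, 1))
```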