In the lecture on evaluating LLM applications, I understood that the output of the LLM is evaluated through the following process:
- Generate QA data from a part of a document using QAGenerateChain
- Manually prepare the correct QA data
- Combine the data from the first two steps to form the example QA data (the ground-truth data)
- Generate answers to the example questions using RetrievalQA with a vector store (these answers are the predictions)
- Compare each example answer with the corresponding predicted answer using QAEvalChain; if they are equivalent in meaning, the prediction is graded correct (a rough sketch of the whole pipeline follows this list)
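
For reference, here is a minimal sketch of that pipeline using the legacy LangChain API from the course. The documents and the manual example are placeholders I made up, and key names (e.g. flat `query`/`answer` vs. a nested `qa_pairs` key) vary across LangChain versions, so treat this as an illustration rather than a canonical implementation:

```python
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.chains import RetrievalQA
from langchain.evaluation.qa import QAGenerateChain, QAEvalChain
from langchain.schema import Document

llm = ChatOpenAI(temperature=0)

# Placeholder documents; in the course these come from a CSV loader.
docs = [
    Document(page_content="The Cozy Pullover is machine washable and has side pockets."),
    Document(page_content="The Trail Jacket is waterproof and comes in three colors."),
]

# Step 1: generate QA examples from a slice of the documents.
gen_chain = QAGenerateChain.from_llm(llm)
generated = gen_chain.apply_and_parse([{"doc": d.page_content} for d in docs])
# Depending on the version, each item is {"query": ..., "answer": ...}
# or nested under a "qa_pairs" key; normalize to the flat form.
generated = [g.get("qa_pairs", g) for g in generated]

# Step 2: manually written ground-truth examples (hypothetical content).
manual = [{"query": "Is the Cozy Pullover machine washable?", "answer": "Yes"}]

# Step 3: combine the generated and manual data into the example set.
examples = manual + generated

# Step 4: answer the example questions with RetrievalQA over a vector store.
vectorstore = DocArrayInMemorySearch.from_documents(docs, OpenAIEmbeddings())
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever()
)
predictions = qa.apply(examples)  # each prediction carries a "result" key

# Step 5: have an LLM judge compare the example answer with the prediction.
eval_chain = QAEvalChain.from_llm(llm)
graded = eval_chain.evaluate(
    examples, predictions,
    question_key="query", answer_key="answer", prediction_key="result",
)
for ex, grade in zip(examples, graded):
    # The grade key is "results" in recent versions ("text" in older ones).
    print(ex["query"], "->", grade.get("results", grade.get("text")))
```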
My question is: is it appropriate to treat the output of QAGenerateChain in the first step as valid example data? That would hold if we could trust QAGenerateChain's results completely, but I suspect there are cases where the generated QA pairs are themselves inaccurate.
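
To make the concern concrete, the kind of human vetting I am wondering about might look something like the following (entirely hypothetical; `generated` is the parsed QAGenerateChain output from the sketch above):

```python
# Hypothetical spot-check: print the generated pairs for human review
# before treating them as ground truth.
for i, pair in enumerate(generated):
    print(f"[{i}] Q: {pair['query']}\n    A: {pair['answer']}")

# Keep only the pairs a reviewer approved (indices chosen by hand).
approved_indices = [0]  # placeholder
vetted = [generated[i] for i in approved_indices]
```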
Would anyone be able to provide some advice on this question?