RAG question

Gotham · December 20, 2023, 11:53pm

How should one interpret a TruLens eval that shows high answer relevance and high groundedness, but low context relevance?

David_Hillmann · December 23, 2023, 9:02pm

@Gotham
My guess is that only a small part of the retrieved context is actually relevant. The LLM can still use that relevant part to come up with an answer, which leads to high groundedness and high answer relevance. If, for example, only 1 out of 5 context chunks is relevant, the context relevance comes out low because of how this metric is calculated. That is still a useful insight, because it suggests you could try to make the retrieval more selective.
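To make the 1-out-of-5 example concrete, here is a minimal sketch. It assumes each retrieved chunk gets its own relevance score between 0 and 1 and that the overall context relevance is the mean of those scores. The function name and the scores are made up for illustration; this is not TruLens code.

```python
from statistics import mean


def context_relevance(chunk_scores):
    """Average per-chunk relevance scores (each in 0..1) into a single score.

    Assumption: the metric is aggregated as a plain mean over chunks.
    """
    if not chunk_scores:
        raise ValueError("need at least one chunk score")
    return mean(chunk_scores)


# Hypothetical scores: one relevant chunk out of five retrieved.
scores = [0.9, 0.1, 0.0, 0.1, 0.0]
overall = context_relevance(scores)
print(f"context relevance: {overall:.2f}")
```

Under this assumption a single strong chunk is enough for the LLM to produce a grounded, relevant answer, while the four weak chunks pull the average down.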

Gotham · December 28, 2023, 9:03pm

@David_Hillmann - thanks for your response. Indeed, I was wondering how to understand the context relevance and groundedness scores.
If the answer is relevant and also well grounded in the context, as the high answer relevance and groundedness scores show, then either the whole context is relevant (in which case the context relevance score should be high) or only a small fraction of the context, a small number of chunks, is relevant to the query (which could explain the low context relevance score).
With that in mind, should we focus on improving the context retrieval or not? The answer is already highly relevant to the query, so would the gain from a better context relevance score mainly be that the same answer is produced with fewer tokens? Is there a way to remove the chunks with low context relevance scores before the synthesis step? Is this filtering already done by the LlamaIndex libraries? Just wondering out loud here …
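On the filtering idea, a minimal sketch of dropping low-scoring chunks before synthesis could look like the following. It is plain Python and not the LlamaIndex API; `filter_chunks`, `threshold`, and `min_keep` are hypothetical names, and it assumes a per-chunk relevance score is already available from somewhere (for example a retriever similarity or an LLM-based grader).

```python
def filter_chunks(chunks, scores, threshold=0.5, min_keep=1):
    """Keep chunks whose relevance score is at least `threshold`.

    Chunks are returned best-first. If fewer than `min_keep` chunks pass
    the threshold, fall back to the top `min_keep` chunks so the
    synthesis step never receives an empty context.
    """
    if len(chunks) != len(scores):
        raise ValueError("chunks and scores must have the same length")
    # Sort (chunk, score) pairs by score, highest first.
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    kept = [chunk for chunk, score in ranked if score >= threshold]
    if len(kept) < min_keep:
        kept = [chunk for chunk, _ in ranked[:min_keep]]
    return kept


# Hypothetical retrieved chunks with made-up relevance scores.
chunks = ["chunk A", "chunk B", "chunk C", "chunk D", "chunk E"]
scores = [0.9, 0.1, 0.0, 0.1, 0.0]
context_for_synthesis = filter_chunks(chunks, scores)
```

With these made-up scores only "chunk A" is passed on, so the prompt shrinks while the part the answer is grounded in stays. The threshold is a tuning knob: set it too high and chunks that carry useful supporting detail get dropped.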
