Guidance Needed: Improving Guardrail Evaluation in RAG System (GPT-4o-mini Use Case)

I need some advice. I’m working on a RAG pipeline using GPT-4o-mini, and I’ve implemented a prompt-based guardrail to verify that generated answers are accurate, relevant, and fully grounded in the provided context.

The core idea is:

  • For each (user_question, context, generated_answer) triplet, the guardrail checks:

    1. Relevance to the question

    2. Completeness of answer

    3. Factual accuracy sentence-by-sentence

    4. No hallucinations or invented facts

The prompt outputs a binary pass/fail verdict (1 or 0). This works in many cases, but I’m seeing a significant number of false positives (hallucinated or incomplete answers that pass) and false negatives (answers that are grounded in the context but phrased differently from it, which get over-rejected).
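
For reference, the check currently looks roughly like the sketch below (simplified; the exact prompt wording, the `GUARDRAIL_PROMPT` name, and the helper function are placeholders rather than my production code):

```python
from openai import OpenAI

client = OpenAI()

GUARDRAIL_PROMPT = """You are a strict answer verifier.
Given a question, a retrieved context, and a generated answer, check that the answer is:
1. relevant to the question,
2. complete,
3. factually accurate sentence by sentence against the context,
4. free of hallucinated or invented facts.
Reply with a single character: 1 if all checks pass, 0 otherwise."""


def guardrail_pass(question: str, context: str, answer: str) -> bool:
    """Return True if the generated answer passes the prompt-based guardrail."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # keep the verdict as deterministic as possible
        messages=[
            {"role": "system", "content": GUARDRAIL_PROMPT},
            {
                "role": "user",
                "content": f"Question:\n{question}\n\nContext:\n{context}\n\nAnswer:\n{answer}",
            },
        ],
    )
    return response.choices[0].message.content.strip() == "1"
```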

My Question:

Would fine-tuning a model like BERT or RoBERTa to act as a binary verifier over (context, question, answer) triplets be a more reliable long-term solution? Or would you recommend a different approach, e.g. NLI-based sentence-level verification or chain-of-thought prompting to improve consistency?
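
To make the NLI option concrete, this is the kind of sentence-level check I have in mind (a rough sketch only; the `roberta-large-mnli` checkpoint, the naive sentence splitting, and the 0.5 threshold are placeholder choices):

```python
from transformers import pipeline

# Off-the-shelf MNLI-style model; any NLI checkpoint could be swapped in here.
nli = pipeline("text-classification", model="roberta-large-mnli")


def sentence_entailed(context: str, sentence: str, threshold: float = 0.5) -> bool:
    """Check whether a single answer sentence is entailed by the retrieved context."""
    scores = nli({"text": context, "text_pair": sentence}, top_k=None)
    by_label = {s["label"]: s["score"] for s in scores}
    return by_label.get("ENTAILMENT", 0.0) >= threshold


def answer_grounded(context: str, answer: str) -> bool:
    """Pass only if every sentence of the answer is entailed by the context."""
    sentences = [s.strip() for s in answer.split(".") if s.strip()]  # naive splitter
    return all(sentence_entailed(context, s) for s in sentences)
```

The appeal here is that a faithful paraphrase should still be scored as entailed even when it isn't a verbatim match, which is exactly the false-negative case I'm hitting with the prompt-based check.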

If fine-tuning is viable (a rough sketch of the setup I have in mind is at the end of this post):

  • What’s a minimum dataset size you’d consider effective?

  • Would a few thousand manually reviewed samples be enough to get decent performance?
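
For reference, the fine-tuning setup I'm picturing is roughly the following (a sketch only: `roberta-base`, the CSV file names, the column schema, and the hyperparameters are all placeholders, not settled choices):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Expected CSV columns (placeholder schema): question, context, answer, label (0 or 1).
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})


def encode(batch):
    # One simple packing choice: question + answer as segment A, retrieved context
    # as segment B. Truncation matters because contexts can exceed 512 tokens.
    qa = [q + " " + a for q, a in zip(batch["question"], batch["answer"])]
    return tokenizer(qa, batch["context"], truncation=True, max_length=512)


dataset = dataset.map(encode, batched=True)
dataset = dataset.rename_column("label", "labels")

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="answer-verifier",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    data_collator=DataCollatorWithPadding(tokenizer),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```

The cross-encoder framing (answer and context scored in the same forward pass) is what should let the model judge grounding rather than mere topical similarity, which is why I'm leaning toward it over an embedding-similarity check.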