Evaluation Part II

boredmgr · June 5, 2024, 3:32am

In Evaluation Part II, Andrew covers two design patterns: the rubric pattern and the ideal answer pattern. The rubric pattern is understandable: we ask the model to evaluate the completion against a fixed set of questions.
In the ideal answer pattern, however, we evaluate the completion against an ideal answer. The example used for it mentions specific products, categories, and details about them, and the user prompt also asks about those specific products. Does this mean that, to follow the ideal answer pattern, we have to create an ideal answer for every user query in our development or test set? And suppose we tune the prompt based on that evaluation until it works on the test set: how do we then evaluate responses in production, where we may not have an ideal answer for every customer service query the model has to respond to?
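
For reference, here is a rough sketch of the difference between the two patterns as described above. It is not the course's code: the function names, the rubric questions, and the `judge` callable (a stand-in for whatever LLM call does the grading) are all assumptions made for illustration. The point it shows is that the rubric prompt needs only the query, context, and completion, while the ideal answer prompt needs a hand-written reference answer for each query.

```python
from typing import Callable

# Hypothetical rubric questions, made up for this sketch.
RUBRIC_QUESTIONS = [
    "Is the response based only on the provided context? (Y or N)",
    "Does the response answer the question that was asked? (Y or N)",
]


def build_rubric_prompt(user_query: str, context: str, completion: str) -> str:
    """Rubric pattern: grade against fixed questions, no reference answer."""
    questions = "\n".join(f"- {q}" for q in RUBRIC_QUESTIONS)
    return (
        "Evaluate the response using the questions below.\n"
        f"[BEGIN QUERY]\n{user_query}\n[END QUERY]\n"
        f"[BEGIN CONTEXT]\n{context}\n[END CONTEXT]\n"
        f"[BEGIN RESPONSE]\n{completion}\n[END RESPONSE]\n"
        f"Questions:\n{questions}"
    )


def build_ideal_answer_prompt(user_query: str, ideal_answer: str, completion: str) -> str:
    """Ideal answer pattern: compare against a reference written per query."""
    return (
        "Compare the response with the expert answer and describe any "
        "disagreement or missing detail.\n"
        f"[BEGIN QUERY]\n{user_query}\n[END QUERY]\n"
        f"[BEGIN EXPERT]\n{ideal_answer}\n[END EXPERT]\n"
        f"[BEGIN RESPONSE]\n{completion}\n[END RESPONSE]"
    )


def evaluate(prompt: str, judge: Callable[[str], str]) -> str:
    """Send the evaluation prompt to the judge (any text-in, text-out model call)."""
    return judge(prompt).strip()
```

Because `build_rubric_prompt` takes no `ideal_answer` argument, it can be applied to any query, whereas `build_ideal_answer_prompt` can only be used where a reference answer exists, which is the constraint the question above is about.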
