Hello everyone,
In “Evaluation Part II” of the “Building Systems with the ChatGPT API” short course,
we used the factual-content scoring rubric from the OpenAI evals project.
Here is the description for each score:
(A) The submitted answer is a subset of the expert answer and is fully consistent with it.
(B) The submitted answer is a superset of the expert answer and is fully consistent with it.
(C) The submitted answer contains all the same details as the expert answer.
(D) There is a disagreement between the submitted answer and the expert answer.
(E) The answers differ, but these differences don't matter from the perspective of factuality.
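For context, here is a minimal sketch (my own illustration, not the official evals code) of how a rubric like this is typically applied: the grader model is given the question, the expert answer, and the submitted answer, and is asked to reply with exactly one of the letters A–E.

```python
import re

# The five rubric options quoted above. This is only an illustration of the
# model-graded pattern; the real evals project builds its prompts differently.
RUBRIC = {
    "A": "The submitted answer is a subset of the expert answer and is fully consistent with it.",
    "B": "The submitted answer is a superset of the expert answer and is fully consistent with it.",
    "C": "The submitted answer contains all the same details as the expert answer.",
    "D": "There is a disagreement between the submitted answer and the expert answer.",
    "E": "The answers differ, but these differences don't matter from the perspective of factuality.",
}

def build_grading_prompt(question: str, expert: str, submission: str) -> str:
    """Assemble a grading prompt asking the grader model to pick one letter."""
    options = "\n".join(f"({k}) {v}" for k, v in RUBRIC.items())
    return (
        f"Question: {question}\n"
        f"Expert answer: {expert}\n"
        f"Submitted answer: {submission}\n\n"
        "Compare the factual content of the submitted answer with the expert "
        "answer. Choose one option:\n"
        f"{options}\n"
        "Answer with a single letter."
    )

def parse_choice(model_output: str):
    """Extract the rubric letter from the grader's reply, or None."""
    # Accept either a parenthesized letter like "(B)" or a bare letter reply.
    m = re.search(r"\(([A-E])\)", model_output)
    if m:
        return m.group(1)
    s = model_output.strip().upper()
    return s if s in RUBRIC else None
```

Note that the grader's raw reply (the letter) is categorical, not ordinal — which is exactly why the ordering question below comes up.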
It sounds to me like C is the best score and D is the worst you can get. Is that correct?
Do you know why the checks were ordered that way, which suggests that A is the best score and E is the worst?
Thank you in advance for your answer!