L9 Evaluation II Inconsistency in result - Getting 'D' where I should get 'A'

I’m just running the notebook without changing anything. When comparing the ideal answer with the model-generated answer, the score I get from the model is ‘D’. However, according to the lecture and based on the answer, I would expect an ‘A’. Did something change?

Same here. I’ve asked to explain the reason and this is the answer:

D) There is a disagreement between the submitted answer and the expert answer. \n\nExplanation: The submitted answer provides some information about the SmartX ProPhone and the FotoSnap DSLR Camera, but it does not include all the details mentioned in the expert answer. The expert answer provides specific features, such as the 12MP dual camera for the SmartX ProPhone and the 24.2MP sensor for the FotoSnap DSLR Camera, which are not mentioned in the submitted answer. Additionally, the expert answer provides information about the price and warranty for both products, which is missing in the submitted answer. Therefore, there is a disagreement between the two answers.

That is totally untrue of course…

I did the same and I got similar explanations.

‘The selected choice is (D) There is a disagreement between the submitted answer and the expert answer. \n\nThe submitted answer provides some information about the SmartX ProPhone and the FotoSnap DSLR Camera, but it does not include all the details mentioned in the expert answer. The expert answer provides specific features such as 5G wireless, 128GB storage, and a 12MP dual camera for the SmartX ProPhone, while the submitted answer only mentions the 6.1-inch display, 128GB storage, and a 12MP dual camera. Similarly, for the FotoSnap DSLR Camera, the expert answer mentions features like 1080p video, a 3-inch LCD, and interchangeable lenses, which are not mentioned in the submitted answer. Therefore, there is a disagreement between the submitted answer and the expert answer in terms of the details provided about the products.’

Same here. I tried swapping ‘D’ with ‘E’ in the prompt, and now I am getting ‘C’:

"""
Compare the factual content of the submitted answer with the expert answer. Ignore any differences in style, grammar, or punctuation.
    The submitted answer may either be a subset or superset of the expert answer, or it may conflict with it. Determine which case applies. Answer the question by selecting one of the following options:
    (A) The submitted answer is a subset of the expert answer and is fully consistent with it.
    (B) The submitted answer is a superset of the expert answer and is fully consistent with it.
    (C) The submitted answer contains all the same details as the expert answer.
    (E) There is a disagreement between the submitted answer and the expert answer.
    (D) The answers differ, but these differences don't matter from the perspective of factuality.
  choice_strings: ABCDE
"""
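If anyone wants to test the label swap more systematically, here is a rough sketch that builds the option list in a chosen order and maps the model's letter back to what it means, so 'D' in one run and 'E' in another can be compared directly. The option texts are copied from the prompt above; the names `OPTIONS`, `build_rubric`, and `parse_choice` are made up for this example and are not from the course notebook.

```python
import re

# Option texts copied from the prompt above, keyed by a stable internal name.
# The key names are hypothetical and only exist for this sketch.
OPTIONS = {
    "subset": "The submitted answer is a subset of the expert answer and is fully consistent with it.",
    "superset": "The submitted answer is a superset of the expert answer and is fully consistent with it.",
    "same": "The submitted answer contains all the same details as the expert answer.",
    "disagree": "There is a disagreement between the submitted answer and the expert answer.",
    "immaterial": "The answers differ, but these differences don't matter from the perspective of factuality.",
}


def build_rubric(order):
    """Return (rubric_text, letter_to_key) with letters A-E assigned in the given order."""
    mapping = dict(zip("ABCDE", order))
    lines = [f"({letter}) {OPTIONS[key]}" for letter, key in mapping.items()]
    return "\n".join(lines), mapping


def parse_choice(response, mapping):
    """Map the first 'X)' style letter in the response back to its option key, or None."""
    match = re.search(r"\b([A-E])\)", response)
    if match:
        return mapping.get(match.group(1))
    # Fall back to a bare single-letter reply such as "D".
    return mapping.get(response.strip())


# Labels as in the swapped prompt above: D and E exchanged.
swapped_rubric, swapped_map = build_rubric(
    ["subset", "superset", "same", "immaterial", "disagree"]
)
```

With this, a reply of "(D) ..." under the swapped labels resolves to `immaterial`, while the same letter under the notebook's ordering resolves to `disagree`, which makes it easier to see whether the model is following the meaning or just the letter position.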

Just responding to say that I’m getting the same thing here. Of course, like the responders before me, I also asked ChatGPT for an explanation and got the same answer.

I’ve learned here not to fully rely on an LLM to do these reasoning checks… at least not ChatGPT 3.5… maybe it’s better in 4?

I didn’t get a better response using the gpt-4 model. It’s not the latest version (it uses 0613), but it should still be better than 3.5. I have noticed the quality of ChatGPT decreasing over the last year, although this is just a gut feeling from using it. Perhaps that’s a factor, since the course was originally released months ago.
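To put a number on that gut feeling rather than judge from a single run, one option is to repeat the grading call a few times and tally the letters that come back. This is only a sketch: `grade_fn` is a hypothetical zero-argument function that you would write yourself to call the model and return one letter, and `tally_grades` and `is_stable` are names invented for this example.

```python
from collections import Counter


def tally_grades(grade_fn, n_runs=5):
    """Call grade_fn() n_runs times and count how often each letter is returned."""
    return Counter(grade_fn() for _ in range(n_runs))


def is_stable(counts, threshold=0.75):
    """True if the most common grade makes up at least `threshold` of all runs."""
    total = sum(counts.values())
    if total == 0:
        return False
    top_count = counts.most_common(1)[0][1]
    return top_count / total >= threshold


# Stand-in for a real model call (assumption): replays a fixed list of grades.
_fake_replies = iter(["D", "D", "A", "D", "D"])
counts = tally_grades(lambda: next(_fake_replies), n_runs=5)
```

Running the same tally against both models with the same inputs would show whether the 'D' is consistent or just noise, though it cannot say whether the model's behavior changed since the course was recorded.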