Hallucinations

At 23:08 in the “Tagging and Extraction” lesson, Harrison explained that we see “Paper A, Paper B, Paper C” in the output because the document itself contains relevant information. I found that this is not the case: if we run the following code with an empty user message, the output still contains “Paper A, Paper B, Paper C”. I suspect this hallucination is related to the model gpt-3.5-turbo-0613.

Run this code:

```python
import openai  # legacy SDK (openai<1.0), which provides openai.ChatCompletion

# Assumes the API key is already configured, e.g. via the OPENAI_API_KEY environment variable.

messages = [
    {
        "role": "system",
        "content": """A article will be passed to you. Extract from it all papers that are mentioned by this article. Do not extract the name of the article itself. If no papers are mentioned that's fine - you don't need to extract any! Just return an empty list. Do not make up or guess ANY extra information. Only extract what exactly is in the text."""
    },
    {
        "role": "user",
        "content": ""  # deliberately empty: there is no article to extract from
    }
]

# Function schema used in the lesson: a list of papers, each with a required title and optional author.
functions = [{'name': 'Info',
  'description': 'Information to extract',
  'parameters': {'title': 'Info',
   'description': 'Information to extract',
   'type': 'object',
   'properties': {'papers': {'title': 'Papers',
     'type': 'array',
     'items': {'title': 'Paper',
      'description': 'Information about papers mentioned.',
      'type': 'object',
      'properties': {'title': {'title': 'Title', 'type': 'string'},
       'author': {'title': 'Author', 'type': 'string'}},
      'required': ['title']}}},
   'required': ['papers']}}]

# Force the model to call the Info function.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    functions=functions,
    function_call={"name": "Info"}
)
```

Response:

```
<OpenAIObject chat.completion id=chatcmpl-8RqRBhdDyXjgHYhq5NSReAGc99kzi at 0x7f585b460950> JSON: {
  "id": "chatcmpl-8RqRBhdDyXjgHYhq5NSReAGc99kzi",
  "object": "chat.completion",
  "created": 1701647117,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "Info",
          "arguments": "{\n  \"papers\": [\n    {\n      \"title\": \"Paper A\",\n      \"author\": \"Author A\"\n    },\n    {\n      \"title\": \"Paper B\",\n      \"author\": \"Author B\"\n    },\n    {\n      \"title\": \"Paper C\",\n      \"author\": \"Author C\"\n    }\n  ]\n}"
        }
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 130,
    "completion_tokens": 69,
    "total_tokens": 199
  },
  "system_fingerprint": null
}
```
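To make the point concrete, here is a minimal sketch that parses the `arguments` string from the `function_call` above with `json.loads` and compares the result against the input that was actually sent. The variable names are my own, and the string is the same JSON as in the response with the whitespace condensed. Because the user message was empty, any paper the model returns cannot have come from the document.

```python
import json

# Same content as response["choices"][0]["message"]["function_call"]["arguments"],
# with whitespace condensed.
arguments = (
    '{"papers": ['
    '{"title": "Paper A", "author": "Author A"}, '
    '{"title": "Paper B", "author": "Author B"}, '
    '{"title": "Paper C", "author": "Author C"}]}'
)

user_content = ""  # the empty user message that was sent

extracted = json.loads(arguments)["papers"]

# With an empty input there is nothing to extract from, so every returned entry is fabricated.
if not user_content.strip() and extracted:
    print(f"Hallucinated {len(extracted)} papers from empty input:")
    for paper in extracted:
        print(f"  {paper['title']} by {paper.get('author', '?')}")
```

Running it prints the three placeholder papers, which is exactly the “Paper A, Paper B, Paper C” pattern seen in the lesson.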

Another piece of evidence:



There seem to be other forms of hallucination as well. splits[9] is about game development with Mario, but the extraction includes a paper titled “Python Game Development: Creating a Snake Game” authored by “Real Python”, which has nothing to do with that text.
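One rough way to catch this kind of hallucination is to check whether each extracted title actually occurs in the chunk it was extracted from. The sketch below is a crude heuristic of my own (case-insensitive substring match after collapsing whitespace), not something from the course. `ungrounded` is a hypothetical helper, and `split_text` is a made-up stand-in for splits[9], whose real text is not reproduced here.

```python
def ungrounded(papers, source_text):
    """Return the extracted papers whose title does not occur in source_text.

    Crude heuristic: lowercase both sides, collapse whitespace, and do a
    substring match on the title only (authors are not checked).
    """
    haystack = " ".join(source_text.lower().split())
    flagged = []
    for paper in papers:
        title = " ".join(paper.get("title", "").lower().split())
        if not title or title not in haystack:
            flagged.append(paper)
    return flagged


# Hypothetical stand-in for splits[9]; the real chunk text is not shown in this post.
split_text = "A short passage about building a Mario-style platformer game."
extracted = [
    {"title": "Python Game Development: Creating a Snake Game", "author": "Real Python"}
]

for paper in ungrounded(extracted, split_text):
    print(f"Not found in source: {paper['title']!r} ({paper.get('author')})")
```

An exact substring test will also flag titles the model merely paraphrased, so it is better suited to surfacing suspicious extractions for review than to silently dropping them.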
