Response inconsistency

I’m running into two issues in the l4-summarizing notebook:

  1. ChatGPT doesn’t always follow the prompt instructions. The prompt for the last summarization example says: “Summarize the review below, delimited by triple backticks, in at most 20 words.” The summaries of the first three reviews (the panda, the standing lamp, and the electric toothbrush) are always under 20 words. However, the summary of the blender review is not always under 20 words; sometimes it is ~70 words long (the prompt is roughly reconstructed after the two example outputs below):

17-word summary:

Mixed review of a blender system with price gouging and decreased quality, but helpful tips for use.

67-word summary:

The product was on sale for $49 in November, but the price increased to $70-$89 in December. The base doesn’t look as good as previous editions, but the reviewer plans to be gentle with it. A special tip for making smoothies is to freeze the fruits and vegetables beforehand. The motor made a funny noise after a year, and the warranty had expired. Overall quality has decreased.
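For context, this is roughly how the prompt for that example is built in the notebook (paraphrased from memory; `prod_review` just holds the blender review text):

````python
# Paraphrase of the notebook's summarization prompt for the blender review.
prod_review = "..."  # full blender review text from the notebook

prompt = f"""
Summarize the review below, delimited by triple backticks, \
in at most 20 words.

Review: ```{prod_review}```
"""
````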

Does anyone have insight into why ChatGPT sometimes follows the prompt instructions and sometimes doesn’t? I’ve run into the same issue with response formatting: some responses come back formatted as instructed (e.g., as JSON) and some do not. The inability to rely on consistent output formatting makes ChatGPT very difficult to use in production, because a response that ignores the formatting instructions breaks downstream logic, e.g., parsing the response into a Python dictionary when it isn’t valid JSON. This brings me to my second issue:

  2. Variability in the response even when temperature is set to 0. I thought the responses should be deterministic when the temperature is set to 0, i.e. the same input gives exactly the same output every time. However, as you can see from the example above, I’m getting two very different responses for the exact same prompt even though the temperature is set to 0. I ran the same cell over and over, and ChatGPT switched seemingly at random between the short answer and the long answer. Does anyone know why this is happening?
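For reference, here is roughly how I’m reproducing it: calling the identical prompt over and over and collecting the distinct outputs. This is a minimal sketch that assumes the pre-1.0 `openai` Python client the notebook uses and that `openai.api_key` is already set:

```python
import openai  # pre-1.0 client, as in the course notebook

def get_completion(prompt, model="gpt-3.5-turbo"):
    # Same shape as the notebook helper: one user message, temperature=0.
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message["content"]

prompt = "..."  # the summarization prompt shown above

# With true determinism this set would always have exactly one element.
distinct_responses = {get_completion(prompt) for _ in range(10)}
print(len(distinct_responses), "distinct responses out of 10 calls")
```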

That is incorrect. There is no way to turn ChatGPT into a deterministic system.

> … the API is non-deterministic by default. This means that you might get a slightly different completion every time you call it, even if your prompt stays the same. Setting temperature to 0 will make the outputs mostly deterministic, but a small amount of variability may remain.

That is from the text completions section of the OpenAI docs. The chat section has a similar statement. Elsewhere the docs describe best practices for making it less likely that the model will make up (confabulate) an untrue response, but not how to prevent it from ever happening. In its current state, ChatGPT is non-deterministic, so you have to build that into the design of any application built on top of it.
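For example, if downstream logic expects JSON, don’t assume the format is right: validate the response and retry (or fail loudly) rather than parsing it blindly. A minimal sketch of that idea, reusing a `get_completion` wrapper like the one sketched earlier (the names here are just for illustration):

```python
import json

def get_json_completion(prompt, max_retries=3):
    """Only hand downstream code a response that actually parses as JSON."""
    for _ in range(max_retries):
        raw = get_completion(prompt)  # assumed wrapper around the chat API
        try:
            return json.loads(raw)    # downstream logic gets a real dict
        except json.JSONDecodeError:
            continue                  # malformed output: ask again
    raise ValueError(f"No valid JSON after {max_retries} attempts")
```

The same pattern applies to the word-limit case: count the words in the summary and re-ask (or truncate) when the limit is exceeded.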