L7 Evaluation: Utils.py - A bit of prompt refinement needed II

Hi, I'm running the code from the video (there are two lines that I had to guess, because in the video Dr. Isa didn't scroll far enough to the right). I ran it three separate times and got the following:

1 - First execution:

Step2 response

[
    {'category': 'Televisions and Home Theater Systems'}
]

2 - Second execution:

Step2 response

[
    {'category': 'Smartphones and Accessories'},
    {'category': 'Cameras and Camcorders', 'products': ['FotoSnap DSLR Camera']}
]

3 - Third execution:

Step2 response

[
    {'category': 'Smartphones and Accessories'},
    {'category': 'Cameras and Camcorders'},
    {'category': 'Televisions and Home Theater Systems'}
]

So it is surprising that in such a controlled environment, with a tiny list of products and categories (no subcategories or anything complicated), the results can vary so much, especially with the temperature parameter set to zero:

def get_completion_from_messages(messages, model="gpt-3.5-turbo", temperature=0, max_tokens=500):

The expected result should be (as I understand it):

[
   {'products': 'SmartX ProPhone'},
   {'products': 'FotoSnap DSLR Camera'},
   {'category': 'Televisions and Home Theater Systems'}
]
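
To put a number on how far each run is from the expected result, one option is a quick offline comparison. This is only a minimal sketch: the three runs and the expected list are copied from above, and `flatten` and `compare` are hypothetical helper names made up for illustration (they are not part of utils.py). It treats every response as a set of (key, value) pairs, so the order of the entries does not matter.

```python
# Step 2 responses observed in the three executions (copied from the post).
runs = [
    [{'category': 'Televisions and Home Theater Systems'}],
    [
        {'category': 'Smartphones and Accessories'},
        {'category': 'Cameras and Camcorders', 'products': ['FotoSnap DSLR Camera']},
    ],
    [
        {'category': 'Smartphones and Accessories'},
        {'category': 'Cameras and Camcorders'},
        {'category': 'Televisions and Home Theater Systems'},
    ],
]

# The result the poster expects (copied from the post).
expected = [
    {'products': 'SmartX ProPhone'},
    {'products': 'FotoSnap DSLR Camera'},
    {'category': 'Televisions and Home Theater Systems'},
]


def flatten(response):
    """Hypothetical helper: turn a response into a set of (key, value) pairs."""
    pairs = set()
    for item in response:
        for key, value in item.items():
            # 'products' may be a single string or a list of strings.
            values = value if isinstance(value, list) else [value]
            for v in values:
                pairs.add((key, v))
    return pairs


def compare(response, reference):
    """Hypothetical helper: report matched, missing, and extra pairs."""
    got, want = flatten(response), flatten(reference)
    return {"matched": got & want, "missing": want - got, "extra": got - want}


reports = [compare(run, expected) for run in runs]
for i, report in enumerate(reports, start=1):
    print(f"Run {i}: {len(report['matched'])} matched, "
          f"{len(report['missing'])} missing, {len(report['extra'])} extra")
```

With the data above, each run matches only one of the three expected entries, which makes the inconsistency between executions easy to see at a glance.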