L5-Evaluation apply_and_parse error and gpt-3.5-turbo-0301 deprecation

While trying to get this cell to work, I kept running into an error I couldn't understand: it seemed like the apply_and_parse function wasn't able to parse the LLM's output, and nothing I tried fixed it.
Then I realized that at the beginning of the notebook there's a cell that makes sure the gpt-3.5-turbo-0301 model isn't used after its deprecation date, which was yesterday. Unfortunately, gpt-3.5-turbo's output for this cell triggers the parsing error, and the same goes for gpt-3.5-turbo-0613.

OpenAI's actual deprecation of 0301 is set for today at the earliest, so the model might keep working for a while, but that's a shaky foundation; this should be reviewed to make sure the notebook doesn't break completely.

Until then, try using 0301 and pray it still works :melting_face:
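
For reference, here's a minimal sketch of the kind of date-guard cell I mean (the actual cutoff date in your copy of the notebook may differ; the one below is purely illustrative):

import datetime

# Illustrative reconstruction of the notebook's model-selection guard:
# use the pinned snapshot until an assumed deprecation cutoff, then
# fall back to the rolling alias.
current_date = datetime.datetime.now().date()
target_date = datetime.date(2024, 6, 12)  # hypothetical cutoff date

if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"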

5 Likes

Bump

Hi. When running the following cell

new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)

I got the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[18], line 1
----> 1 new_examples = example_gen_chain.apply_and_parse(
      2     [{"doc": t} for t in data[:5]]
      3 )

File /usr/local/lib/python3.9/site-packages/langchain/chains/llm.py:257, in LLMChain.apply_and_parse(self, input_list, callbacks)
    255 """Call apply and then parse the results."""
    256 result = self.apply(input_list, callbacks=callbacks)
--> 257 return self._parse_result(result)

File /usr/local/lib/python3.9/site-packages/langchain/chains/llm.py:263, in LLMChain._parse_result(self, result)
    259 def _parse_result(
    260     self, result: List[Dict[str, str]]
    261 ) -> Sequence[Union[str, List[str], Dict[str, str]]]:
    262     if self.prompt.output_parser is not None:
--> 263         return [
    264             self.prompt.output_parser.parse(res[self.output_key]) for res in result
    265         ]
    266     else:
    267         return result

File /usr/local/lib/python3.9/site-packages/langchain/chains/llm.py:264, in <listcomp>(.0)
    259 def _parse_result(
    260     self, result: List[Dict[str, str]]
    261 ) -> Sequence[Union[str, List[str], Dict[str, str]]]:
    262     if self.prompt.output_parser is not None:
    263         return [
--> 264             self.prompt.output_parser.parse(res[self.output_key]) for res in result
    265         ]
    266     else:
    267         return result

File /usr/local/lib/python3.9/site-packages/langchain/output_parsers/regex.py:28, in RegexParser.parse(self, text)
     26 else:
     27     if self.default_output_key is None:
---> 28         raise ValueError(f"Could not parse output: {text}")
     29     else:
     30         return {
     31             key: text if key == self.default_output_key else ""
     32             for key in self.output_keys
     33         }

ValueError: Could not parse output: QUESTION: According to the document, what is the approximate weight of the Women's Campside Oxfords per pair?

ANSWER: The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.

Is this the same error you got?
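
For what it's worth, the completion above has a blank line between the QUESTION and ANSWER blocks, which the chain's default RegexParser apparently doesn't tolerate. A possible workaround, as a sketch only (it assumes the chain's default "text" output key and that the parser keys are query/answer; both may differ across langchain versions):

from langchain.output_parsers.regex import RegexParser

# More permissive parser (sketch): the (?s) flag lets '.' match across
# the blank line that newer gpt-3.5-turbo completions insert.
lenient_parser = RegexParser(
    regex=r"(?s)QUESTION:\s*(.*?)\s*ANSWER:\s*(.*)",
    output_keys=["query", "answer"],
)

raw = example_gen_chain.apply([{"doc": t} for t in data[:5]])
new_examples = [lenient_parser.parse(r["text"]) for r in raw]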

2 Likes

I tried using the gpt-3.5-turbo-0301 version for the QAGenerateChain, and it fixed the problem for this cell. So you can do the following:

example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI(model="gpt-3.5-turbo-0301"))
8 Likes

Thanks

Thanks, I can confirm that this is still an issue when using the model "gpt-3.5-turbo". As of 10th September 2024, "gpt-3.5-turbo-0301" still works, but unfortunately it is due to be shut down on 13th September as per https://platform.openai.com/docs/deprecations

1 Like

Yes, I still got the same problem.

@Mubsi , deprecated tools?

Thanks @TMosh. I just checked: loading the model didn't produce an error for me, but I got an error later in the notebook. I have reported it to the team.

Thank you for bringing this up! Indeed, it's a parsing issue between newer OpenAI models and LangChain, more specifically with QAGenerateChain.
Just updated this code cell:

new_examples = example_gen_chain.apply_and_parse(
    # [{"doc": t} for t in data[:5]]
    [{"doc": data[:5]}]
)

Try it again :slight_smile:

5 Likes

Cell 24 also has an issue:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[25], line 1
----> 1 graded_outputs = eval_chain.evaluate(examples, predictions)

File /usr/local/lib/python3.9/site-packages/langchain/evaluation/qa/eval_chain.py:60, in QAEvalChain.evaluate(self, examples, predictions, question_key, answer_key, prediction_key)
     50 """Evaluate question answering examples and predictions."""
     51 inputs = [
     52     {
     53         "query": example[question_key],
   (...)
     57     for i, example in enumerate(examples)
     58 ]
---> 60 return self.apply(inputs)

File /usr/local/lib/python3.9/site-packages/langchain/chains/llm.py:157, in LLMChain.apply(self, input_list, callbacks)
    155 except (KeyboardInterrupt, Exception) as e:
    156     run_manager.on_chain_error(e)
--> 157     raise e
    158 outputs = self.create_outputs(response)
    159 run_manager.on_chain_end({"outputs": outputs})

File /usr/local/lib/python3.9/site-packages/langchain/chains/llm.py:154, in LLMChain.apply(self, input_list, callbacks)
    149 run_manager = callback_manager.on_chain_start(
    150     {"name": self.__class__.__name__},
    151     {"input_list": input_list},
    152 )
    153 try:
--> 154     response = self.generate(input_list, run_manager=run_manager)
    155 except (KeyboardInterrupt, Exception) as e:
    156     run_manager.on_chain_error(e)

File /usr/local/lib/python3.9/site-packages/langchain/chains/llm.py:79, in LLMChain.generate(self, input_list, run_manager)
     77 """Generate LLM result from inputs."""
     78 prompts, stop = self.prep_prompts(input_list, run_manager=run_manager)
---> 79 return self.llm.generate_prompt(
     80     prompts, stop, callbacks=run_manager.get_child() if run_manager else None
     81 )

File /usr/local/lib/python3.9/site-packages/langchain/chat_models/base.py:143, in BaseChatModel.generate_prompt(self, prompts, stop, callbacks)
    136 def generate_prompt(
    137     self,
    138     prompts: List[PromptValue],
    139     stop: Optional[List[str]] = None,
    140     callbacks: Callbacks = None,
    141 ) -> LLMResult:
    142     prompt_messages = [p.to_messages() for p in prompts]
--> 143     return self.generate(prompt_messages, stop=stop, callbacks=callbacks)

File /usr/local/lib/python3.9/site-packages/langchain/chat_models/base.py:92, in BaseChatModel.generate(self, messages, stop, callbacks)
     90     run_manager.on_llm_error(e)
     91     raise e
---> 92 llm_output = self._combine_llm_outputs([res.llm_output for res in results])
     93 generations = [res.generations for res in results]
     94 output = LLMResult(generations=generations, llm_output=llm_output)

File /usr/local/lib/python3.9/site-packages/langchain/chat_models/openai.py:292, in ChatOpenAI._combine_llm_outputs(self, llm_outputs)
    290 for k, v in token_usage.items():
    291     if k in overall_token_usage:
--> 292         overall_token_usage[k] += v
    293     else:
    294         overall_token_usage[k] = v

TypeError: unsupported operand type(s) for +=: 'OpenAIObject' and 'OpenAIObject'
4 Likes
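
One workaround that sidesteps the failing merge: evaluate one (example, prediction) pair per call, so _combine_llm_outputs never has to add two token-usage objects together. A sketch, assuming eval_chain, examples, and predictions are defined as in the notebook:

# Sketch: per-pair evaluation avoids the batched token-usage merge in
# _combine_llm_outputs that raises the TypeError above.
graded_outputs = []
for example, prediction in zip(examples, predictions):
    graded_outputs.extend(eval_chain.evaluate([example], [prediction]))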

I am observing the same issue in the eval_chain.evaluate cell. Was this resolved?
I tried a few things, including importing the class directly from langchain.evaluation.qa.eval_chain, but that didn't work.

1 Like

Thank you @lesly.zerna, your fix worked, but the notebook still breaks near the end, at cell 24: graded_outputs = eval_chain.evaluate(examples, predictions). Do you have any idea how it can be solved? Thx

1 Like

I am having the same issues. Has anyone gotten past this error?

That worked for me:

example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI(model="gpt-3.5-turbo"))
new_examples = example_gen_chain.apply_and_parse([{"doc": data[:5]}])

2 Likes

Thank you, this worked.

Did anyone resolve this issue? @lesly.zerna's solution worked (thank you Lesly!) to generate predictions. But when it came to QAEvalChain and applying that function in graded_outputs, I got the error again:

…/opt/anaconda3/lib/python3.8/site-packages/langchain_core/language_models/chat_models.py in generate(self, messages, stop, callbacks, tags, metadata, run_name, run_id, **kwargs)
    424         for res in results
    425     ]
--> 426     llm_output = self._combine_llm_outputs([res.llm_output for res in results])
    427     generations = [res.generations for res in results]
    428     output = LLMResult(generations=generations, llm_output=llm_output)  # type: ignore[arg-type]

/opt/anaconda3/lib/python3.8/site-packages/langchain_community/chat_models/openai.py in _combine_llm_outputs(self, llm_outputs)
    375         for k, v in token_usage.items():
    376             if k in overall_token_usage:
--> 377                 overall_token_usage[k] += v
    378             else:
    379                 overall_token_usage[k] = v

TypeError: unsupported operand type(s) for +=: 'dict' and 'dict'

1 Like

Hi @lesly.zerna, correct me if I’m wrong, but your solution only generates one new example using 5 documents in data, while the goal is to generate 5 new examples, one for each document in data[:5].
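
To get back to one example per document, one option is to combine the per-document input with the pinned model suggested earlier in the thread. A sketch (0301 may already be shut down by the time you read this, in which case the more permissive parser above is the fallback):

# Sketch: pinned snapshot plus one QA example per document, as the
# notebook originally intended.
example_gen_chain = QAGenerateChain.from_llm(
    ChatOpenAI(model="gpt-3.5-turbo-0301")
)
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)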