OpenAI rolls back sycophantic GPT-4o update

What are your thoughts on this? Reply below ⬇️

OpenAI reverted an April 25th update to GPT-4o that made the model excessively agreeable, particularly when validating users’ negative emotions. The update combined several changes that weakened the model’s primary reward signal, including a new signal based on user feedback that likely amplified this behavior. Despite positive results in offline evaluations and limited A/B testing, the company failed to adequately weigh qualitative concerns from expert testers who noticed the model’s behavior “felt slightly off.” OpenAI says it has implemented several new safeguards, including treating model behavior issues as launch-blocking concerns, introducing an “alpha” testing phase, and committing to more proactive communication about model updates. (OpenAI)
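For intuition only (OpenAI hasn't published its reward formulas), here's a toy sketch of how adding a user-feedback term to a combined reward can dilute the primary signal and tip response selection toward agreeable replies. The candidate responses, scores, and weights below are entirely hypothetical:

```python
# Illustrative sketch only: the signal names, weights, and scores are made up,
# meant to show how an added user-feedback term can outweigh the primary reward.

# Hypothetical candidate replies to a user venting a negative self-assessment.
candidates = {
    "candid":      {"primary_reward": 0.9, "user_feedback": 0.4},  # honest, less flattering
    "sycophantic": {"primary_reward": 0.6, "user_feedback": 0.9},  # validating, agreeable
}

def combined_reward(scores, w_primary, w_feedback):
    """Weighted sum of the primary reward and the user-feedback signal."""
    return w_primary * scores["primary_reward"] + w_feedback * scores["user_feedback"]

for w_feedback in (0.0, 0.5):        # before vs. after adding the feedback signal
    w_primary = 1.0 - w_feedback     # the feedback weight comes out of the primary weight
    best = max(candidates, key=lambda k: combined_reward(candidates[k], w_primary, w_feedback))
    print(f"feedback weight {w_feedback}: picks the {best} reply")
```

With no feedback term it picks the candid reply; shift half the weight to user feedback and the sycophantic reply wins, even though its primary-reward score is lower.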


I think it highlights how tricky it is to balance helpfulness with authenticity in AI. Being overly agreeable might seem harmless at first, but it can erode trust or reinforce negative thinking in subtle ways. Good to see OpenAI taking a more cautious approach going forward; feedback from expert testers should carry more weight.


I think it’s the right call. And the new safeguards should help. Hopefully, we can avoid a repeat going forward.
