Published my first blog post, and it’s about AI sycophancy. It explores RLHF, Goodhart’s Law, and why the industry chooses the cheap signal over the correct one.
Did Silicon Valley build the Approval Machine, or did we?
Approval Machine https://medium.com/@alekya.kastury/approval-machine-f685e1e7b84a
Hey @Alekya_Kastury, welcome to the community!
I went through your article, and it does a great job highlighting the key issues around RLHF, especially how optimizing for the “easy” signals can be easier for companies than optimizing for what’s actually meaningful.
One thought I had while reading: a significant part of the problem seems rooted in the data itself. As your article states, the majority of the data that models are trained on is either full of hate posts or approval-seeking flattery, which doesn’t reflect how people naturally communicate in everyday life. This bias probably makes the sycophancy even worse. Addressing it at the data level would likely be more expensive and complex, but it feels like a more robust solution to me.
Really enjoyed reading this, well put together.
Thank you @Jasmeet_Singh2. Yes, addressing the issue at the data level wherever possible, and getting professionals (doctors, lawyers, etc.) to validate the model’s responses, would help.