One possible answer to your question:
Yes, it could be better to have humans assign the rewards directly instead of a "simple reward model". However, the cost and logistics of recruiting enough human annotators to score every response generated during training of a large LLM are simply not practical.
For this reason, the reward model is a very good proxy: it is trained on human preference data to approximate those judgments, and once trained it can be reused and scaled to score huge numbers of responses at a reasonable cost.
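For concreteness, here is a minimal sketch of what "scoring with a reward model" looks like in practice. This is just an illustration, not part of any specific training pipeline; the checkpoint name is one publicly available preference-trained reward model, and the prompt/responses are made up:

```python
# Minimal sketch: use a trained reward model as a cheap proxy for
# human preference judgments. Higher score = more preferred response.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Example public reward model checkpoint (swap in whatever you use).
model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

prompt = "How do I sort a list in Python?"
responses = [
    "Use the built-in sorted() function, e.g. sorted(my_list).",
    "Figure it out yourself.",
]

# Score each (prompt, response) pair without tracking gradients.
with torch.no_grad():
    for response in responses:
        inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
        score = model(**inputs).logits[0].item()
        print(f"{score:+.3f}  {response}")
```

A human could obviously rank these two responses, but the point is that the model can do it millions of times, around the clock, for the price of GPU time.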
As the old saying goes, "perfect is the enemy of good". In this case, the perfect solution (direct human feedback on every response) is not viable, so we settle for a very good one.
Thoughts?