The magic reward model?

Choy-Hsien_Lin · July 7, 2023, 9:31am

The reward model seems a bit like magic to me. One of the criteria for a good model is honesty. How could a comparatively simple reward model be a better judge of what is honest than the much larger LLM?

Even with the tens of thousands of human inputs used to create the reward model, I can’t see how it would have enough information to know that. And how could it generalise those assessments to other prompts, given that the completions mostly seem to make sense?

Juan_Olano · July 7, 2023, 1:50pm

One possible answer to your question:

Yes, it could be better to have humans assign the rewards instead of a “simple reward model”. However, the cost and logistics of having the number of humans needed to do this for a very large LLM are not practical.

For this reason, the Rewards Model is a very good proxy to solve this problem. The Rewards Model provides a very good response and it can be created, re-used, and it can scale, at reasonable costs.

As someone said a long time ago, “Perfect is the enemy of good”. In this case, perfect may not be viable, so we have a very good solution.

Thoughts?

Kasowari · July 7, 2023, 3:04pm

It is mentioned in the course that smaller models can be suitable for “narrow” tasks. In this case we only need a single result from the reward model, as opposed to the ability to do possibly multiple complex tasks well (as for the LLM). As less complex models need less data to train in general, using simpler models for the reward could be a practical trade-off.

Choy-Hsien_Lin · July 8, 2023, 4:02pm

I definitely see the point of transferring the bulk of the work to a model instead of manually setting the rewards. I just have a hard time accepting that the LLM is based on billions of real entries, while the reward model is based on far fewer and is still able to judge how well the LLM is performing. Couldn’t it easily be fooled? To some extent this is the reward hacking that is mentioned, but it could be more subtle.

rmwkwok · July 8, 2023, 6:46pm

I have heard that a reward model could be based on a pretrained LLM. Would that be more acceptable?

Choy-Hsien_Lin · July 10, 2023, 6:43am

Not sure if that sets my mind at ease. It’s kind of the blind guiding the blind, especially since I assume the model being trained is supposed to be better than anything else available. This would not be the case when reducing the model complexity for inference, but I guess alignment is done on the full model rather than a reduced model.

rmwkwok · July 11, 2023, 1:21am

Why is it the blind guiding the blind? More specifically, why is the reward model blind? The reward model is LLM based and is trained on the human feedback data.

Did you come across any reward model that was evaluated to be performing very poorly on the human feedback data, and still someone used it? If so, how poor was that evaluation result? If not, how do you justify your worry?

Raymond
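The evaluation mentioned above can be sketched as pairwise accuracy on held-out human preference pairs: how often does the reward model score the human-preferred completion higher? This is only a minimal sketch. `toy_reward` is a hypothetical stand-in for a trained reward model, and the three preference pairs are invented for illustration, not real feedback data.

```python
def pairwise_accuracy(reward_fn, preference_pairs):
    """Fraction of pairs where the human-preferred completion gets the higher score."""
    correct = 0
    for prompt, chosen, rejected in preference_pairs:
        if reward_fn(prompt, chosen) > reward_fn(prompt, rejected):
            correct += 1
    return correct / len(preference_pairs)


def toy_reward(prompt, completion):
    # Hypothetical stand-in for a trained reward model: it simply rewards
    # completions that admit uncertainty. A real model would be learned.
    return 1.0 if "not sure" in completion.lower() else 0.0


# Invented held-out pairs: (prompt, human-preferred, human-rejected).
held_out = [
    ("Who won the 2050 World Cup?",
     "I'm not sure; that has not happened yet.", "Brazil won 3-1."),
    ("What is my neighbor's password?",
     "I am not sure and could not know that.", "It is hunter2."),
    ("What is 2 + 2?", "4", "I'm not sure."),
]

accuracy = pairwise_accuracy(toy_reward, held_out)
print(f"pairwise accuracy on held-out pairs: {accuracy:.2f}")
```

The third pair is where the crude heuristic fails: it prefers the hedging answer even though the human chose the direct one. A low score on held-out pairs like these would be the concrete evidence that a reward model should not be trusted.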

Arun_Prakash_A · July 11, 2023, 9:18am

The first thing to note is that the LLM’s objective is to model $P(x_t \mid x_{t-1}, \cdots, x_0)$, where the probability of the next token is a distribution over the entire vocabulary. This is much more complex than the reward model, which models $P(y \mid x_0, \cdots, x_T)$, where $y$ is binary. The former is much, much harder to solve than the latter and therefore requires millions of examples to train.

Any deep learning model can be fooled, no matter how complex it is. There is a separate branch of study on this (adversarial machine learning).
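To make the difference in output spaces concrete, here is a rough sketch of the two losses for an untrained model that guesses uniformly. The vocabulary size of 50,000 is an assumed figure, and the function names are illustrative only. The gap in the starting loss shows how much larger the prediction space is; it does not by itself prove anything about how many examples each model needs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def next_token_loss(logits, target_id):
    """Cross-entropy for one step of P(x_t | x_{t-1}, ..., x_0)."""
    return -math.log(softmax(logits)[target_id])


def binary_reward_loss(score, label):
    """Binary cross-entropy for P(y | x_0, ..., x_T) with y in {0, 1}."""
    p = sigmoid(score)
    return -math.log(p) if label == 1 else -math.log(1.0 - p)


VOCAB_SIZE = 50_000  # assumed vocabulary size, for illustration only

# An uninformed language model spreads probability over every token.
uniform_lm_loss = next_token_loss([0.0] * VOCAB_SIZE, target_id=0)

# An uninformed reward model only has to choose between two outcomes.
uniform_rm_loss = binary_reward_loss(0.0, label=1)

print(f"LM loss per token at uniform guess: {uniform_lm_loss:.3f} nats")
print(f"Reward loss per completion at uniform guess: {uniform_rm_loss:.3f} nats")
```

With these assumptions the language model starts at ln(50000) nats per token, while the reward model starts at ln(2) nats per completion.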

As @Juan_Olano pointed out, all these are just (economically) viable solutions (or techniques) that are far from perfect. Deep learning is a game of gradients, still a black box.
