RLHF... How?

Andrew talks about ‘marking’ LLM output when fine-tuning. How might these ‘numbers’ get back into the model to improve it?
Not mentioned?


In The Batch, there was a nice article from DL.AI on RLHF:

RLHF basics: A popular approach to tuning large language models, RLHF follows four steps:
(1) Pretrain a generative model.
(2) Use the model to generate data and have humans assign a score to each output.
(3) Given the scored data, train a model — called the reward model — to mimic the way humans assigned scores. Higher scores are tantamount to higher rewards.
(4) Use scores produced by the reward model to fine-tune the generative model, via reinforcement learning, to produce high-scoring outputs.

In short, a generative model produces an example, a reward model scores it, and the generative model learns based on that score.

source
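
To make step (4) a bit more concrete, here is a minimal, self-contained sketch in PyTorch of how the reward model's scores can flow back into the generative model. The `TinyGenerator` and `TinyRewardModel` classes are toy stand-ins I made up for illustration, and the update rule is plain REINFORCE; real RLHF pipelines typically use PPO with a KL penalty against the original model (e.g. via Hugging Face's TRL library), but the idea is the same: outputs the reward model scores highly get their log-probabilities pushed up.

```python
# Toy sketch of RLHF step (4): use reward-model scores to update the
# generative model with a simple policy-gradient (REINFORCE) step.
# All classes and sizes here are hypothetical stand-ins, not a real pipeline.

import torch
import torch.nn as nn

VOCAB, HIDDEN, SEQ_LEN = 100, 32, 8

class TinyGenerator(nn.Module):
    """Stand-in for the pretrained generative model (the policy)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)  # logits over the vocabulary at each position

class TinyRewardModel(nn.Module):
    """Stand-in for the reward model already trained on human scores."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.score = nn.Linear(HIDDEN, 1)

    def forward(self, tokens):
        # One scalar score per generated sequence
        return self.score(self.embed(tokens).mean(dim=1)).squeeze(-1)

generator = TinyGenerator()
reward_model = TinyRewardModel()   # assume step (3) has already happened
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)

# 1. Generate: sample a batch of sequences from the current policy.
tokens = torch.randint(0, VOCAB, (4, 1))   # start tokens
log_probs = []
for _ in range(SEQ_LEN):
    logits = generator(tokens)[:, -1, :]
    dist = torch.distributions.Categorical(logits=logits)
    next_tok = dist.sample()
    log_probs.append(dist.log_prob(next_tok))
    tokens = torch.cat([tokens, next_tok.unsqueeze(1)], dim=1)

# 2. Score: the reward model assigns one scalar reward per sequence.
with torch.no_grad():
    rewards = reward_model(tokens)

# 3. Learn: REINFORCE pushes up the log-probability of high-reward outputs.
advantage = rewards - rewards.mean()   # simple baseline
loss = -(torch.stack(log_probs).sum(dim=0) * advantage).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"mean reward of sampled batch: {rewards.mean().item():.3f}")
```

That is also the answer to the original question: the human scores never touch the generative model directly. They only shape the reward model, and the reward model's scalar outputs become the training signal for the reinforcement-learning update.
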

If you want to go deeper and get some hands-on experience, you might want to check out the GenAI course: Generative AI with LLMs - DeepLearning.AI

Best regards
Christian


Thanks Christian. That makes sense: building another model, presumably on top of the main one.
Andrew did not mention that.
