Generative AI with Large Language Models: Week 3
RLHF: Obtaining feedback from humans
We have a prompt dataset, and several prompt samples are passed to the instruct LLM to obtain completions.
We also have several human labelers who provide feedback.
I do not quite understand the figure.
In the example, we had one prompt and three possible completions.
Prompt: “My house is too hot”
Completion 1: “There is nothing you can do about hot houses.”
Completion 2: “You can cool your house with air conditioning.”
Completion 3: “It is not too hot.”
Does it mean that the same input, “My house is too hot”, was provided as three separate prompts to the same instruct LLM to return the three different completions?
Would the following two scenarios be different? (I sketch both in code below.)
- the same prompt is passed to the same instruct LLM three times, and three different completions are obtained
- three separate prompts (with the same input) are each passed to the same instruct LLM once, and three different completions are obtained
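To make the two scenarios concrete, here is a minimal sketch of what I imagine is happening, using Hugging Face transformers with FLAN-T5 as a stand-in for the instruct LLM (the model name, temperature, and sampling settings are just my assumptions, not from the lecture):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"  # assumption: any instruct LLM would do here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

prompt = "My house is too hot"
inputs = tokenizer(prompt, return_tensors="pt")

# Scenario 1: one prompt, a single call that samples k = 3 completions
outputs = model.generate(
    **inputs,
    do_sample=True,          # temperature sampling makes the completions differ
    temperature=1.0,
    num_return_sequences=3,  # k = 3 completions from the same call
    max_new_tokens=50,
)
completions_one_call = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Scenario 2: the identical prompt sent in three separate calls
completions_three_calls = []
for _ in range(3):
    out = model.generate(**inputs, do_sample=True, temperature=1.0, max_new_tokens=50)
    completions_three_calls.append(tokenizer.decode(out[0], skip_special_tokens=True))
```

If I understand sampling correctly, both scenarios draw independently from the same model distribution, so they should be equivalent in practice, but that is partly what I am asking.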
What I understand is that there are k prompts, each with the same input, so we obtain k different completions,
and we have multiple human labelers, say L. So each input gets k*L ratings, which are then aggregated (averaged) to rank the completions.
Is this correct?
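For the k*L part, this toy example shows the kind of aggregation I have in mind (the ranks and the averaging are purely my own assumptions about how the labelers' feedback might be combined):

```python
import statistics

# k = 3 completions for the single prompt "My house is too hot"
completions = [
    "There is nothing you can do about hot houses",
    "You can cool your house with air conditioning",
    "It is not too hot",
]

# ranks[labeler][i] = rank a labeler gave completion i (1 = best); L = 2 labelers
ranks = [
    [3, 1, 2],  # labeler 1
    [2, 1, 3],  # labeler 2
]

# average the k*L = 6 ranks into one score per completion, then sort best-first
avg_rank = [statistics.mean(r[i] for r in ranks) for i in range(len(completions))]
for text, score in sorted(zip(completions, avg_rank), key=lambda pair: pair[1]):
    print(f"{score:.1f}  {text}")
```

Is this roughly the kind of aggregation the lecture means?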