RLHF: how many labeler results per prompt are input to the reward model?

RLHF: Obtaining feedback from humans video at 5:22: "the ranking assigned by the human labellers [plural] was 2, 1, 3 …" Then there is a description of making all combinations of pairs of completions and coding their rewards for input to the reward model.

Was there a step skipped that described how the completion rankings of the labellers for the current prompt were combined into a consolidated score of 2,1,3 so that there is only one input per prompt to the reward model, or is each labeller’s result for each prompt an input? In the latter case it would seem that the plural in the quote should be singular.

EDIT to add: at 5:58, “Then you’ll reorder the prompts so that the preferred option comes first.” For this to make sense, “the prompts” should be “each completion pair” or similar, should it not? I.e., I infer that each completion pair should be reordered as necessary so the preferred response is the first of the pair, so that the coded rewards are all [1,0]. Right?
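To make my reading of that step concrete, here is a minimal Python sketch of how I understand it. It assumes three completions labelled "A", "B", "C" (hypothetical names) with the ranking 2, 1, 3 from the video, where a lower rank means more preferred. The function name `ranked_pairs` is my own and is not taken from the course.

```python
from itertools import combinations

def ranked_pairs(completions, ranks):
    """Build every pair of completions, ordered so the preferred one is first.

    Assumption: a lower rank number means the completion is preferred.
    With this ordering, every pair carries the implied reward coding [1, 0].
    """
    pairs = []
    for i, j in combinations(range(len(completions)), 2):
        if ranks[i] < ranks[j]:
            pairs.append((completions[i], completions[j]))
        else:
            # Swap so the preferred completion comes first.
            pairs.append((completions[j], completions[i]))
    return pairs

# Hypothetical completions with the ranking quoted from the video.
pairs = ranked_pairs(["A", "B", "C"], [2, 1, 3])
print(pairs)
```

Only the pair (A, B) gets swapped here, because B is ranked above A; the other two pairs are already in preferred-first order.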

It is explained in the video each labeler ranks each of the prompts. Then all the scores for each of the prompts (coming from different labellers) are combined and the score that is a majority of all the scores for that prompt is chosen.
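The majority step described above could look roughly like the following Python sketch. It assumes each labeler returns one rank per completion, in the same completion order, and that consolidation is done separately for each completion. The labeler data and the name `majority_ranks` are made up for illustration, and the video does not say how ties are handled, so the tie behaviour here is an arbitrary choice.

```python
from collections import Counter

def majority_ranks(labeler_ranks):
    """Pick, for each completion, the rank most labelers assigned to it.

    labeler_ranks: one list of ranks per labeler, all in the same
    completion order. Ties go to the rank seen first (an assumption;
    the video does not specify a tie-breaking rule).
    """
    consensus = []
    # zip(*...) regroups the ranks by completion instead of by labeler.
    for ranks_for_completion in zip(*labeler_ranks):
        rank, _count = Counter(ranks_for_completion).most_common(1)[0]
        consensus.append(rank)
    return consensus

# Hypothetical results from three labelers for one prompt.
labeler_ranks = [
    [2, 1, 3],
    [2, 1, 3],
    [3, 1, 2],  # one labeler disagrees on the first and third completions
]
consensus = majority_ranks(labeler_ranks)
print(consensus)
```

Note that taking the majority per completion does not guarantee a valid ranking in general (two completions could end up with the same rank), so a real pipeline would probably need an extra rule for that case.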

Correct.

It is explained in the video each labeler ranks each of the prompts.

The video says, “The same prompt completion sets are usually assigned to multiple human labelers to establish consensus and minimize the impact of poor labelers in the group.” So yes, it’s explained that each labeler ranks each of the prompts. (I assume “prompts” is shorthand for “completion sets”.) That was not in question.

Then all the scores for each of the prompts (coming from different labellers) are combined and the score that is a majority of all the scores for that prompt is chosen.

If “assigned to multiple … labelers to establish consensus” is intended to explain that the ranks for each completion set are combined by using the majority rank for each completion, all I can say is that’s too terse to prevent the puzzlement that induced me to post my question.

I guess that because she is not a native English speaker, there might be some issues in understanding!