Clarification on lecture video "RLHF: Obtaining feedback from humans"

I didn’t understand why, at 4:24, the response with the worst quality is labeled ‘F’. How does that help when building the pairwise training data? You can’t simply discard it, because it IS an output from the model, and the model needs to know that this is a VERY BAD output.


Labeling the worst-quality response as ‘F’ helps separate the bad outputs from the good ones when building pairwise training data. With the worst responses marked, the model can learn during training to tell good responses from bad ones. Simply discarding bad outputs is not ideal, because the model would then get no feedback about what a poor response looks like or how to improve on it. Labeling the worst responses as ‘F’ keeps that feedback in the training data and helps the model learn to generate better responses.
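To make the "don't discard it" point concrete, here is a minimal sketch, assuming the common approach of turning a best-to-worst ranking into (chosen, rejected) pairs and training a reward model with a pairwise logistic loss. The function names `ranking_to_pairs` and `pairwise_loss` and the example completions are hypothetical, and this is not necessarily exactly what the lecture does with the ‘F’ label. It shows that the worst completion still appears in the data, as the rejected side of every pair it belongs to.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked):
    """Turn completions ordered best-to-worst into (chosen, rejected) pairs.

    Index 0 is the best completion. Each pair puts the higher-ranked
    completion first, so the worst one is 'rejected' in every pair.
    """
    return [(ranked[i], ranked[j]) for i, j in combinations(range(len(ranked)), 2)]


def pairwise_loss(r_chosen, r_rejected):
    """Pairwise logistic loss: -log(sigmoid(r_chosen - r_rejected)).

    It is small when the chosen completion's reward score is well above
    the rejected one's, and large when the order is reversed.
    """
    d = r_chosen - r_rejected
    # -log(sigmoid(d)) == log(1 + exp(-d)); split on the sign of d
    # so math.exp never receives a large positive argument.
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


# Hypothetical completions for one prompt, ranked best to worst.
ranked = ["answer A", "answer B", "answer C"]
pairs = ranking_to_pairs(ranked)
for chosen, rejected in pairs:
    print(f"chosen={chosen!r}  rejected={rejected!r}")

# Scoring the worst answer above the best one is penalized heavily.
print(pairwise_loss(2.0, -1.0))  # correct order: small loss
print(pairwise_loss(-1.0, 2.0))  # reversed order: large loss
```

With three ranked completions this yields three pairs, and "answer C" is the rejected item in both pairs it appears in, so the worst output contributes training signal rather than being thrown away.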

My question is: why not label it with a scalar value? For example, if we define 5 tiers of quality for the model outputs, then naturally the output with the worst quality should be labeled ‘5’ (assuming ‘1’ is the best and ‘5’ is the worst). I just don’t see why a non-scalar ‘F’ is preferred.

It seems like you are questioning the use of a non-scalar ‘F’ to label quality of model outputs. The reason for using ‘F’ instead of a scalar value is likely due to the fact that ‘F’ is a common metric used to evaluate the performance of machine learning models. It takes into account both precision and recall, which are important factors in determining the overall quality of a model’s output. Additionally, using a scalar value may oversimplify the evaluation process, as it may not capture the nuances of a model’s performance.

This is not correct. Your answer refers to the F1 score, not the ‘F’ label assigned to bad annotations.

Hi, there was a typo. What I meant was: as long as the ordering is preserved, there shouldn’t be an issue.
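The "as long as the ordering is preserved" point can be checked with a small sketch. It assumes each completion gets a quality label and that pairwise data only records which completion of each pair is better. Treating ‘F’ as simply the lowest letter grade is an assumption for illustration, and `labels_to_pairs`, `grade_order`, and the `resp*` names are hypothetical. Numeric tiers (1 = best, 5 = worst) and letter grades produce identical pairs whenever they order the completions the same way.

```python
from itertools import combinations


def labels_to_pairs(labels, rank_key):
    """Build (chosen, rejected) pairs from per-completion quality labels.

    labels:   dict mapping completion -> label
    rank_key: maps a label to a sortable number, smaller = better
    Ties produce no pair, since neither completion is preferred.
    """
    pairs = []
    for a, b in combinations(labels, 2):
        ka, kb = rank_key(labels[a]), rank_key(labels[b])
        if ka < kb:
            pairs.append((a, b))
        elif kb < ka:
            pairs.append((b, a))
    return pairs


# Scheme 1: numeric tiers, 1 = best, 5 = worst.
tiers = {"resp1": 1, "resp2": 3, "resp3": 5}

# Scheme 2: letter grades, with 'F' assumed to be the worst grade.
grade_order = {"A": 0, "B": 1, "C": 2, "D": 3, "F": 4}
grades = {"resp1": "A", "resp2": "C", "resp3": "F"}

pairs_numeric = labels_to_pairs(tiers, rank_key=lambda t: t)
pairs_letter = labels_to_pairs(grades, rank_key=grade_order.__getitem__)

print(pairs_numeric)
print(pairs_numeric == pairs_letter)
```

Both schemes give `[('resp1', 'resp2'), ('resp1', 'resp3'), ('resp2', 'resp3')]`, so for pairwise training the choice between a scalar tier and a letter like ‘F’ only matters through the ordering it encodes.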

I apologize for the confusion in my previous response. Thank you for pointing out the error.
In general, labels used to indicate the quality or correctness of annotations can vary depending on the specific application or domain. It’s possible that ‘F’ is being used as an abbreviation for ‘False’ or ‘Failed’ to indicate annotations that are deemed to be incorrect or of poor quality.