Details about ValueHead used for RLHF

The lab and lecture mention that a Value Head is used for RLHF.
What exactly is the value head, what is it used for, and how does the loss computed through it get backpropagated to the PEFT parameters being tuned? Which paper can I read to learn more about this process?
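For context, here is a minimal sketch of the setup as it is commonly implemented, for example in TRL's `AutoModelForCausalLMWithValueHead`. It rests on some assumptions. The value head is a linear layer that maps the policy's hidden state to a scalar estimate of expected return. The base weights `W` are frozen, and only a LoRA-style low-rank adapter (`A`, `B`) and the head (`v`) are trainable. The value loss is a squared error against an observed return `ret`. All names and toy sizes are hypothetical. Gradients are written out by hand in plain Python so that the path from the value loss, through the hidden state `h`, into the adapter is explicit.

```python
import random

# Toy sizes (hypothetical): input dim, hidden dim, adapter rank.
D_IN, D_H, RANK = 4, 3, 1


def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def forward(params, x):
    """Shared trunk with a LoRA-style adapter, then a scalar value head.

    h = W x + B (A x)   # W frozen; A and B are the trainable PEFT parameters
    V = v . h           # value head: linear map from hidden state to a scalar
    """
    z = matvec(params["A"], x)                  # low-rank projection, size RANK
    base = matvec(params["W"], x)               # frozen base output, size D_H
    delta = matvec(params["B"], z)              # adapter contribution, size D_H
    h = [b + d for b, d in zip(base, delta)]
    value = sum(vi * hi for vi, hi in zip(params["v"], h))
    return z, h, value


def value_loss_step(params, x, ret, lr=0.02):
    """One SGD step on L = 0.5 * (V - ret)^2; returns the loss before the update."""
    z, h, value = forward(params, x)
    g_out = value - ret                                   # dL/dV
    g_head = [g_out * hi for hi in h]                     # dL/dv
    g_h = [g_out * vi for vi in params["v"]]              # dL/dh: gradient enters the trunk
    g_B = [[g_h[i] * z[k] for k in range(RANK)] for i in range(D_H)]
    g_z = [sum(params["B"][i][k] * g_h[i] for i in range(D_H)) for k in range(RANK)]
    g_A = [[g_z[k] * x[j] for j in range(D_IN)] for k in range(RANK)]
    # Update only the trainable parameters (head and adapter); W is left untouched.
    params["v"] = [p - lr * g for p, g in zip(params["v"], g_head)]
    params["B"] = [[p - lr * g for p, g in zip(pr, gr)] for pr, gr in zip(params["B"], g_B)]
    params["A"] = [[p - lr * g for p, g in zip(pr, gr)] for pr, gr in zip(params["A"], g_A)]
    return 0.5 * g_out ** 2


random.seed(0)
params = {
    "W": [[random.uniform(-1, 1) for _ in range(D_IN)] for _ in range(D_H)],  # frozen base
    "A": [[random.uniform(-1, 1) for _ in range(D_IN)] for _ in range(RANK)],
    "B": [[0.0] * RANK for _ in range(D_H)],  # zero init, as LoRA commonly does
    "v": [random.uniform(-1, 1) for _ in range(D_H)],
}
frozen_W = [row[:] for row in params["W"]]
initial_A = [row[:] for row in params["A"]]
x, ret = [0.5, -1.0, 0.25, 1.0], 2.0
losses = [value_loss_step(params, x, ret) for _ in range(100)]
```

The head reads the same hidden state that the adapter modifies. As a result, the value-loss gradient reaches `A` and `B` through `dL/dh`, while the frozen `W` receives no update. In a typical PPO step, this value loss is added to the policy loss with a coefficient, and autograd applies the same chain rule. The head's estimates are also used to compute advantages.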