There is plenty of content suggesting the use of reinforcement learning to align LLMs with human feedback. I am wondering whether there are other uses of RL in the context of LLMs.
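For context, the alignment-from-human-feedback setup mentioned above can be sketched as a tiny policy-gradient loop. This is a toy illustration, not any real RLHF library: the "policy" chooses between two canned responses, a stand-in reward model acts as the human-preference signal, and a REINFORCE update shifts probability toward the higher-reward response. All names and reward values here are hypothetical.

```python
import math
import random

random.seed(0)

# Two canned responses and a stand-in "reward model" scoring them
# (a proxy for learned human preferences; values are illustrative).
responses = ["helpful answer", "unhelpful answer"]
reward_model = {"helpful answer": 1.0, "unhelpful answer": -1.0}

logits = [0.0, 0.0]  # policy parameters: one logit per response

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

lr = 0.1
for _ in range(200):
    probs = softmax(logits)
    i = random.choices(range(2), weights=probs)[0]  # sample a response
    r = reward_model[responses[i]]                  # "human feedback" score
    # REINFORCE gradient for a categorical policy: (one_hot - prob) * reward
    for j in range(2):
        grad = ((1.0 if j == i else 0.0) - probs[j]) * r
        logits[j] += lr * grad

# After training, the policy strongly prefers the high-reward response.
print(softmax(logits)[0])
```

Real RLHF pipelines (e.g. PPO over token-level policies with a KL penalty against the base model) are far more involved, but the reward-weighted update above is the core idea.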