There is plenty of material on using reinforcement learning to align LLMs with human feedback. I'm wondering whether there are other uses of RL for LLMs.