RL for fine-tuning LLMs

There is plenty of material on using reinforcement learning to align LLMs with human feedback. I'm wondering whether there are other uses of RL for LLMs.