KL divergence or trust region?

The KL divergence compares the original (reference) LLM to the new fine-tuned LLM: it measures how far the fine-tuned model's output distribution has drifted from the original one.
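
A minimal sketch of that idea (not any particular library's API, and `kl_coef` is just a hypothetical tuning knob): in RLHF the per-token KL between the fine-tuned policy and a frozen copy of the original model is typically subtracted from the reward, so the policy is penalized for drifting away from the reference.

```python
import torch

def kl_penalized_reward(reward, policy_logprobs, ref_logprobs, kl_coef=0.1):
    # Approximate per-token KL: log pi_new(a|s) - log pi_ref(a|s),
    # where pi_ref is the frozen original (reference) model.
    kl = policy_logprobs - ref_logprobs
    # Subtracting the penalty discourages the fine-tuned policy from
    # drifting too far from the original model's distribution.
    return reward - kl_coef * kl
```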

The trust region is how much movement a single PPO update is allowed to make, relative to the policy that collected the current batch of data.
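
A rough sketch of how PPO enforces that (the standard clipped surrogate objective, written from memory rather than any specific implementation): the probability ratio between the new and old policies is clipped, which keeps each update inside a small trust region around the old policy.

```python
import torch

def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    # Ratio between the updated policy and the policy that collected
    # the data (the "old" policy for this PPO step).
    ratio = torch.exp(new_logprobs - old_logprobs)
    # Clipping the ratio to [1 - eps, 1 + eps] limits how far one
    # update can move the policy: this is the trust region.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```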

The ultimate goal is the same (stop the policy from drifting too far), but what actually happens is different: the KL penalty anchors the policy to the original model, while the trust region limits each individual PPO update!

Why do we need it? Well, the people who have tested it found they needed it! Is it complex? It is, but the whole RLHF / PPO fine-tuning pipeline is a complex system with a lot of tuning going on!
