KL divergence or trust region?

The KL divergence compares the original (reference) LLM to the new fine-tuned LLM: it measures how far the fine-tuned model's output distribution has drifted from the original one.
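
A minimal sketch of that idea (not any particular library's API, and `kl_coef` is just a hypothetical tuning knob): in RLHF the per-token KL between the fine-tuned policy and a frozen copy of the original model is typically subtracted from the reward, so the policy is penalized for drifting away from the reference.

```python
import torch

def kl_penalized_reward(reward, policy_logprobs, ref_logprobs, kl_coef=0.1):
    # Approximate per-token KL: log pi_new(a|s) - log pi_ref(a|s),
    # where pi_ref is the frozen original (reference) model.
    kl = policy_logprobs - ref_logprobs
    # Subtracting the penalty discourages the fine-tuned policy from
    # drifting too far from the original model's distribution.
    return reward - kl_coef * kl
```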

The trust region is how much movement a single PPO update is allowed to make, relative to the policy that collected the current batch of data.
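
A rough sketch of how PPO enforces that (the standard clipped surrogate objective, written from memory rather than any specific implementation): the probability ratio between the new and old policies is clipped, which keeps each update inside a small trust region around the old policy.

```python
import torch

def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    # Ratio between the updated policy and the policy that collected
    # the data (the "old" policy for this PPO step).
    ratio = torch.exp(new_logprobs - old_logprobs)
    # Clipping the ratio to [1 - eps, 1 + eps] limits how far one
    # update can move the policy: this is the trust region.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```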

The ultimate goal is the same (stop the policy from drifting too far), but what actually happens is different: the KL penalty anchors the policy to the original model, while the trust region limits each individual PPO update!

Why do we need it? Well, the people who have tested it found they needed it! Is it complex? It is, but the whole RLHF / PPO fine-tuning pipeline is a complex system with a lot of tuning going on!
