Why use KL divergence in PPO?

Goomin · June 3, 2024, 5:29am

Link to the classroom item you are referring to: https://www.coursera.org/learn/generative-ai-with-llms/lecture/yPzI4/lab-3-walkthrough

The PPO algorithm uses clipping to keep the policy from changing excessively in a single update.

KL divergence also plays a role in constraining the policy.

Why should we use KL divergence when PPO already uses clipping?
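
To make the two mechanisms concrete, here is a minimal NumPy sketch. It assumes a common RLHF setup in which clipping bounds the probability ratio against the policy from the previous update step, while the KL term is applied as a per-token reward penalty against a frozen reference model. The function names, `clip_eps`, and `beta` are hypothetical choices for illustration and are not taken from the lab code.

```python
import numpy as np

def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped objective (to be maximized), averaged over tokens.

    The ratio compares the policy being updated with the policy that
    generated the samples (the previous iterate), not the reference model.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.minimum(unclipped, clipped).mean()

def kl_penalized_rewards(reward, logp_policy, logp_ref, beta=0.1):
    """Per-token rewards with a KL-style penalty against a frozen reference.

    Assumption: the penalty is -beta * (logp_policy - logp_ref) per token,
    and the scalar reward-model score is added on the last token.
    """
    logp_policy = np.asarray(logp_policy, dtype=float)
    logp_ref = np.asarray(logp_ref, dtype=float)
    rewards = -beta * (logp_policy - logp_ref)
    rewards[-1] += reward
    return rewards
```

In this sketch, clipping only limits how far one update can move from the previous policy, so many small steps can still drift far from the original model. The KL penalty is measured against the fixed reference, so it limits the total drift.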

gent.spah · July 16, 2024, 8:34am

Hello, please also check these related posts if you can:

Topic	Category	Replies	Views	Activity
KL divergence or trust region?	Generative AI with Large Language Models week-3	7	55	July 15, 2024
Trust region in the PPO equation and KL divergence	GenAI with LLMs Resources	2	435	October 19, 2023
I have a question about the content of the lecture	Generative AI with Large Language Models week-3	0	405	August 14, 2023
Lab 3 Qualitative Evaluation of PPO model; wonky results	Generative AI with Large Language Models week-3	1	443	July 24, 2023
PPO, how do we compute de advantage and the Value Function?/	Generative AI with Large Language Models	0	293	December 30, 2023