When creating a post, please add:
- Week # must be added in the tags option of the post.
- Link to the classroom item you are referring to: https://www.coursera.org/learn/generative-ai-with-llms/lecture/yPzI4/lab-3-walkthrough
- Description (include relevant info but please do not post solution code or your entire notebook):
The PPO algorithm uses clipping to keep the policy from changing excessively in a single update.
KL divergence also plays a role in constraining the policy.
Why do we need KL divergence when PPO already uses clipping?
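
For reference, here is a minimal sketch of the two mechanisms I am asking about, as I understand them. It assumes the standard clipped surrogate objective from PPO and a KL term measured against a frozen reference model, which is how KL is commonly used in RLHF setups. The function names, `epsilon=0.2`, and the example probabilities are hypothetical and are not taken from the lab notebook.

```python
import math

def clipped_surrogate(new_prob, old_prob, advantage, epsilon=0.2):
    """PPO clipped objective for a single action (a quantity to maximize)."""
    ratio = new_prob / old_prob
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over a 3-token vocabulary.
reference_policy = [0.5, 0.3, 0.2]   # frozen original model
old_policy = [0.2, 0.3, 0.5]         # policy before this update
new_policy = [0.18, 0.3, 0.52]       # policy after this update

# Clipping compares the new policy with the previous one (one update step).
step_objective = clipped_surrogate(new_policy[2], old_policy[2], advantage=1.0)

# KL here compares the current policy with the frozen reference model.
drift_from_reference = kl_divergence(new_policy, reference_policy)

print(step_objective, drift_from_reference)
```

In this sketch the clipping term only sees the ratio between consecutive policies, while the KL term measures how far the current policy is from the frozen reference, so the two appear to bound different things. I would like to confirm whether that is the right way to think about it.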