PPO: how do we compute the advantage and the value function?

Zildjian240 · December 30, 2023, 7:04pm

Link: https://www.coursera.org/learn/generative-ai-with-llms/lecture/1iZJO/optional-video-proximal-policy-optimization

I am going over the formula and I understand most of it, but I cannot figure out how the advantage is computed. Where does it come from? I heard there is a recursive formula; could someone go a bit further on this?

Also, where is the value function? Is it inside the LLM? How do we get it?
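
For context on the recursive formula mentioned above: one commonly used option with PPO is generalized advantage estimation (GAE), which may or may not be what the lecture has in mind. It first computes a one-step error delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), then works backwards through the episode with A_t = delta_t + gamma * lambda * A_{t+1}. The values V(s_t) come from a separate value estimator (the critic); in many RLHF setups this is a small scalar head trained on top of the LLM, though that is an assumption about the setup and not something stated in this post. Below is a minimal sketch in plain Python. The function name `compute_gae`, its arguments, and the toy numbers are all hypothetical and chosen only for illustration.

```python
def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Sketch of generalized advantage estimation (GAE).

    rewards: list of T per-step rewards r_0 .. r_{T-1}
    values:  list of T + 1 value estimates V(s_0) .. V(s_T); the last
             entry is the bootstrap value (0.0 if the episode ended)
    Returns (advantages, returns), each a list of length T.
    """
    T = len(rewards)
    advantages = [0.0] * T
    next_advantage = 0.0  # A_T is taken to be 0 past the end of the rollout

    # Walk backwards so that A_{t+1} is already known when computing A_t.
    for t in reversed(range(T)):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Recursive step: A_t = delta_t + gamma * lambda * A_{t+1}.
        next_advantage = delta + gamma * lam * next_advantage
        advantages[t] = next_advantage

    # Regression targets for the value function: A_t + V(s_t).
    returns = [a + v for a, v in zip(advantages, values[:T])]
    return advantages, returns


# Toy rollout (made-up numbers): reward only arrives at the last step.
toy_rewards = [0.0, 0.0, 1.0]
toy_values = [0.5, 0.5, 0.5, 0.0]
toy_advantages, toy_returns = compute_gae(toy_rewards, toy_values, gamma=1.0, lam=1.0)
```

With lam=0 this reduces to the one-step error delta_t, and with lam=1 it becomes the full discounted return minus V(s_t), so lam trades off bias against variance in the estimate.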
