Link: https://www.coursera.org/learn/generative-ai-with-llms/lecture/1iZJO/optional-video-proximal-policy-optimization
I am going over the PPO formula. I understand most of it, but I cannot work out how the advantage is computed. Where does it come from? I heard there is a recursive formula for it; could someone elaborate on this?
Also, where does the value function live? Is it inside the LLM? How do we get it?