State Action value function [Coursera Video]

I just finished the video, but I am still confused by the state-action value function demonstrated by the professor.

I am wondering: if the chosen direction is right, should we change Q(2, right) as follows, since the reward on the right side is 40?

Q(2, right) = 0 + 0*0.5^1 + 0*0.5^2 + 0*0.5^3 + 40*0.5^4

Hello @James_Yu1,

[image: the slide's calculation of Q(2, right)]

The above is the only answer, because we are bound by this requirement:

[image: the slide's definition of Q(s, a), which requires behaving optimally after taking the action]

To behave optimally after moving to grid 3, we will have to keep moving left all the way to grid 1.
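To make this concrete, here is the calculation written out, assuming the reward and discount values from the lecture example (a reward of 100 at grid 1, 40 at grid 6, 0 elsewhere, and a discount factor of 0.5). Moving right once and then behaving optimally gives the trajectory 2 → 3 → 2 → 1, so

Q(2, right) = 0 + 0*0.5^1 + 0*0.5^2 + 100*0.5^3 = 12.5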

Cheers,
Raymond


@James_Yu1 However, if you really choose to keep moving right, then you will move to grid 6 and finally be rewarded 40 points; the definition of Q, however, does not change, and Q(2, right) is always 12.5.
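For comparison, plugging numbers into the formula from the original question, the keep-moving-right trajectory 2 → 3 → 4 → 5 → 6 would only return

0 + 0*0.5^1 + 0*0.5^2 + 0*0.5^3 + 40*0.5^4 = 2.5

which is why behaving optimally after the first move means turning back toward grid 1, and why Q(2, right) stays at 12.5.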


Hi @rmwkwok ,

How do we know that this is behaving optimally?

To behave optimally after moving to grid 3, we will have to keep moving left all the way to grid 1.

Isn’t it that we need to calculate the return of every possible action from every state and then find the maximum value, so that we know whether, after moving to grid 3, we should keep moving right or move back to the left?


Hello, @ansonchantf,

You have answered your own question! Generally, we first need to calculate the Q values of all state-action pairs before we know the absolute best move. This slide does not show those steps, but since this is a very simple example, we can still tell what the best action should be just by inspection.
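As a rough illustration of "calculate all state-action Q values first", here is a minimal value-iteration sketch in Python. It assumes the 6-state example from the lecture (terminal rewards 100 at grid 1 and 40 at grid 6, 0 elsewhere, discount factor 0.5); those numbers and the variable names are assumptions for illustration, not code from the course.

```python
# Minimal value-iteration sketch for the assumed 6-state example:
# terminal reward 100 at grid 1, 40 at grid 6, 0 elsewhere, gamma = 0.5.
gamma = 0.5
rewards = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
terminal = {1, 6}
actions = {"left": -1, "right": +1}

# Q[s][a] for the non-terminal grids 2..5, initialised to zero.
Q = {s: {a: 0.0 for a in actions} for s in range(2, 6)}

for _ in range(50):  # this tiny MDP converges after a handful of sweeps
    # Best achievable return from each grid; terminal grids just pay their reward.
    V = {s: max(Q[s].values()) for s in Q}
    V.update({s: rewards[s] for s in terminal})
    for s in Q:
        for a, step in actions.items():
            # Q(s, a) = reward now + discounted best return from the next grid.
            Q[s][a] = rewards[s] + gamma * V[s + step]

for s in sorted(Q):
    best = max(Q[s], key=Q[s].get)
    print(s, Q[s], "best action:", best)
# Prints Q(2, right) = 12.5, with "left" as the best action in grids 2-4
# and "right" in grid 5.
```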

Cheers,
Raymond