Confused about Q(s,a) on Week 3, slide 20

Hello,

I am quite confused by the "State-action value function definition" video at 5:04, specifically by Q(s,a).
From my understanding, Q(2, →) should be 0 + (0.5)(0) + (0.5^2)(0) + (0.5^3)(40). Why is it 0 + (0.5)(0) + (0.5^2)(0) + (0.5^3)(100)?

If you evaluate Q(2,→):
You are asking, “If I start in state 2 and take action →, what discounted return do I get?”
Following that path, you end up in the terminal state with 100.

If you evaluate Q(2,←):
You'd reach the terminal state with 100 after a single step, for a return of 0 + (0.5)(100) = 50.

Hello Gent,

I don’t understand the path movement.

The slide in my question is here. I am confused by the calculation for Q(2, →): since we are moving to the right, shouldn't we be moving toward 40? I don't get how we can reach 100 with three zeros. I understand Q(2, ←) [one zero at state 2 and 0.5 at state 1] and Q(4, ←) [one zero at state 4 + one zero at state 3 + one zero at state 2 + 100 at state 1].

So for Q(2, →) shouldn’t we look at state 2, state 3, state 4, state 5, and state 6?

Hello Hoang @hoang51,

The key is to keep the following definition in mind: Q(s, a) is the return if you start in state s, take action a once, and then behave optimally after that.

Therefore, Q(2, →) does not mean we keep moving right all the way. The “→” only tells us to move right once; after that we behave optimally, which is why the rover turns around and moves left to “100”. The path is 2 → 3 → 2 → 1, so the return is 0 + (0.5)(0) + (0.5^2)(0) + (0.5^3)(100) = 12.5.
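
If it helps to check the numbers yourself, here is a minimal sketch in Python. It assumes the setup discussed in this thread: six states, a reward of 100 at terminal state 1, a reward of 40 at terminal state 6, zeros in between, and a discount factor of 0.5. The names `REWARDS`, `TERMINAL`, `GAMMA`, and `compute_q` are my own for illustration and are not from the course code.

```python
# Assumed Mars-rover setup: states 1..6, both ends are terminal.
REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
TERMINAL = {1, 6}
GAMMA = 0.5          # discount factor
LEFT, RIGHT = -1, 1  # an action shifts the state index by -1 or +1


def compute_q(n_iters=50):
    """Estimate Q(s, a) by repeated sweeps (simple value iteration)."""
    # v[s] holds the best return from state s; terminal states just pay their reward.
    v = {s: (float(REWARDS[s]) if s in TERMINAL else 0.0) for s in REWARDS}
    q = {}
    for _ in range(n_iters):
        # Q(s, a) = R(s) + gamma * (best return from the state we land in).
        q = {
            (s, a): REWARDS[s] + GAMMA * v[s + a]
            for s in REWARDS if s not in TERMINAL
            for a in (LEFT, RIGHT)
        }
        # "Behave optimally after that": take the better of the two actions.
        for s in REWARDS:
            if s not in TERMINAL:
                v[s] = max(q[(s, LEFT)], q[(s, RIGHT)])
    return q


q = compute_q()
print("Q(2, right) =", q[(2, RIGHT)])
print("Q(2, left)  =", q[(2, LEFT)])
```

Because the only action forced by Q(2, →) is the first one, the value at state 3 that feeds into it is the best of going left or right from there, and going left wins.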

Cheers,
Raymond
