I am confused on this slide for Q(s,a) for week 3 slide number 20

hoang51 · August 31, 2025, 4:51am

Hello,

I am quite confused by the "State-action value function definition" video at 5:04, specifically about Q(s,a).
From my understanding, Q(2, →) should be 0 + (0.5)(0) + (0.5^2)(0) + (0.5^3)(40). Why is it 0 + (0.5)(0) + (0.5^2)(0) + (0.5^3)(100) instead?

gent.spah · August 31, 2025, 9:22am

If you evaluate Q(2,→):
You are asking, “If I start in state 2 and take action →, what discounted return do I get?”
Following that path, you end up in the terminal state with reward 100.

If you evaluate Q(2,←):
You’d reach the terminal state with reward 100 in a single step, giving 0 + (0.5)(100) = 50.

hoang51 · September 1, 2025, 7:20am

Hello Gent,

I don’t understand the path movement.

The slide in my question is here. I am confused by the calculation for Q(2, →): since we are moving to the right, shouldn’t we be moving toward 40? I don’t get how we can reach 100 after three zeros. I do understand Q(2, ←) [one zero at state 2, then 0.5 × 100 at state 1] and Q(4, ←) [zeros at states 4, 3, and 2, then the discounted 100 at state 1].

So for Q(2, →) shouldn’t we look at state 2, state 3, state 4, state 5, and state 6?

rmwkwok · September 2, 2025, 1:54am

Hello Hoang @hoang51,

The key is to keep the definition of Q(s, a) in mind: it is the return if you start in state s, take action a once, and then behave optimally after that.

Therefore, Q(2, →) does not mean we keep moving right all the way. The “→” only tells us to move right once; after that the agent behaves optimally, which is why it turns around and moves left to “100”.
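
To check the numbers, here is a minimal Python sketch. It assumes the setup that this thread implies: six states in a row, terminal rewards of 100 at state 1 and 40 at state 6, zero reward in states 2 to 5, and a discount factor of 0.5. The names `REWARDS`, `compute_q`, and `always_right_return` are made up for illustration and are not from the course material.

```python
# Assumed environment (inferred from the thread, not taken from course code):
# states 1..6 in a row, states 1 and 6 are terminal.
GAMMA = 0.5
REWARDS = {1: 100.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 40.0}
TERMINAL = {1, 6}
LEFT, RIGHT = -1, +1


def compute_q(gamma=GAMMA, sweeps=50):
    """Return Q(s, a): take action a once in state s, then act optimally."""
    # v[s] is the best achievable return from s, including the reward at s.
    v = {s: (REWARDS[s] if s in TERMINAL else 0.0) for s in REWARDS}
    # Repeatedly apply v(s) = R(s) + gamma * max(v(left), v(right)).
    for _ in range(sweeps):
        for s in REWARDS:
            if s not in TERMINAL:
                v[s] = REWARDS[s] + gamma * max(v[s + LEFT], v[s + RIGHT])
    # Q(s, a) = R(s) + gamma * (best return from the state that a leads to).
    return {
        (s, a): REWARDS[s] + gamma * v[s + a]
        for s in REWARDS
        if s not in TERMINAL
        for a in (LEFT, RIGHT)
    }


def always_right_return(start, gamma=GAMMA):
    """Return from moving right at every step (this is NOT what Q measures)."""
    total, discount, s = 0.0, 1.0, start
    while True:
        total += discount * REWARDS[s]
        if s in TERMINAL:
            return total
        discount *= gamma
        s += RIGHT


q = compute_q()
print(q[(2, RIGHT)])           # 12.5 = 0 + 0.5*0 + 0.25*0 + 0.125*100
print(q[(2, LEFT)])            # 50.0 = 0 + 0.5*100
print(always_right_return(2))  # 2.5  = 0.5**4 * 40
```

Under these assumptions, going right once and then turning back toward 100 gives 12.5, while going right all the way to 40 gives only 2.5, so the optimal behaviour after the first step is to turn around.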

Cheers,
Raymond
