Week 3 lecture video has an error: State-Action Value Function definition


For the action pointing to the right at state 2, the slide says 50, 12.5, but I think it should have been 50, 3.125, because it's $0.5^4 \times 40$; I believe the professor accidentally used 100 instead of 40.

You have to follow the optimal path (i.e., the arrows). So from state 2, taking the action to the right, you end up in state 3; but following the optimal direction from there, you go back to state 2 and then end up in state 1.

Therefore, it would be:

$0 + 0 \times 0.5 + 0 \times 0.5^2 + 100 \times 0.5^3 = 12.5$
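If it helps, here is a minimal Python sketch of that calculation, assuming the 6-state rover layout from the lecture (reward 100 at state 1, 40 at state 6, 0 elsewhere, discount factor 0.5); the `optimal_next` map is just my own encoding of the arrows on the slide:

```python
gamma = 0.5
rewards = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
# Optimal action at each non-terminal state under gamma = 0.5
# (the arrows on the slide, as I read them): left everywhere except state 5.
optimal_next = {2: 1, 3: 2, 4: 3, 5: 6}

def q_value(state, first_next_state):
    """Return for taking one (possibly non-optimal) first step, then
    following the optimal policy until a terminal state (1 or 6)."""
    total = rewards[state]   # immediate reward in the current state
    discount = gamma
    state = first_next_state
    while state not in (1, 6):          # terminal states end the episode
        total += discount * rewards[state]
        discount *= gamma
        state = optimal_next[state]
    total += discount * rewards[state]  # reward at the terminal state
    return total

print(q_value(2, 3))  # Q(2, right) = 0 + 0.5*0 + 0.25*0 + 0.125*100 = 12.5
print(q_value(2, 1))  # Q(2, left)  = 0 + 0.5*100 = 50.0
```

The second print also reproduces the 50 shown on the slide for the left action, which is a decent sanity check of the setup.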

I think @Mujassim_Jamal meant to say finally ending up in state 1 (instead of 3).

Oh sorry, I didn't notice that. I've edited my response. Thanks @rmwkwok for pointing it out!

Why does taking the action to the right at state 1 give you 100? Shouldn't it be $100 + 0 \times 0.5 + 100 \times 0.5^2 = 125$? I don't understand.

The slide in the first post of this thread does not say that taking the right action at state 1 has a Q-value of 100. It simply says that the reward for arriving at state 1 is 100. Note the difference between a Q-value and a reward; this matters because the formula in your question is one that calculates a Q-value.

In fact, states 1 and 6 are terminal states where we won't take any more actions; in other words, we don't consider a trajectory that goes from state 1 to state 2 and then back to state 1. As soon as we are in state 1, that's the end: no more actions, and no more going to state 2.
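To make that concrete, here is a small sketch under the same assumed setup as the snippet above (reward 100 at state 1, 40 at state 6, 0 elsewhere, discount factor 0.5, and my own `optimal_next` encoding of the arrows):

```python
gamma = 0.5
rewards = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
optimal_next = {2: 1, 3: 2, 4: 3, 5: 6}  # the arrows, as I read them

def total_return(state):
    # Terminal states (1 and 6): the episode ends immediately, so the
    # return is just the reward received there, with no discounted tail.
    if state in (1, 6):
        return rewards[state]
    return rewards[state] + gamma * total_return(optimal_next[state])

print(total_return(1))  # 100 -- not 125, since we never act again from state 1
print(total_return(2))  # 50.0 = 0 + 0.5 * 100
```

The recursion bottoms out at the terminal states, which is exactly why the value attached to state 1 is the bare reward 100 rather than 125.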

Cheers,
Raymond