Week 3 Reinforcement Learning, Quiz Optional

Ali_Nawab · January 3, 2023, 11:39am

I set the misstep probability to 0.1, and I don't understand why Q(5, ->) = 18.52. Could someone clear this up for me?

rmwkwok · January 3, 2023, 11:54am

What should it be if you set the misstep probability to zero? And why?

Ali_Nawab · January 3, 2023, 11:58am

When I set the misstep probability to 0, Q(5, ->) equals 20.

rmwkwok · January 3, 2023, 12:01pm

Why should it be 20?

rmwkwok · January 3, 2023, 12:28pm

Sorry, I have to go very soon. It is 20 because it takes one step to get to the terminal state on the right, and the reward gets discounted once, so we get 40 * 0.5 = 20.
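
Written out with the Bellman equation, this is the calculation below. It assumes, as in the course's Mars rover example, that the cells are numbered 1 to 6, that the step reward R(5) is 0, and that the value of the terminal cell 6 is simply its reward of 40; the discount factor γ = 0.5 comes from the explanation above.

```latex
Q(5, \rightarrow) = R(5) + \gamma \max_{a'} Q(6, a') = 0 + 0.5 \times 40 = 20
```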

As for the case where the misstep probability equals 0.1: if you start off in the 5th cell counting from the left, then even if you decide to go to the right, there is a 10% chance that you will end up moving to the left. In that case it takes more than one step to get to a terminal state (and consequently the reward gets discounted more times).
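
The same idea can be written as the Bellman equation for a stochastic environment, where the max over next actions is averaged over where the rover actually lands. This is a sketch assuming the misstep sends the rover one cell in the opposite direction with probability p, and the same R(5) = 0, γ = 0.5 and terminal reward 40 as before:

```latex
Q(s, a) = R(s) + \gamma \Big[ (1 - p) \max_{a'} Q(s_{\text{intended}}, a') + p \max_{a'} Q(s_{\text{opposite}}, a') \Big]

Q(5, \rightarrow) = 0 + 0.5 \Big[ 0.9 \times 40 + 0.1 \max_{a'} Q(4, a') \Big] = 18 + 0.05 \max_{a'} Q(4, a')
```

Because cell 4 is at least two steps from any terminal, max Q(4, a') is far below 40 (solving the full system gives roughly 10.49), so Q(5, ->) ≈ 18 + 0.52 = 18.52 instead of 20.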

Conceptually, if you average over many such scenarios, roughly 90% of the time the rover moves to the right immediately (yielding the full reward of 20 points), but 10% of the time it takes longer to reach a terminal state (yielding something less than 20 points). Because it sometimes takes longer, the average (expected) return is smaller than in the perfect case (20), and that expected return turns out to be 18.52.
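
To check the numbers yourself, here is a minimal value-iteration sketch. It is not the lab's own code; the function name `mars_rover_q_values` is hypothetical, and it assumes 6 cells with terminal rewards 100 (left) and 40 (right), a step reward of 0, γ = 0.5, and a misstep that moves the rover one cell in the opposite direction.

```python
# Hypothetical sketch (not the course lab's utils.py) of the 6-cell Mars rover.
# Assumed setup: terminal rewards 100 (left) and 40 (right), step reward 0, gamma = 0.5.

def mars_rover_q_values(misstep_prob, gamma=0.5, rewards=(100, 0, 0, 0, 0, 40), tol=1e-10):
    """Return Q[s] = [Q(s, left), Q(s, right)] for 0-indexed cells s = 0..5."""
    n = len(rewards)
    terminal = {0, n - 1}

    def q_value(V, s, action):
        # Terminal cells just pay out their reward.
        if s in terminal:
            return float(rewards[s])
        step = -1 if action == 0 else 1  # action 0 = left, 1 = right
        intended, opposite = s + step, s - step
        # With probability misstep_prob the rover moves the opposite way.
        expected_next = (1 - misstep_prob) * V[intended] + misstep_prob * V[opposite]
        return rewards[s] + gamma * expected_next

    # Value iteration: repeat V(s) <- max_a Q(s, a) until V stops changing.
    V = [float(r) if s in terminal else 0.0 for s, r in enumerate(rewards)]
    while True:
        new_V = [max(q_value(V, s, 0), q_value(V, s, 1)) for s in range(n)]
        done = max(abs(x - y) for x, y in zip(new_V, V)) < tol
        V = new_V
        if done:
            break
    return [[q_value(V, s, 0), q_value(V, s, 1)] for s in range(n)]


for p in (0.0, 0.1):
    Q = mars_rover_q_values(p)
    # The thread counts cells from 1, so "state 5" is index 4 here.
    print(f"misstep_prob={p}: Q(5, ->) = {Q[4][1]:.2f}")
```

Under these assumptions it prints 20.00 for a misstep probability of 0 and 18.52 for 0.1. Value iteration is used here only because the max over actions makes the equations nonlinear; with γ = 0.5 it converges in a few dozen sweeps.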

Cheers,
Raymond
