State-action value function example question

Hello @Mo_Okasha,

Perhaps try re-running all the cells after setting misstep_prob to 0?

I ran the tests myself with misstep_prob set to 0 and to 0.4. I can reproduce your result with 0.4. With misstep_prob = 0, the LEFT value in state s is always half the RIGHT value in state s-1, because every time the agent takes the LEFT action in state s and reaches s-1, it then turns to go RIGHT to achieve the biggest value. The value is "halved" by the discount factor gamma, and the agent turns RIGHT because Q(s, a) is defined as the return from starting in state s, taking action a once, and then behaving optimally afterwards.
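If it helps, here is a minimal sketch of that calculation so you can check the relationship yourself. It is not the lab's code: the function name `q_values`, the reward layout (100 on the left end, 0 in between, 40 on the right end), and gamma = 0.5 are my assumptions for illustration, so swap in your own numbers. With misstep_prob = 0, the LEFT value in state s comes out as gamma times the best value in state s-1, which is the RIGHT value whenever RIGHT is the better action there.

```python
import numpy as np

LEFT, RIGHT = 0, 1  # column indices in the Q table


def q_values(rewards, gamma=0.5, misstep_prob=0.0, iters=200):
    """Value iteration on a 1-D line of states whose two ends are terminal.

    Hypothetical helper for illustration only; the rewards and gamma passed
    in below are assumed values, not taken from the original notebook.
    """
    rewards = np.asarray(rewards, dtype=float)
    n = len(rewards)
    q = np.zeros((n, 2))
    # In a terminal state the return is just that state's reward.
    q[0, :] = rewards[0]
    q[-1, :] = rewards[-1]
    for _ in range(iters):
        v = q.max(axis=1)  # return when behaving optimally from each state
        new_q = q.copy()
        for s in range(1, n - 1):
            to_left, to_right = v[s - 1], v[s + 1]
            # With probability misstep_prob the agent moves the opposite way.
            new_q[s, LEFT] = rewards[s] + gamma * (
                (1 - misstep_prob) * to_left + misstep_prob * to_right
            )
            new_q[s, RIGHT] = rewards[s] + gamma * (
                (1 - misstep_prob) * to_right + misstep_prob * to_left
            )
        q = new_q
    return q


rewards = [100, 0, 0, 0, 0, 40]  # assumed reward layout
q = q_values(rewards, gamma=0.5, misstep_prob=0.0)
print(q)  # one row per state, columns are [LEFT, RIGHT]
```

Calling `q_values(rewards, misstep_prob=0.4)` instead mixes the left and right neighbours into every value, so the simple "half of the neighbour" pattern no longer holds.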

My screenshots below:

misstep_prob = 0

misstep_prob = 0.4

Raymond