State-Action value fails to find optimal policy

Hello @dmokran,

If you also look at the bottom chart, you will see that going left and going right are equally good there, so either choice is optimal. The code simply considers going left first, so left is the action that gets picked when the two are tied.
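
To illustrate the tie-breaking, here is a minimal sketch. It is not the actual code from the lab: the `greedy_action` helper, the action indices, and the Q-values are all made up for the example. It only assumes that the code checks left before right and keeps the first best action it finds.

```python
# Hypothetical action indices, with left checked before right.
LEFT, RIGHT = 0, 1


def greedy_action(q_values):
    """Return the index of the largest Q-value; ties go to the lowest index."""
    best = 0
    for action, q in enumerate(q_values):
        # Strict ">" means a later action must be strictly better to win,
        # so on a tie the earlier action (left) is kept.
        if q > q_values[best]:
            best = action
    return best


q_tied = [0.5, 0.5]    # made-up values: left and right are equally good
q_right = [0.1, 0.9]   # made-up values: right is strictly better

print(greedy_action(q_tied))   # 0, i.e. LEFT, only because it is checked first
print(greedy_action(q_right))  # 1, i.e. RIGHT
```

With tied values the function returns left, but returning right would give the same expected return, so the resulting policy would be just as good.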

Raymond
