Week 3, state-action value module: tie-breakers for returns

Hi all,
I was playing with the lab to see how the chosen actions change with the reward and the discount factor (I chose 0.5). How does one choose the action when there’s a tie in return? In the image below, it takes left. Is that because there’s logic that picks the fewest steps when there’s a tie?
How do we typically break ties? Do we add rules based on the physics of the problem we are dealing with?

Thanks,
Hari

Hi @Hari_Krishnan_94,

Many RL libraries and environments loop through the action list in a fixed order and pick the first action with the maximum value, so a tie is resolved by that order. If you want ties to go to the action that leads to fewer steps, you can add extra logic such as a shortest-path or minimal cost-to-go rule, provided the environment exposes that information.
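To make the "first one with the max value" behaviour concrete, here is a minimal sketch, not the lab's actual code. The function names, the action order `[left, right]`, and the Q-values are all made up for illustration. `np.argmax` returns the first index of the maximum, which is one common way a fixed-order tie-break shows up; the second function picks uniformly among the tied actions instead.

```python
import numpy as np

def first_max_action(q_values):
    """Return the index of the first action with the maximum value."""
    # np.argmax returns the first occurrence when several entries tie.
    return int(np.argmax(q_values))

def random_max_action(q_values, rng, atol=1e-9):
    """Return a uniformly random index among the actions tied for the maximum."""
    q_values = np.asarray(q_values, dtype=float)
    # Treat values within atol of the maximum as tied (guards against float noise).
    tied = np.flatnonzero(np.isclose(q_values, q_values.max(), rtol=0.0, atol=atol))
    return int(rng.choice(tied))

# Hypothetical Q-values for the actions [left, right] in a state where both tie.
q = [6.25, 6.25]
rng = np.random.default_rng(0)

print(first_max_action(q))        # 0, i.e. "left", because it is listed first
print(random_max_action(q, rng))  # 0 or 1, depending on the random draw
```

If the lab behaves like `first_max_action`, then "left" winning the tie would just reflect the action order, not a step-count rule.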


No.

If it’s a tie, it doesn’t matter which you choose, because both actions give the same return.
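A small sketch of why that holds, using made-up numbers rather than the lab's: assume a discount factor of 0.5, a reward of 8 one step to the left, a reward of 16 two steps to the right, and 0 everywhere else. The helper `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t] over the rewards collected along a path."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

gamma = 0.5

# Hypothetical paths: the first entry is the reward in the current state.
left_path = [0, 8]       # one step to reach a reward of 8
right_path = [0, 0, 16]  # two steps to reach a reward of 16

left_return = discounted_return(left_path, gamma)
right_return = discounted_return(right_path, gamma)

print(left_return, right_return)  # 4.0 4.0
```

The paths differ in length, but the return is the same, so either action is an equally good choice under this objective.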
