The question reads:
You are controlling a robot that has 3 actions: ← (left), → (right) and STOP. From a given state s, you have computed Q(s, ← ) = -10, Q(s, → ) =-20, Q(s, STOP ) = 0.
What is the optimal action to take in state s?
I see what it’s asking and have no problem with the answer. However, I do think we can specify what “optimal” means here, more specifically, whether we are optimizing the reward of one action or the total return.