Just wondering why, if we have a one-hot encoding of all actions, we need a separate “do nothing” action rather than representing “do nothing” as zeros for all the other actions?
Hi @iokleiser, can you give more context on the question? Are you referring to a lecture?
Yes, for example in Course 3 > Module 3 > Continuous state space > Video: Algorithm refinement: Improved neural network architecture, timestamp 1:10. There is a depiction of the different possible decisions Q(s, nothing), Q(s, left), Q(s, main), and Q(s, right). Presumably the lander could choose to fire multiple thrusters at the same time (for example, fire left and fire main together). But the action of doing nothing could be captured by just having zero for all the other actions, right? Is it not redundant?
You might be able to do without “do nothing” as a specific action in this example. But a “do nothing” action is probably not going to be applicable to all systems.
Using one-hot coding is the standard method for handling multiple logical inputs, so the general solution is to have a one-hot variable for each action. A one-hot vector has exactly one entry set to 1, so all zeros would not be a valid code. In this case, one of the actions happens to be “fire no thrusters in this time step”.
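To make the one-hot point concrete, here is a minimal Python sketch. It assumes the four actions from the lecture slide are mutually exclusive (exactly one is chosen per time step). The names `ACTIONS`, `one_hot`, and `greedy_action` are hypothetical, and the Q-values are made up for illustration; this is not the course's implementation.

```python
# Hypothetical action list, in the order shown on the lecture slide.
ACTIONS = ["nothing", "left", "main", "right"]


def one_hot(action):
    """Encode an action as a vector with exactly one entry set to 1."""
    return [1 if a == action else 0 for a in ACTIONS]


def greedy_action(q_values):
    """Pick the action whose Q-value is largest (one Q-value per action)."""
    best = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best]


# "Do nothing" gets its own slot, so it is a valid one-hot vector.
print(one_hot("nothing"))  # [1, 0, 0, 0]

# Made-up Q-values for some state s, one per action.
q_values = [0.7, 0.1, 0.4, -0.2]
print(greedy_action(q_values))  # nothing
```

With a dedicated slot, “do nothing” gets its own Q-value and can be compared against the other actions in the same way. An all-zeros vector would leave it without one.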
Yes, I think in this specific context, an explicit “do nothing” action can provide more flexibility and make it clearer what’s going on.