Why Q(s,a) and not "a" directly?

Hey!
I’m trying to understand why we use a NN to estimate Q(s,a) for a given state and action, instead of giving it a state and getting the right action directly. I know this was mentioned in the course but couldn’t quite get it, so I’d like a deeper understanding.

Because if there are lots of possible states, it isn’t possible to know in advance what the best action will be in all situations.

So we create a NN to learn it for us.
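Here is a minimal NumPy sketch of the "learn it for us" part, assuming standard Q-learning targets. Nobody can label the correct action for each state, so the network is instead trained toward y = r + γ·max_a' Q(s', a'), a target built from the observed reward and the network's own estimate for the next state. The batch values, the 4 actions and γ = 0.9 below are hypothetical.

```python
import numpy as np

def bellman_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute Q-learning targets y = r + gamma * max_a' Q(s', a').

    rewards:       shape (batch,), reward received after taking action a in state s
    next_q_values: shape (batch, n_actions), estimated Q(s', a') for every action in s'
    dones:         shape (batch,), 1.0 if s' is terminal (no future reward), else 0.0
    """
    max_next_q = next_q_values.max(axis=1)  # best estimated value reachable from s'
    return rewards + gamma * max_next_q * (1.0 - dones)

# Hypothetical mini-batch of 2 transitions, 4 possible actions each.
rewards = np.array([1.0, -0.5])
next_q = np.array([[0.2, 0.8, 0.1, 0.0],
                   [0.5, 0.3, 0.9, 0.4]])
dones = np.array([0.0, 1.0])

y = bellman_targets(rewards, next_q, dones, gamma=0.9)
print(y)
```

For the first transition the target is 1.0 + 0.9 × 0.8 = 1.72; the second transition ended the episode, so its target is just the reward, -0.5. The network is then trained so that Q(s, a) for the action actually taken moves toward y, and no "correct action" label is ever needed.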


To make sure I understand your answer: do we need a NN because there are a lot of possible states, or because we don’t know the correct choice in advance and need to learn it?

And building on the question posed, do we need a NN because this is a continuous space? What if we had a discrete space that had several low-cardinality inputs? Several high-cardinality inputs?

Both.

Sorry, I don’t understand your 2nd question.


Thanks for explaining the first part.

For my second question I wrote, “And building on the question posed, do we need a NN because this is a continuous space? What if we had a discrete space that had several low-cardinality inputs? Several high-cardinality inputs?”

Let me break that down.

  1. Do we need NN because we have a continuous space?
  2. Would we choose a NN if we had few inputs with few values (let’s say 3 categorical inputs, with 4-6 values each)?

Not sure what you mean by “continuous space”.

Yes, you can use an NN with categorical features. One-hot encoding is typically used.
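A rough sketch of one-hot encoding for the scenario in the question (three categorical inputs with 4, 5 and 6 possible values; the specific values chosen below are hypothetical). Each input becomes a block of 0/1 features, and the blocks are concatenated into a single 4 + 5 + 6 = 15-element input vector for the NN.

```python
import numpy as np

# Hypothetical example: 3 categorical inputs with 4, 5 and 6 possible values.
CARDINALITIES = [4, 5, 6]

def one_hot(index, num_values):
    """Return a 0/1 vector of length num_values with a 1 at position index."""
    vec = np.zeros(num_values)
    vec[index] = 1.0
    return vec

def encode(categories):
    """One-hot encode each categorical input and concatenate the blocks."""
    return np.concatenate([one_hot(c, n) for c, n in zip(categories, CARDINALITIES)])

# 3rd value of input 1, 1st value of input 2, 6th value of input 3.
x = encode([2, 0, 5])
print(x.shape)  # (15,)
print(x)
```

Exactly one entry per block is 1, so the encoded vector always contains three 1s regardless of which values are chosen.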


Hello @atabhatti,

We don’t need an NN just because we have a continuous input space. An NN is not the only approach that can model a continuous input space. We choose a particular approach, NN or not, because it performs well.

An NN can handle a discrete input space. Here are three possibilities:

  1. like Tom said, we one-hot encode a discrete input into several continuous inputs (though each of them only takes the value 0 or 1)

  2. like words (which are discrete) in LLMs, we give each discrete value (word) an embedding

  3. leave the discrete input as is if it is also ordinal (instead of being just categorical).
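Options (2) and (3) can be sketched in a few lines of NumPy. This is only an illustration under assumed sizes: a hypothetical vocabulary of 10 discrete values, 3-dimensional embeddings with random (untrained) values, and made-up ordinal and boolean inputs. In a real model the embedding table is a trainable layer, such as `torch.nn.Embedding` or `tf.keras.layers.Embedding`.

```python
import numpy as np

rng = np.random.default_rng(0)

# (2) Embedding: hypothetical vocabulary of 10 discrete values, each mapped to
# a 3-dimensional vector. Random init here; training would update these rows.
NUM_VALUES, EMBED_DIM = 10, 3
embedding_table = rng.normal(size=(NUM_VALUES, EMBED_DIM))

def embed(index):
    """Look up the embedding row for one discrete value."""
    return embedding_table[index]

# (3) Ordinal input left as is: e.g. a hypothetical rating 1 < 2 < 3 < 4 < 5,
# or a boolean (False < True), fed to the network as a single number.
def ordinal_feature(value):
    return np.array([float(value)])

word_id = 7   # hypothetical discrete token id
rating = 4    # hypothetical ordinal input
flag = True   # hypothetical boolean input, becomes 1.0

x = np.concatenate([embed(word_id), ordinal_feature(rating), ordinal_feature(flag)])
print(x.shape)  # (5,)
```

The embedding turns one discrete value into a short dense vector whose meaning is learned, while the ordinal and boolean inputs keep their natural ordering as a single number.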

In the assignment for reinforcement learning, we DO have discrete inputs, but since they are booleans, we can treat them with (3). The four actions ARE discrete, but they are not treated as inputs; instead, they are the four outputs of the model, which is another nice way to handle them.
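Here is a rough NumPy sketch of the idea that the actions are outputs rather than inputs. The sizes (8 state features, two of them booleans fed in as 0/1, and 4 actions) and the tiny untrained two-layer network are assumptions for illustration, not the assignment's actual model. One forward pass gives Q(s,a) for all four actions, and the action to take is simply the one with the largest Q-value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8 state features (2 of them booleans as 0/1) and 4 actions.
STATE_DIM, HIDDEN, NUM_ACTIONS = 8, 16, 4

# Randomly initialised weights of a tiny 2-layer Q-network (untrained).
W1 = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, NUM_ACTIONS))
b2 = np.zeros(NUM_ACTIONS)

def q_network(state):
    """Map one state to a vector of Q(s, a), one entry per action."""
    hidden = np.maximum(0.0, state @ W1 + b1)  # ReLU hidden layer
    return hidden @ W2 + b2                    # linear output layer: 4 Q-values

# Six continuous features followed by two booleans encoded as 1.0 / 0.0.
state = np.array([0.1, -0.3, 0.05, 0.2, -0.1, 0.0, 1.0, 0.0])

q_values = q_network(state)
greedy_action = int(np.argmax(q_values))  # action with the largest Q(s, a)
print(q_values.shape, greedy_action)
```

This is also why learning Q(s,a) does not make choosing an action harder: the "right action" falls out of an argmax over the outputs, and the Q-values additionally say how much better one action looks than the others.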

Cheers,
Raymond