Why Q(s,a) and not "a" directly?

jjdniz · May 12, 2024, 1:45am

Hey!
I'm trying to understand why we use a NN to estimate Q(s,a) for a given state and action, instead of giving it a state and having it output the right action directly. I know this was mentioned in the course, but I couldn't quite follow it, so I'd like to get a deeper understanding.

TMosh · May 12, 2024, 2:43am

Because if there are lots of possible states, it isn’t possible to know in advance what the best action will be in all situations.

So we create a NN to learn it for us.

atabhatti · May 29, 2024, 4:53pm

I'm trying to understand your answer. Do we need a NN because there are a lot of possible states, or because we don't know the correct choice in advance and need to learn it?

And building on the question posed, do we need a NN because this is a continuous space? What if we had a discrete space that had several low-cardinality inputs? Several high-cardinality inputs?

TMosh · May 29, 2024, 5:15pm

Both.

Sorry, I don’t understand your 2nd question.

atabhatti · May 29, 2024, 5:34pm

Thanks for explaining the first part.

For my second question I wrote, “And building on the question posed, do we need a NN because this is a continuous space? What if we had a discrete space that had several low-cardinality inputs? Several high-cardinality inputs?”

Let me break that down.

1. Do we need a NN because we have a continuous space?
2. Would we choose a NN if we had few inputs with few values (let's say 3 categorical inputs, with 4-6 values each)?

TMosh · May 29, 2024, 5:36pm

Not sure what you mean by “continuous space”.

Yes, you can use an NN with categorical features. One-hot encoding is typically used.

rmwkwok · June 1, 2024, 12:14am

Hello @atabhatti,

A continuous input space is not the reason we need a NN. A NN is not the only approach that can model a continuous input space. We pick a certain approach, be it a NN or not, because it performs well.

A NN can also handle a discrete input space. Here are three of the possibilities:

1. Like Tom said, we one-hot encode a discrete input into several continuous inputs (though each of them only takes 0 or 1 as its value).
2. Like words (which are discrete) in an LLM, we give each discrete value (word) an embedding.
3. We leave the discrete input as is if it is also ordinal (instead of being just categorical).
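
To make the three options concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the `COLORS` and `SIZES` features, the helper names, and the embedding numbers (in a real model the embedding vectors would be learned during training rather than written by hand).

```python
# Hypothetical categorical feature with three possible values.
COLORS = ["red", "green", "blue"]


def one_hot(value, categories):
    """Option 1: one 0/1 input per category."""
    return [1.0 if value == c else 0.0 for c in categories]


# Option 2: a lookup table of embedding vectors.
# The numbers are invented; normally they are learned with the network.
EMBEDDINGS = {
    "red": [0.2, -0.7],
    "green": [0.9, 0.1],
    "blue": [-0.4, 0.5],
}


def embed(value):
    """Return the embedding vector for a discrete value."""
    return EMBEDDINGS[value]


# Option 3: an ordinal feature is passed through as a single number,
# because its values have a meaningful order.
SIZES = {"small": 0.0, "medium": 1.0, "large": 2.0}


def ordinal(value):
    """Map an ordered category to its numeric rank."""
    return SIZES[value]


# Concatenate the three encodings into one input vector for a network.
features = one_hot("green", COLORS) + embed("green") + [ordinal("large")]
print(features)
```

A boolean input is the simplest case of option 3: it is already 0 or 1, so it can be fed in unchanged.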

In the reinforcement learning assignment, we DO have discrete inputs, but since they are booleans, we can handle them with option 3. The four actions ARE discrete, but they are not treated as inputs. Instead, they are the four outputs of the model, which is another nice way to handle them.
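
A rough sketch of that "state in, one Q value per action out" design is below. It is not the assignment's code: a single random linear layer stands in for the real network, and `STATE_SIZE`, the weights, and the example state are assumptions chosen only for illustration.

```python
import random

STATE_SIZE = 8   # assumed length of the state vector (illustrative only)
NUM_ACTIONS = 4  # the four discrete actions, one model output each

rng = random.Random(0)
# One row of weights per action. A real Q-network would have hidden layers
# and trained weights; random values are enough to show the shapes.
WEIGHTS = [
    [rng.uniform(-1.0, 1.0) for _ in range(STATE_SIZE)]
    for _ in range(NUM_ACTIONS)
]
BIASES = [0.0] * NUM_ACTIONS


def q_values(state):
    """Return one Q estimate per action for the given state."""
    return [
        sum(w * s for w, s in zip(row, state)) + b
        for row, b in zip(WEIGHTS, BIASES)
    ]


def greedy_action(state):
    """Pick the index of the action with the largest Q estimate."""
    qs = q_values(state)
    return max(range(NUM_ACTIONS), key=lambda a: qs[a])


# Example state: continuous values, with the last two entries acting as
# booleans that are passed in unchanged as 0.0 or 1.0.
state = [0.1, 1.2, -0.3, 0.0, 0.05, -0.1, 1.0, 0.0]
qs = q_values(state)
action = greedy_action(state)
print(qs, action)
```

With this layout, a single forward pass gives the Q estimates for all four actions at once, and the action is chosen by taking the largest one. The network never has to output the action directly.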

Cheers,
Raymond
