Why Q(s,a) and not "a" directly?

Hey!
I'm trying to understand why we use a neural network (NN) to estimate Q(s,a) for a given state and action, instead of giving it a state and having it output the right action directly. I know this was mentioned in the course, but I couldn't quite get it, so I'd like a deeper understanding.

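Because when there are lots of possible states, we can't know in advance what the best action is in every situation, so there are no "right action" labels to train a state → action network on. What we can compute from experience is a target for Q(s,a): the reward r plus the discounted value of the best next action, r + γ · max_a' Q(s', a').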
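A minimal Python sketch of the two pieces, assuming a discount factor of 0.9 and made-up Q-values (`td_target` and `greedy_action` are hypothetical names, not from the course): the training target for Q(s,a) is built from the reward and the next state's Q-values, and the action is read off the Q-values with an argmax.

```python
GAMMA = 0.9  # discount factor (assumed value for illustration)

def td_target(reward, next_q_values, done, gamma=GAMMA):
    """Training target for Q(s, a): r + gamma * max_a' Q(s', a')."""
    if done:
        return reward  # a terminal state has no future value
    return reward + gamma * max(next_q_values)

def greedy_action(q_values):
    """The action comes from the Q-values: argmax_a Q(s, a)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Made-up Q-values for the next state s' over three actions.
q_next = [0.2, 1.5, -0.3]
print(round(td_target(reward=1.0, next_q_values=q_next, done=False), 2))  # 1.0 + 0.9 * 1.5 = 2.35
print(greedy_action([0.1, 0.7, 0.4]))  # 1
```

Note that neither function needs to know the "correct" action: the target only uses the observed reward and the network's own estimates.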

So we create a NN to learn Q(s,a) for us, and in each state we pick the action with the highest Q-value.
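To see this end to end, here is a minimal sketch on a hypothetical 5-state chain where only reaching the right end gives a reward. It is plain tabular Q-learning, so a lookup table stands in for the NN (a DQN would replace the table with a network trained toward the same target). The environment, names, and hyperparameters are all assumptions for illustration. The agent is never told which action is right; it explores randomly, and the greedy policy emerges from the learned Q-values.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the goal (terminal)
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA, ALPHA = 0.9, 0.1

def step(state, action):
    """Hypothetical chain environment: reward 1 only for reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    # A lookup table stands in for the neural network Q(s, a; w).
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 100:
            a = rng.choice(ACTIONS)  # random exploration: no "right action" label is ever given
            s2, r, done = step(s, a)
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])  # move Q(s, a) toward the Bellman target
            s, steps = s2, steps + 1
    return q

def greedy_policy(q):
    """Read the action off the learned Q-values: argmax_a Q(s, a) for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]

q = train()
print(greedy_policy(q))
```

Because Q-learning is off-policy, the purely random behaviour here still produces Q-values for the greedy policy, which should come out as "always move right" (all 1s).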
