Selective backpropagation

Hello! I am still in the early days of my neural network adventures, and I am curious whether anyone here could help me understand if my intuition about how to solve a problem is correct. And if anyone could point me at any resources or papers that have already dealt with this… all the better.

I am exploring the use of neural networks for reinforcement learning tasks and am currently building a Deep Q-Learning network to play a game (it seems like a good way to learn).

In each new state, the network is fed an observation from the game as input, and it outputs one linear node per possible (discrete) action (no activation function), each representing the expected reward (score) for taking that action. The system then decides on an action, and I compute the loss by comparing the expected reward to the actual reward… and it's the next step I am confused about: should I selectively backpropagate through only the one output node for the action that was actually taken (since I am only taking one action at a time)? Or am I missing some essential truth about generalized neural network learning here?
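To make my intuition concrete, here is a rough sketch of the kind of update I have in mind (I am assuming PyTorch here; the network sizes, observation, and reward are all placeholder values):

```python
import torch
import torch.nn as nn

# Toy Q-network: observation in, one linear output per discrete action
# (sizes are made up for illustration).
q_net = nn.Sequential(
    nn.Linear(4, 64),
    nn.ReLU(),
    nn.Linear(64, 2),  # 2 possible actions, no activation on the outputs
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

state = torch.randn(1, 4)        # placeholder observation from the game
q_values = q_net(state)          # expected reward for every action
action = q_values.argmax(dim=1)  # greedy action choice (ignoring exploration)

# Pretend we played the action and observed this reward.
actual_reward = torch.tensor([1.0])

# The "selective" part: pick out only the Q-value of the action we took...
chosen_q = q_values.gather(1, action.unsqueeze(1)).squeeze(1)

# ...and compute the loss on that single output. The other output nodes
# receive zero gradient for this transition.
loss = nn.functional.mse_loss(chosen_q, actual_reward)

optimizer.zero_grad()
loss.backward()  # autograd backpropagates through the whole network
optimizer.step()
```

So only the output for the chosen action would contribute to the loss for that step. Is something like this the right idea?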

If I understand right, first of all, you don't do the back-propagation yourself; the framework does it automatically once the model is built.

It will need to do it for all its outputs, because it has to compare all the results at the same time to determine the best action!
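For example, in something like PyTorch (a minimal sketch, assuming that framework; the model and target values are made up), you only define the forward pass and a loss over the outputs, and calling backward() computes every gradient for you:

```python
import torch
import torch.nn as nn

# Tiny model with several outputs; we never write any backprop code for it.
model = nn.Linear(4, 2)

inputs = torch.randn(1, 4)
targets = torch.tensor([[0.5, -0.5]])  # made-up targets for both outputs

# Loss over all outputs at once.
loss = nn.functional.mse_loss(model(inputs), targets)

loss.backward()  # the framework back-propagates automatically

# Every parameter now has a gradient, covering all output nodes.
print(model.weight.grad)
```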