How to understand the output values of the improved algorithm

Reference Course 3, Week 3, Algorithm refinement: Improved neural network architecture
(https://www.coursera.org/learn/unsupervised-learning-recommenders-reinforcement-learning/lecture/hpmUe/algorithm-refinement-improved-neural-network-architecture)

When the deep learning algorithm is improved to now output 4 values for each possible action, how do we know which output value corresponds with which action?

Can you give a time mark within that lecture?

@1:12

@Thomas_C1, this works the same way as any other neural network with multiple outputs: before you train, you decide which output unit corresponds to which action. The key is that when you define the target values, y, for each output node, you use the y value that corresponds to the action you assigned to that node.
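For example, here is a minimal sketch of that convention (my own illustration, not the lab's code). It assumes a Keras-style `q_network.predict` and a hypothetical index ordering for the four Lunar Lander actions; the specific ordering doesn't matter as long as you use it consistently:

```python
import numpy as np

# Hypothetical convention: output unit i of the network is the Q-value of ACTIONS[i].
ACTIONS = ["do nothing", "fire left thruster", "fire main engine", "fire right thruster"]

def choose_action(q_network, state):
    """Pick the action whose output unit predicts the largest Q-value."""
    q_values = q_network.predict(state[np.newaxis])  # shape (1, 4): one Q-value per action
    return int(np.argmax(q_values[0]))               # index into ACTIONS, by our convention
```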

The part that’s unique to reinforcement learning is how you come up with the target values, y. Very roughly, you try a bunch of example actions, record the results, and then use the Bellman equation to compute the target y values (sketched below). For a much better, more detailed description, watch the previous lecture, “Learning the state-value function”.
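In case it helps to see that in code, here is a very rough sketch of the target construction (again, my own illustration rather than the lab's actual code). It assumes a Keras-style `q_network.predict`, experiences stored as NumPy arrays of (state, action, reward, next_state, done), and it leaves out the separate target network used in practice for stability:

```python
import numpy as np

GAMMA = 0.995  # discount factor; any value in (0, 1) close to 1 is typical

def bellman_targets(q_network, experiences):
    """Build (x, y) training pairs from stored (state, action, reward, next_state, done) tuples."""
    states, actions, rewards, next_states, dones = experiences
    # Start from the network's current predictions for every action in each state.
    y = q_network.predict(states)                          # shape (N, 4)
    # Best achievable Q-value from each next state, per the current network.
    max_next_q = np.max(q_network.predict(next_states), axis=1)
    # Bellman target: y = R                            if the episode ended,
    #                 y = R + gamma * max_a' Q(s', a') otherwise.
    targets = rewards + GAMMA * max_next_q * (1 - dones)
    # Only the output node for the action actually taken gets the new target;
    # the other three nodes keep their current predictions, so their error is zero.
    y[np.arange(len(actions)), actions] = targets
    return states, y
```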