# Deep Q-Learning Algorithm with Experience Replay

Hello everyone,
I'm actually a little bit confused about the implementation of the DQN algorithm. I understand all the steps of the algorithm, but there's something confusing me, which is how we calculate the targets y:

``````python
# Unpack the mini-batch of experience tuples.
states, actions, rewards, next_states, done_vals = experiences

# Compute max Q^(s,a).
max_qsa = tf.reduce_max(target_q_network(next_states), axis=-1)

# Set y = R if episode terminates, otherwise set y = R + γ max Q^(s,a).
``````

How exactly does this work, and how do we manage to compute `max_qsa`?

Hello @ako,

This line has 2 parts:

``````python
max_qsa = tf.reduce_max(
    target_q_network(next_states),
    axis=-1
)
``````

namely a call to `target_q_network` and `tf.reduce_max`.

I assume you know what `target_q_network` is, because it is implemented by you. By calling such a neural network like a function with the input `next_states`, it does a forward propagation to compute the output of the neural network, which is the Q(s,a) of all possible actions.
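As a rough illustration of the shapes involved (the `Sequential` model below is a hypothetical stand-in for your `target_q_network`, assuming 4 state features and 2 actions; the real architecture is whatever you built in your implementation):

``````python
import tensorflow as tf

# Hypothetical stand-in for target_q_network: 4 state features -> 2 actions.
target_q_network = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2),  # one Q-value per possible action
])

next_states = tf.random.uniform((32, 4))  # batch of 32 next states
qsa = target_q_network(next_states)       # forward pass through the network
print(qsa.shape)                          # (32, 2): a Q(s,a) for every action
``````

So the forward pass returns one row of Q-values per state in the batch, with one column per action.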

I suggest you run

``````python
qsa = target_q_network(next_states)

print(next_states.shape)
print(qsa.shape)
print(next_states)
print(qsa)
``````

to see for yourself what they look like.

As for `tf.reduce_max`, please check out its documentation for examples and explanations.
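To sketch what `tf.reduce_max` does here (the toy Q-value matrix below is made up for illustration):

``````python
import tensorflow as tf

# Toy Q-value matrix: 2 states (rows) x 3 actions (columns).
qsa = tf.constant([[1.0, 3.0, 2.0],
                   [0.5, 0.1, 4.0]])

# axis=-1 takes the max over the last axis (the actions),
# leaving one best Q-value per state.
max_qsa = tf.reduce_max(qsa, axis=-1)
print(max_qsa.numpy())  # [3. 4.]
``````

In other words, `max_qsa` holds, for each next state in the batch, the largest Q-value over all actions, which is exactly the max Q^(s,a) term in the target.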

If you still have questions, please share your understanding with me so that I know what's unclear ;).

Cheers,
Raymond

PS: I am removing the part for the exercise since sharing assignment code isn’t allowed.