Deep Q-Learning Algorithm with Experience Replay

Hello everyone,
I'm actually a little bit confused about the implementation of the DQN algorithm. I understand all the steps of the algorithm, but there's something that still confuses me, which is how we calculate the targets y:

# Unpack the mini-batch of experience tuples.
states, actions, rewards, next_states, done_vals = experiences

# Compute max Q^(s,a).
max_qsa = tf.reduce_max(target_q_network(next_states), axis=-1)

# Set y = R if episode terminates, otherwise set y = R + γ max Q^(s,a).

How exactly does this work, and how do we manage to compute max_qsa?

Hello @ako,

This line has 2 parts:

max_qsa = tf.reduce_max(
    target_q_network(next_states),
    axis=-1
)

namely a call to target_q_network and tf.reduce_max.

I assume you know what target_q_network is, because you implemented it yourself. When you call such a neural network like a function with the input next_states, it does a forward propagation to compute the output of the network, which is Q(s,a) for all possible actions.

I suggest you run

qsa = target_q_network(next_states)

print(next_states.shape)
print(qsa.shape)
print(next_states)
print(qsa)

to see for yourself what they look like.
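
If you would rather see it with a standalone example first, here is a minimal sketch you can run anywhere. Note that toy_q_network below is a tiny network I am making up (8 state features, 4 actions) purely for illustration; it is not the assignment's network:

import tensorflow as tf

# A toy Q-network: 8 state features in, one Q-value per action out (4 actions here).
toy_q_network = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(4),
])

# A fake mini-batch of 3 "next states".
next_states = tf.random.uniform((3, 8))

# Calling the network like a function runs a forward pass.
qsa = toy_q_network(next_states)
print(qsa.shape)  # (3, 4): one row per state in the batch, one column per action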

As for tf.reduce_max, please check out its documentation for examples and explanations.
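
To give one quick illustration of what tf.reduce_max with axis=-1 does, here is a small made-up tensor (the numbers are arbitrary):

import tensorflow as tf

# A made-up batch of Q-values: 2 states, 3 actions each.
q_values = tf.constant([[1.0, 5.0, 3.0],
                        [4.0, 2.0, 6.0]])

# axis=-1 takes the maximum over the last axis (the actions),
# leaving one value per state.
print(tf.reduce_max(q_values, axis=-1))  # tf.Tensor([5. 6.], shape=(2,), dtype=float32)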

If you still have questions, please share with me your understanding so that I know what’s unclear ;).

Cheers,
Raymond

PS: I am removing the part for the exercise since sharing assignment code isn’t allowed.