Exercise 2: y_targets

y_targets = rewards + (gamma * max_qsa * (1 - done_vals))

```
yt_targets = rewards

for i in range(done_vals.shape[0]):
    if(done_vals[i]==0):
        yt_targets[i]+=gamma*max_qsa[i]
```

```
# Get the q_values.
q_values = q_network(states)
q_values = tf.gather_nd(q_values, tf.stack([tf.range(q_values.shape[0]),
                                            tf.cast(actions, tf.int32)], axis=1))

# Calculate the loss.
loss = MSE(y_targets, q_values)
return loss
```
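To see what the `tf.gather_nd(..., tf.stack(...))` pattern is selecting, here is a small NumPy sketch of the same indexing (toy numbers, not from the lab): for each row `i` it picks `q_values[i, actions[i]]`, i.e. the Q-value of the action actually taken in that state.

```python
import numpy as np

# Toy Q-values: 3 states, 4 actions (illustrative numbers only).
q_values = np.array([[0.1, 0.5, 0.2, 0.9],
                     [0.3, 0.8, 0.1, 0.4],
                     [0.6, 0.2, 0.7, 0.5]])
actions = np.array([3, 1, 0])  # action taken in each state

# NumPy equivalent of gather_nd with stacked (row, action) index pairs:
# from row i, take column actions[i].
selected = q_values[np.arange(q_values.shape[0]), actions]
print(selected)  # [0.9 0.8 0.6]
```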

Hi, I was using the piece of code above in place of the one provided in the hints. When I print the values, both versions produce the same `y_targets`, but unfortunately they do not yield the same MSE. Can someone please point out where I'm going wrong?

Hi @Spearmint

you wouldn't do that:

```
yt_targets = rewards

for i in range(done_vals.shape[0]):
    if(done_vals[i]==0):
        yt_targets[i]+=gamma*max_qsa[i]
```

Why did you write it that way? It is not correct.

You can simply use `y_targets = rewards + (gamma * max_qsa * (1 - done_vals))` to compute `y_targets`.
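For concreteness, here is a small sketch of the one-line formula with made-up numbers (a typical `gamma` for this kind of lab is assumed). The `(1 - done_vals)` factor zeroes the bootstrap term for terminal transitions, so no explicit `if`/`else` over indices is needed:

```python
import numpy as np

gamma = 0.995  # assumed discount factor for illustration
rewards = np.array([1.0, 2.0, 3.0])
max_qsa = np.array([10.0, 20.0, 30.0])
done_vals = np.array([0.0, 1.0, 0.0])  # second transition is terminal

# For terminal transitions (done_vals[i] == 1) the gamma * max_qsa term
# is multiplied by zero, leaving y_targets[i] = rewards[i].
y_targets = rewards + gamma * max_qsa * (1 - done_vals)
print(y_targets)  # [10.95  2.   32.85]
```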

Thanks,

Abdelrahman

Hi, could you explain what was wrong here?

```
yt_targets = rewards

for i in range(done_vals.shape[0]):
    if(done_vals[i]==0):
        yt_targets[i]+=gamma*max_qsa[i]
```

Hi @Spearmint

I'm not sure why you wrote the loop version, but there are a few problems with it. First, `yt_targets = rewards` does not make a copy: both names refer to the same object, so `yt_targets[i] += gamma * max_qsa[i]` modifies `rewards` in place as well. Second, it relies on item assignment, which TensorFlow tensors do not support. The one-line version `y_targets = rewards + (gamma * max_qsa * (1 - done_vals))` computes the whole batch at once: the `(1 - done_vals)` factor zeroes out the bootstrap term when `done_vals[i] == 1`, so `y_targets[i] = rewards[i]` in that case with no explicit `if` needed.
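One pitfall in the loop version is worth spelling out with a NumPy sketch (toy numbers; in the lab these are TensorFlow tensors, where the item assignment would fail outright): `yt_targets = rewards` is an alias, not a copy, so the in-place `+=` silently mutates `rewards` too.

```python
import numpy as np

gamma = 0.995  # assumed discount factor for illustration
rewards = np.array([1.0, 2.0, 3.0])
max_qsa = np.array([10.0, 20.0, 30.0])
done_vals = np.array([0.0, 1.0, 0.0])

yt_targets = rewards  # NOT a copy: both names refer to the same array
for i in range(done_vals.shape[0]):
    if done_vals[i] == 0:
        yt_targets[i] += gamma * max_qsa[i]

# The in-place += went through the alias and changed `rewards` as well.
print(rewards is yt_targets)  # True
print(rewards)                # [10.95  2.   32.85] -- no longer the raw rewards
```

Starting from `yt_targets = rewards.copy()` would avoid the aliasing, but the vectorized one-liner sidesteps the whole issue.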

please feel free to ask any questions,

Thanks,

Abdelrahman