Exercise 2: y_targets

y_targets = rewards + (gamma * max_qsa * (1 - done_vals))

```
yt_targets = rewards

for i in range(done_vals.shape[0]):
    if(done_vals[i]==0):
        yt_targets[i]+=gamma*max_qsa[i]
```

```
# Get the q_values.
q_values = q_network(states)
q_values = tf.gather_nd(q_values, tf.stack([tf.range(q_values.shape[0]),
                                            tf.cast(actions, tf.int32)], axis=1))

# Calculate the loss.
loss = MSE(y_targets, q_values)
return loss
```
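To see what the `tf.gather_nd(..., tf.stack(...))` pattern is selecting, here is a small NumPy sketch of the same indexing (toy numbers, not from the lab): for each row `i` it picks `q_values[i, actions[i]]`, i.e. the Q-value of the action actually taken in that state.

```python
import numpy as np

# Toy Q-values: 3 states, 4 actions (illustrative numbers only).
q_values = np.array([[0.1, 0.5, 0.2, 0.9],
                     [0.3, 0.8, 0.1, 0.4],
                     [0.6, 0.2, 0.7, 0.5]])
actions = np.array([3, 1, 0])  # action taken in each state

# NumPy equivalent of gather_nd with stacked (row, action) index pairs:
# from row i, take column actions[i].
selected = q_values[np.arange(q_values.shape[0]), actions]
print(selected)  # [0.9 0.8 0.6]
```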

Hi, I was using the piece of code above in place of the one provided in the hints. When I print the values, both versions produce the same `y_targets`, but unfortunately they do not yield the same MSE. Can someone please point out where I'm going wrong?

Hi @Spearmint

you wouldn't do that:

```
yt_targets = rewards

for i in range(done_vals.shape[0]):
    if(done_vals[i]==0):
        yt_targets[i]+=gamma*max_qsa[i]
```

Why did you write it that way? It is not correct.

You can simply use `y_targets = rewards + (gamma * max_qsa * (1 - done_vals))` to compute `y_targets`.
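For concreteness, here is a small sketch of the one-line formula with made-up numbers (a typical `gamma` for this kind of lab is assumed). The `(1 - done_vals)` factor zeroes the bootstrap term for terminal transitions, so no explicit `if`/`else` over indices is needed:

```python
import numpy as np

gamma = 0.995  # assumed discount factor for illustration
rewards = np.array([1.0, 2.0, 3.0])
max_qsa = np.array([10.0, 20.0, 30.0])
done_vals = np.array([0.0, 1.0, 0.0])  # second transition is terminal

# For terminal transitions (done_vals[i] == 1) the gamma * max_qsa term
# is multiplied by zero, leaving y_targets[i] = rewards[i].
y_targets = rewards + gamma * max_qsa * (1 - done_vals)
print(y_targets)  # [10.95  2.   32.85]
```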

Thanks,

Abdelrahman

Hi, could you explain what was wrong here?

```
yt_targets = rewards

for i in range(done_vals.shape[0]):
    if(done_vals[i]==0):
        yt_targets[i]+=gamma*max_qsa[i]
```

Hi @Spearmint

I'm not sure why you wrote the loop version, but there are a few problems with it. First, `yt_targets = rewards` does not make a copy: both names refer to the same object, so `yt_targets[i] += gamma * max_qsa[i]` modifies `rewards` in place as well. Second, it relies on item assignment, which TensorFlow tensors do not support. The one-line version `y_targets = rewards + (gamma * max_qsa * (1 - done_vals))` computes the whole batch at once: the `(1 - done_vals)` factor zeroes out the bootstrap term when `done_vals[i] == 1`, so `y_targets[i] = rewards[i]` in that case with no explicit `if` needed.
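One pitfall in the loop version is worth spelling out with a NumPy sketch (toy numbers; in the lab these are TensorFlow tensors, where the item assignment would fail outright): `yt_targets = rewards` is an alias, not a copy, so the in-place `+=` silently mutates `rewards` too.

```python
import numpy as np

gamma = 0.995  # assumed discount factor for illustration
rewards = np.array([1.0, 2.0, 3.0])
max_qsa = np.array([10.0, 20.0, 30.0])
done_vals = np.array([0.0, 1.0, 0.0])

yt_targets = rewards  # NOT a copy: both names refer to the same array
for i in range(done_vals.shape[0]):
    if done_vals[i] == 0:
        yt_targets[i] += gamma * max_qsa[i]

# The in-place += went through the alias and changed `rewards` as well.
print(rewards is yt_targets)  # True
print(rewards)                # [10.95  2.   32.85] -- no longer the raw rewards
```

Starting from `yt_targets = rewards.copy()` would avoid the aliasing, but the vectorized one-liner sidesteps the whole issue.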

please feel free to ask any questions,

Thanks,

Abdelrahman