C3_W3_Assignment1

Exercise 2: y_targets

y_targets = rewards + (gamma * max_qsa * (1 - done_vals))
yt_targets = rewards
for i in range(done_vals.shape[0]):
    if done_vals[i] == 0:
        yt_targets[i] += gamma * max_qsa[i]

# Get the q_values.
q_values = q_network(states)
q_values = tf.gather_nd(q_values, tf.stack([tf.range(q_values.shape[0]),
                                            tf.cast(actions, tf.int32)], axis=1))

# Calculate the loss.
loss = MSE(y_targets, q_values)

return loss

Hi, I was using the piece of code above in place of the one provided in the hints. When I print the values, both versions give the same result, but unfortunately the MSE is not the same. Can someone please point out where I'm going wrong?

Hi @Spearmint

You shouldn't compute the targets like this:
yt_targets = rewards
for i in range(done_vals.shape[0]):
    if done_vals[i] == 0:
        yt_targets[i] += gamma * max_qsa[i]
Why do it that way? It is not correct.
Just use y_targets = rewards + (gamma * max_qsa * (1 - done_vals)) to compute y_targets.
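As a small illustration of that one-line formula, here is a sketch that uses NumPy arrays as stand-ins for the assignment's tensors. The numbers and gamma = 0.5 are made up for the example and are not taken from the assignment.

```python
import numpy as np

# Made-up example values (assumptions, not from the assignment).
gamma = 0.5
rewards = np.array([1.0, 2.0, 3.0])
max_qsa = np.array([10.0, 20.0, 30.0])
done_vals = np.array([0.0, 1.0, 0.0])  # 1.0 marks a terminal transition

# (1 - done_vals) is 0 for terminal transitions, so the bootstrap term
# gamma * max_qsa is dropped there and the target is just the reward.
y_targets = rewards + (gamma * max_qsa * (1 - done_vals))

print(y_targets)
```

The expression builds a new array, so rewards itself is left unchanged.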

Thanks,
Abdelrahman

Hi, could you explain what was wrong here?
yt_targets = rewards
for i in range(done_vals.shape[0]):
    if done_vals[i] == 0:
        yt_targets[i] += gamma * max_qsa[i]

Hi @Spearmint

I don't know why you wrote this code, but if you want to compute y_targets, note a few things. The line yt_targets[i] += gamma * max_qsa[i] does not add rewards explicitly the way y_targets = rewards + (gamma * max_qsa * (1 - done_vals)) does. There is also no branch for done_vals == 1, where yt_targets[i] should simply equal rewards[i]. And I think you did not want the in-place += in yt_targets[i] += gamma * max_qsa[i].
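One possible cause, offered as an assumption rather than something confirmed in this thread: if rewards is a NumPy array, yt_targets = rewards does not make a copy, so the in-place += also changes rewards. The first call then matches the one-line formula, but anything that uses rewards afterwards sees modified values. (If rewards is a TensorFlow tensor instead, item assignment is not supported at all.) The sketch below uses made-up numbers, and the helper names loop_targets_aliased and loop_targets_copied are hypothetical.

```python
import numpy as np

def loop_targets_aliased(rewards, max_qsa, done_vals, gamma):
    yt_targets = rewards  # same array object as rewards, not a copy
    for i in range(done_vals.shape[0]):
        if done_vals[i] == 0:
            yt_targets[i] += gamma * max_qsa[i]  # also modifies rewards[i]
    return yt_targets

def loop_targets_copied(rewards, max_qsa, done_vals, gamma):
    yt_targets = rewards.copy()  # independent array, rewards stays intact
    for i in range(done_vals.shape[0]):
        if done_vals[i] == 0:
            yt_targets[i] += gamma * max_qsa[i]
    return yt_targets

# Made-up example values (assumptions, not from the assignment).
gamma = 0.5
rewards = np.array([1.0, 2.0, 3.0])
max_qsa = np.array([10.0, 20.0, 30.0])
done_vals = np.array([0.0, 1.0, 0.0])

# Vectorized reference, computed before anything mutates rewards.
expected = rewards + gamma * max_qsa * (1 - done_vals)

# Snapshot each result, because the returned array is rewards itself.
first = loop_targets_aliased(rewards, max_qsa, done_vals, gamma).copy()
second = loop_targets_aliased(rewards, max_qsa, done_vals, gamma).copy()

print(np.allclose(first, expected))   # first call agrees with the formula
print(np.allclose(second, expected))  # second call does not: rewards changed

# With a copy, the input is untouched and repeated calls agree.
fresh = np.array([1.0, 2.0, 3.0])
safe = loop_targets_copied(fresh, max_qsa, done_vals, gamma)
print(np.allclose(safe, expected), fresh)
```

If this is what happened, it would explain why the printed values look the same on a first check while a later loss computation differs.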

Please feel free to ask any questions.
Thanks,
Abdelrahman