Exercise 2: y_targets

y_targets = rewards + (gamma * max_qsa * (1 - done_vals))

```
yt_targets = rewards

for i in range(done_vals.shape[0]):
    if(done_vals[i]==0):
        yt_targets[i]+=gamma*max_qsa[i]
```

```
# Get the q_values.
q_values = q_network(states)
q_values = tf.gather_nd(q_values, tf.stack([tf.range(q_values.shape[0]),
                                            tf.cast(actions, tf.int32)], axis=1))

# Calculate the loss.
loss = MSE(y_targets, q_values)
return loss
```
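To see what the `tf.gather_nd(..., tf.stack(...))` pattern is selecting, here is a small NumPy sketch of the same indexing (toy numbers, not from the lab): for each row `i` it picks `q_values[i, actions[i]]`, i.e. the Q-value of the action actually taken in that state.

```python
import numpy as np

# Toy Q-values: 3 states, 4 actions (illustrative numbers only).
q_values = np.array([[0.1, 0.5, 0.2, 0.9],
                     [0.3, 0.8, 0.1, 0.4],
                     [0.6, 0.2, 0.7, 0.5]])
actions = np.array([3, 1, 0])  # action taken in each state

# NumPy equivalent of gather_nd with stacked (row, action) index pairs:
# from row i, take column actions[i].
selected = q_values[np.arange(q_values.shape[0]), actions]
print(selected)  # [0.9 0.8 0.6]
```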

Hi, I was using the piece of code above in place of the one provided in the hints. When I print the values, both versions produce the same `y_targets`, but unfortunately they do not yield the same MSE. Can someone please point out where I'm going wrong?

Hi @Spearmint

you wouldn't do that:

```
yt_targets = rewards

for i in range(done_vals.shape[0]):
    if(done_vals[i]==0):
        yt_targets[i]+=gamma*max_qsa[i]
```

Why did you write it that way? It is not correct.

You can simply use `y_targets = rewards + (gamma * max_qsa * (1 - done_vals))` to compute `y_targets`.
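For concreteness, here is a small sketch of the one-line formula with made-up numbers (a typical `gamma` for this kind of lab is assumed). The `(1 - done_vals)` factor zeroes the bootstrap term for terminal transitions, so no explicit `if`/`else` over indices is needed:

```python
import numpy as np

gamma = 0.995  # assumed discount factor for illustration
rewards = np.array([1.0, 2.0, 3.0])
max_qsa = np.array([10.0, 20.0, 30.0])
done_vals = np.array([0.0, 1.0, 0.0])  # second transition is terminal

# For terminal transitions (done_vals[i] == 1) the gamma * max_qsa term
# is multiplied by zero, leaving y_targets[i] = rewards[i].
y_targets = rewards + gamma * max_qsa * (1 - done_vals)
print(y_targets)  # [10.95  2.   32.85]
```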

Thanks,

Abdelrahman

Hi, could you explain what was wrong here?

```
yt_targets = rewards

for i in range(done_vals.shape[0]):
    if(done_vals[i]==0):
        yt_targets[i]+=gamma*max_qsa[i]
```

Hi @Spearmint

I'm not sure why you wrote the loop version, but there are a few problems with it. First, `yt_targets = rewards` does not make a copy: both names refer to the same object, so `yt_targets[i] += gamma * max_qsa[i]` modifies `rewards` in place as well. Second, it relies on item assignment, which TensorFlow tensors do not support. The one-line version `y_targets = rewards + (gamma * max_qsa * (1 - done_vals))` computes the whole batch at once: the `(1 - done_vals)` factor zeroes out the bootstrap term when `done_vals[i] == 1`, so `y_targets[i] = rewards[i]` in that case with no explicit `if` needed.
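One pitfall in the loop version is worth spelling out with a NumPy sketch (toy numbers; in the lab these are TensorFlow tensors, where the item assignment would fail outright): `yt_targets = rewards` is an alias, not a copy, so the in-place `+=` silently mutates `rewards` too.

```python
import numpy as np

gamma = 0.995  # assumed discount factor for illustration
rewards = np.array([1.0, 2.0, 3.0])
max_qsa = np.array([10.0, 20.0, 30.0])
done_vals = np.array([0.0, 1.0, 0.0])

yt_targets = rewards  # NOT a copy: both names refer to the same array
for i in range(done_vals.shape[0]):
    if done_vals[i] == 0:
        yt_targets[i] += gamma * max_qsa[i]

# The in-place += went through the alias and changed `rewards` as well.
print(rewards is yt_targets)  # True
print(rewards)                # [10.95  2.   32.85] -- no longer the raw rewards
```

Starting from `yt_targets = rewards.copy()` would avoid the aliasing, but the vectorized one-liner sidesteps the whole issue.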

please feel free to ask any questions,

Thanks,

Abdelrahman