I didn’t understand this slide in the presentation. He says to randomly initialize the network and obtain a guess of the value of Q(s,a). On the slide it says set Q=Q(new). Is Q the guess? So what is Q(new)? How do you get the Q(new) value? What is the ground-truth value when training?
Hello @Y_L1,
We randomly initialize a Q, and consider it as the latest version of Q.
Then, in your screenshot there is an equation for y, right? Use that equation to compute y. That equation uses our latest Q (which can be a randomly initialized one).
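For reference, and assuming the slide shows the standard Q-learning target (the screenshot is not reproduced in this thread, so the exact symbols are my assumption), the equation would look like this, where R(s) is the reward, γ is the discount factor, and s' is the next state:

$$
y = R(s) + \gamma \max_{a'} Q(s', a')
$$

The Q on the right-hand side is the latest Q, so early on y is itself built from a guess.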
We use the constructed x and the computed y (which plays the role of the target value during training) to train our latest Q, and the result of that training is Q_new.
We then use Q_new as the new latest Q.
The idea is that we don’t worry that Q was wrong initially; we rely on this whole repeated training process to bring Q closer and closer to a good Q. You can see for yourself that this process works in the week’s assignment.
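Here is a minimal sketch of that loop, just to make the steps concrete. It is not the assignment's code: the five-state toy environment, the `step` function, and all variable names are hypothetical, and a plain table stands in for the neural network, so "training" here simply means replacing each visited entry with the average of its computed y values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9

def step(s, a):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.
    Reaching the last state gives reward 1 and ends the episode."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    reward = 1.0 if done else 0.0
    return reward, s_next, done

# Randomly initialized guess of Q(s, a): this is the "latest Q".
Q = rng.normal(size=(n_states, n_actions))

for iteration in range(100):
    # Construct x = (s, a) pairs by sampling non-terminal states and random actions.
    s_batch = rng.integers(0, n_states - 1, size=200)
    a_batch = rng.integers(0, n_actions, size=200)

    sums = np.zeros_like(Q)
    counts = np.zeros_like(Q)
    for s, a in zip(s_batch, a_batch):
        r, s_next, done = step(s, a)
        # Compute y from the latest Q (assumed standard target; no bootstrap at the end).
        y = r if done else r + gamma * Q[s_next].max()
        sums[s, a] += y
        counts[s, a] += 1

    # "Train" on (x, y): here, fit each visited table entry to its average target.
    Q_new = Q.copy()
    visited = counts > 0
    Q_new[visited] = sums[visited] / counts[visited]

    # Set Q = Q_new, so the next round of targets uses the improved guess.
    Q = Q_new

print(np.round(Q[:-1], 3))  # learned values for the non-terminal states
```

Even though the first targets come from a random Q, the reward term is always correct, and that correct information spreads through the table a little further on every pass.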
Cheers,
Raymond