When Mr. Ng says there are 10,000 training examples of (x, y) pairs, where does the value of y(i) come from? During training, what is the ground truth y(i) and what is the predicted y(i)?
y(i) is the second value of the (x, y) pair. y(i) is the ground truth, and whatever the model outputs is the prediction! Keep going through the videos, and maybe watch them again!
I meant: how do you obtain the ground truth value y, given that Q(s’,a’) is not known? I didn’t understand this slide in the presentation. He says to randomly initialize the network and use its output as a guess of the value of Q(s,a). Would this guess be the ground truth value used during training? If the network itself produces the ground truth value and you then train on that same value, you would have zero loss, and no learning (updating the parameters) takes place when the loss is zero. The slide also says to set Q = Q(new). Is Q the guess? If so, what is Q(new), and how do you get its value?
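
For anyone reading along, here is a minimal sketch of how this is usually set up in Q-learning. It is not taken from the slide. It assumes the target is built from the Bellman equation, y = R(s) + gamma * max over a’ of Q(s’,a’), using the current guess of Q only for the next state s’. The prediction is Q(s,a) for the current state, so the two generally differ and the loss is not zero. The linear `q_value`, the three toy transitions, `GAMMA = 0.9`, and all helper names are made up for illustration.

```python
import random

GAMMA = 0.9      # discount factor (assumed value, for illustration only)
N_ACTIONS = 2
STATE_DIM = 3


def init_q(seed=0):
    """Randomly initialise a tiny linear stand-in for the network: Q(s, a) = w[a] . s"""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(STATE_DIM)] for _ in range(N_ACTIONS)]


def q_value(w, s, a):
    return sum(wi * si for wi, si in zip(w[a], s))


def make_target(w, r, s_next, done):
    """y = r at a terminal step, otherwise r + GAMMA * max_a' Q(s', a')."""
    if done:
        return r
    return r + GAMMA * max(q_value(w, s_next, a2) for a2 in range(N_ACTIONS))


def build_training_set(w, transitions):
    """Turn (s, a, r, s', done) tuples into (x, y) pairs with x = (s, a)."""
    return [((s, a), make_target(w, r, s2, done)) for s, a, r, s2, done in transitions]


def mse(w, pairs):
    return sum((q_value(w, s, a) - y) ** 2 for (s, a), y in pairs) / len(pairs)


def fit_q_new(w, pairs, lr=0.05, steps=200):
    """Train a copy of w against the fixed targets y and return it as Q_new."""
    w_new = [row[:] for row in w]
    for _ in range(steps):
        for (s, a), y in pairs:
            err = q_value(w_new, s, a) - y
            for i in range(STATE_DIM):
                w_new[a][i] -= lr * err * s[i]   # gradient step on squared error
    return w_new


# Hypothetical experience tuples: (state, action, reward, next_state, done)
transitions = [
    ([1.0, 0.0, 0.0], 0, 0.0, [0.0, 1.0, 0.0], False),
    ([0.0, 1.0, 0.0], 1, 0.0, [0.0, 0.0, 1.0], False),
    ([0.0, 0.0, 1.0], 0, 1.0, [0.0, 0.0, 0.0], True),
]

w = init_q()                              # Q: the random initial guess
pairs = build_training_set(w, transitions)
loss_before = mse(w, pairs)               # prediction Q(s,a) vs. target y
w_new = fit_q_new(w, pairs)               # Q_new: trained to match the targets
loss_after = mse(w_new, pairs)
w = w_new                                 # "set Q = Q_new", then repeat with fresh targets

print("loss before:", loss_before, "loss after:", loss_after)
```

In this sketch the reward term is what pulls the targets away from the initial guess. Repeating the build-targets, fit, and replace cycle is one way to read the "set Q = Q(new)" step, though the slide may present the details differently.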