Learning the Q function questions

Hello Sharon @sdabach ,

Since w and b are randomly initialized, Q-max is a pure guess at the beginning. R, the reward, however, is true information, so I would not say y is a pure guess. As for your question, please check out this response of mine, which was essentially an answer to the same question.
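To make the point about y concrete, here is a minimal sketch, assuming the usual target y = R + gamma * max Q(s', a'). The tiny linear "network", the state values, and gamma = 0.99 are all made up for illustration and are not taken from the course code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny linear "Q-network": 4 state features -> 2 actions.
# w and b are randomly initialized, so its outputs start out as guesses.
w = rng.normal(size=(4, 2))
b = rng.normal(size=2)

def q_values(state):
    """Return one Q estimate per action for the given state."""
    return state @ w + b

def td_target(reward, next_state, gamma=0.99, done=False):
    """y = R + gamma * max_a' Q(s', a'); just R if the episode has ended."""
    if done:
        return reward
    return reward + gamma * np.max(q_values(next_state))

next_state = np.array([0.1, -0.2, 0.3, 0.0])  # made-up next state
reward = 1.0                                   # observed from the environment

q_max = np.max(q_values(next_state))  # the guessed part
y = td_target(reward, next_state)     # guess plus the real reward

# Subtracting the guessed part leaves the reward, which is real information.
print(y - 0.99 * q_max)
```

However bad the initial Q-max is, y always carries the observed reward, and at a terminal state y is exactly R with no guess in it at all.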

Please see the footnote in the first screenshot of my last reply to this topic. The screenshot came from the slide PDF file, which actually contains the name of the paper; I guess it was somehow edited out of the video.

Cheers,
Raymond