Confused about reinforcement learning

Y_L1 · March 26, 2024, 2:54am

I didn’t understand this slide in the presentation. He says to randomly initialize the network and obtain a guess of the value of Q(s,a). On the slide it says to set Q = Q(new). Is Q the guess? If so, what is Q(new), and how do you get its value? What is the ground-truth value during training?

rmwkwok · March 26, 2024, 3:57am

Hello @Y_L1,

We randomly initialize Q and treat it as the latest version of Q.

Then, in your screenshot there is an equation for y, right? Use that equation to compute y. That equation uses our latest Q (which can be a randomly initialized one).
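For reference, here is the form that equation usually takes. This is an assumption on my part, since the screenshot is not reproduced in this thread: I am taking it to be the standard Bellman-style target, with s' the next state, a' a candidate next action, and γ the discount factor.

```latex
x = (s, a), \qquad y = R(s) + \gamma \max_{a'} Q(s', a')
```

The Q on the right-hand side is the latest Q, so y serves as the "ground truth" for training even though it is itself built from a guess.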

We then train the latest Q on the constructed x and the computed y, and the trained result is Q_new.

Q_new then becomes the latest Q, and we repeat.

The idea is that we don’t worry about Q being wrong initially; we rely on repeating this whole training process to bring Q closer and closer to a good Q. You can see for yourself that this process works in the week’s assignment.
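If it helps, the loop above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions of my own, not the assignment's code: the six-state environment, the rewards, `GAMMA = 0.5`, and the names `step`, `compute_target`, and `train_q` are all hypothetical, and a plain lookup table stands in for the neural network, so "training on (x, y)" reduces to storing y.

```python
import random

# Hypothetical toy environment (an assumption, not from the lecture slide):
# six states in a row; states 0 and 5 are terminal.
REWARDS = [100.0, 0.0, 0.0, 0.0, 0.0, 40.0]
TERMINAL = [True, False, False, False, False, True]
N_STATES, N_ACTIONS = 6, 2  # action 0 = left, action 1 = right
GAMMA = 0.5                 # discount factor (assumed value)


def step(s, a):
    """Deterministic next state after taking action a in state s."""
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)


def compute_target(q, s, a):
    """y = R(s) + gamma * max_a' Q(s', a'), computed with the latest Q."""
    if TERMINAL[s]:
        return REWARDS[s]
    return REWARDS[s] + GAMMA * max(q[step(s, a)])


def train_q(n_rounds=50, seed=0):
    rng = random.Random(seed)
    # Randomly initialized Q: our first "latest Q", almost certainly wrong.
    q = [[rng.gauss(0.0, 1.0) for _ in range(N_ACTIONS)]
         for _ in range(N_STATES)]
    for _ in range(n_rounds):
        # Build y for every x = (s, a) using the latest Q, then "train".
        # With a table, fitting (x, y) exactly just means storing y.
        q_new = [[compute_target(q, s, a) for a in range(N_ACTIONS)]
                 for s in range(N_STATES)]
        q = q_new  # Q_new becomes the latest Q
    return q


if __name__ == "__main__":
    for s, row in enumerate(train_q()):
        print(s, [round(v, 2) for v in row])
```

In this toy setup the random starting values stop mattering after enough rounds: whatever seed you pick, Q settles toward the same table, which is the "closer and closer to a good Q" behaviour described above. A real network would fit y only approximately at each round, so it would typically need many more rounds.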

Cheers,
Raymond
