Hi, in DQN we first initialize the parameters randomly to guess Q(s, a).
Then we use that guess as the label Y to train Q_new on.
Since we initialize Q randomly at first, it is quite possible that the Y we get is completely wrong, so
aren't we training on a wrong dataset?
Hello @Radouane_BEY_OMAR,
In short, besides the output of the DQN, which is a random guess initially, we also have true input from the environment: the Reward. Note that in this week's assignment we have the following equation for the training target:

$$y = R + \gamma \max_{a'} \hat{Q}(s', a')$$

(on a terminal step, $y$ is just $R$, since there is no next state to bootstrap from).
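To make that concrete, here is a minimal NumPy sketch of how such targets could be computed for a mini-batch of transitions. The names (`compute_targets`, `next_q_values`, `done_flags`) and the toy numbers are my own for illustration, not taken from the assignment:

```python
import numpy as np

def compute_targets(rewards, next_q_values, done_flags, gamma=0.995):
    """Bellman targets: y = R + gamma * max_a' Q_hat(s', a'),
    with the bootstrap term dropped on terminal steps."""
    max_next_q = np.max(next_q_values, axis=1)        # max over actions a'
    return rewards + gamma * (1.0 - done_flags) * max_next_q

# Toy mini-batch: 3 transitions, 4 possible actions
rewards = np.array([1.0, 0.0, -1.0])
next_q_values = np.random.randn(3, 4)                 # Q_hat(s', a') from the target network
done_flags = np.array([0.0, 0.0, 1.0])                # last transition ends the episode

y = compute_targets(rewards, next_q_values, done_flags)
print(y)
```

The key point is visible in the formula: even if `next_q_values` comes from a randomly initialized network, `rewards` is ground truth from the environment, so every target contains some correct information.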
The overall idea is that we start with some random Q Network (QN) and random Target Q Network (TQN), both incorrect at first, but through the continuous injection of correct Reward information from the environment, the hope is that both the QN and the TQN become useful, if not exactly the truth.
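One way this "continuous injection" is kept stable is by moving the TQN toward the QN only gradually, so the targets $y$ drift slowly rather than jumping every step. Here is a minimal sketch of such a soft-update step; the helper name `soft_update`, the blending factor `tau`, and the toy weights are my own assumptions for illustration:

```python
import numpy as np

def soft_update(target_weights, q_weights, tau=1e-3):
    """Blend a small fraction of the Q Network's weights into the
    Target Q Network (Polyak averaging)."""
    return [tau * qw + (1.0 - tau) * tw
            for tw, qw in zip(target_weights, q_weights)]

# Toy example with one weight matrix per "network"
q_net = [np.ones((2, 2))]
target_net = [np.zeros((2, 2))]

for _ in range(5):
    target_net = soft_update(target_net, q_net)
print(target_net[0])  # creeps slowly toward the Q Network's weights
```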
It might be difficult to accept this because we have always been told that supervised learning relies on correct labels, but this is reinforcement learning, and it is different: we do not have any labels in advance, only signals (Rewards) from the environment that we learn from as we go.
This week's assignment is a very good starting point for building up some confidence in this approach. You will see for yourself how a lunar lander that starts from random networks eventually learns to land properly.
Happy New Year, and cheers,
Raymond