DQNetwork Q function

flyunicorn · July 19, 2025, 7:25am

In supervised learning, we initialize the parameters and then improve them through training, which minimizes the gap between the predicted y and the true y. But in this DQNetwork, part of y itself, namely max Q(s', a'), comes from the freshly initialized network. How do we improve y when part of it is a random guess?

Alireza_Saei · July 19, 2025, 8:39pm

Hi @flyunicorn

In DQNs, the target value y = r + \gamma \max_{a'} Q(s', a') includes the term \max_{a'} Q(s', a'), which is estimated using the current network or a target network. Initially this estimate may be inaccurate, but it improves over time as the Q-network is trained. The reward r in the target is an actual observation rather than a guess, so every update pulls the Q-values toward real information. The key is that although part of y is based on a current approximation, training iteratively refines the Q-values through bootstrapping (using the network's own improving predictions to update itself), so the targets become more accurate as learning progresses.
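To make the bootstrapping idea concrete, here is a minimal sketch that uses a lookup table instead of a neural network. The toy environment (a 4-state chain with a reward of 1 at the right end), the names `step` and `train`, and the value of `GAMMA` are all assumptions made up for illustration; they are not from the course. The point is only that two different random initial guesses end up at the same Q-values, because the real rewards keep correcting the guessed part of the target.

```python
import random

GAMMA = 0.5           # discount factor (illustrative value)
N_STATES = 4          # states 0..3; state 3 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(s, a):
    """Deterministic toy dynamics: returns (next_state, reward, done)."""
    s_next = max(s - 1, 0) if a == 0 else s + 1
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

def train(seed, sweeps=50):
    rng = random.Random(seed)
    # Random initial guess, playing the role of untrained network weights.
    q = [[rng.uniform(-1, 1) for _ in ACTIONS] for _ in range(N_STATES - 1)]
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            for a in ACTIONS:
                s_next, r, done = step(s, a)
                # The reward r is observed; the max-Q term is the current guess.
                y = r if done else r + GAMMA * max(q[s_next])
                q[s][a] = y  # tabular stand-in for one gradient step toward y
    return q

q_a = train(seed=0)
q_b = train(seed=1)
print(q_a[0][1], q_a[1][1], q_a[2][1])
```

Moving right from state 2 reaches the goal, so its value settles at 1; the states further left settle at that value discounted once and twice. A real DQN replaces the table assignment with a gradient step on (y - Q(s, a))^2, so convergence there is less tidy, but the mechanism is the same.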

Hope it helps! Feel free to ask if you need further assistance.

flyunicorn · July 20, 2025, 8:01am

Hi Alireza, you said “In DQNs, the target value y = r + \gamma \max_{a'} Q(s', a') includes \max Q(s', a') that is estimated using the current network or a target network”. What is this current network? Is it the neural network below? It sounds like there are 2 neural networks involved, and I’m confused about the setup.

Alireza_Saei · July 20, 2025, 8:29am

Yes! The “current network” is indeed the neural network shown in the Deep Reinforcement Learning slide; it’s the one used to approximate Q(s, a) at each step. In many DQN setups (at least what I did in my recent project), there’s also a separate target network, which is a copy of the current network that is updated less frequently. This target network is used to compute the target value y = r + \gamma \max_{a'} Q_{\text{target}}(s', a'), making training more stable.

The target network helps us avoid chasing a moving target: because it changes less often than the current network, the targets y stay consistent between its updates, which makes learning more stable.
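Here is a rough sketch of the two-network setup, again with small tables standing in for the networks. Everything in it is hypothetical and chosen only for illustration: the state names, the two transitions, `ALPHA`, and `SYNC_EVERY`. It uses a periodic hard copy from the current (online) table to the target table; that is one common choice, and a gradual "soft" update is another.

```python
import copy

GAMMA = 0.9       # discount factor (illustrative)
ALPHA = 0.5       # step size for the online update (illustrative)
SYNC_EVERY = 3    # copy online -> target every 3 iterations (illustrative)

# Two tables standing in for the current (online) network and the target network.
q_online = {"s0": [0.0, 0.0], "s1": [0.0, 0.0]}
q_target = copy.deepcopy(q_online)

def td_target(r, s_next, done, q):
    """y = r + gamma * max_a' Q(s', a'), with Q taken from the table passed in."""
    return r if done else r + GAMMA * max(q[s_next])

# Fixed made-up experience: s0 --a1--> s1 (reward 0), s1 --a0--> terminal (reward 1).
transitions = [("s0", 1, 0.0, "s1", False), ("s1", 0, 1.0, "s1", True)]

targets_seen = []  # targets computed for the s0 transition, one per iteration
for it in range(1, 7):
    for s, a, r, s_next, done in transitions:
        y = td_target(r, s_next, done, q_target)      # target uses the frozen copy
        q_online[s][a] += ALPHA * (y - q_online[s][a])  # only the online table moves
        if s == "s0":
            targets_seen.append(y)
    if it % SYNC_EVERY == 0:
        q_target = copy.deepcopy(q_online)            # periodic hard sync

print(targets_seen)
```

The target for the s0 transition stays the same for three iterations, then jumps once at the sync and stays fixed again, even though the online values change on every iteration. If `q_online` were passed to `td_target` instead, the target would shift after every single update.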
