A doubt in Deep Q Learning

Dear Administrator,

Could you please guide me on this issue?

“we can estimate the action-value function iteratively by using the Bellman equation”, quoted from C3_W3_A1_Assignment, Section 6 - Deep Q-Learning

image

May I know the reason for labelling i and i+1 differently?

Thank you

The distinction between Q_i and Q_{i+1} reflects the iterative nature of the algorithm, where each update builds on the previous estimate.
Here Q_i is the estimate of the action-value function after the i-th iteration, and Q_{i+1} is the updated estimate obtained by applying the Bellman update rule to Q_i. The two subscripts are needed because the right-hand side of the update uses the old estimate Q_i, while the left-hand side defines the new one. As i grows, Q_i converges to the optimal action-value function Q^*. You can find a more detailed explanation here.
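
To make the role of the two indices concrete, here is a minimal sketch of the iteration on a tiny tabular problem. It assumes the usual form of the update, Q_{i+1}(s,a) = R(s,a) + γ max_{a'} Q_i(s',a'). The 3-state MDP, the arrays `next_state` and `reward`, and the helper `bellman_update` are all made up for illustration and are not taken from the assignment, which uses a neural network rather than a table.

```python
import numpy as np

# Hypothetical toy MDP (illustration only): 3 states, 2 actions (0 = left, 1 = right).
# next_state[s, a] is the state reached from s by action a; state 2 is absorbing.
next_state = np.array([[0, 1],
                       [0, 2],
                       [2, 2]])
# reward[s, a] is the reward for taking action a in state s.
reward = np.array([[0.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])
gamma = 0.9  # discount factor (assumed value)


def bellman_update(q_i):
    """Compute Q_{i+1} from Q_i using the Bellman update."""
    # q_i[next_state] has shape (states, actions, actions): the Q-values
    # available in the successor state s'. Taking the max over the last
    # axis gives max_{a'} Q_i(s', a') for every (s, a) pair.
    return reward + gamma * q_i[next_state].max(axis=-1)


q = np.zeros((3, 2))  # Q_0: the initial guess
for i in range(1000):
    q_next = bellman_update(q)          # Q_{i+1} is built only from Q_i
    change = np.max(np.abs(q_next - q))
    q = q_next                          # Q_{i+1} becomes the "current" estimate
    if change < 1e-8:                   # stop once the estimate no longer moves
        break

print(f"stopped after {i + 1} updates")
print(q)
```

Inside the loop, `q` plays the role of Q_i and `q_next` plays the role of Q_{i+1}. Once the two agree, applying the update again changes nothing, which is what it means for the estimate to have reached a fixed point of the Bellman equation.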
