I have read various blog posts and explanations of DQN convergence, and I still don't understand how it converges when it starts from random examples and uses only the rewards. Could someone please give a detailed explanation of this?