Having trouble understanding how DQN converges

I have read various blog posts and explanations of DQN convergence, and I still don’t get how it converges when it starts from random examples and uses just the rewards. Could someone please give a detailed explanation of this?