Help needed: Breakout using DeepQ learning fails to converge in a reasonable timeframe

Restarted yesterday night, and the improvement of running average Reward looks accelerating now:

My CPU is giving me 100,000 timesteps per hour, so it’s going to take 4 days to run 10M steps as stated in the paper, but I will see how it goes and decide…

Cheers,
Raymond