I have just finished the Machine Learning specialization and was pretty interested in Reinforcement Learning. To get a better grasp of the concepts and try to apply them in practice, I wrote my own version of DQN, but I’m not so sure I got it right, even though it’s “working”.
Your code seems solid, and if it’s working, that’s great! However, you might reconsider some parts of it for better performance (e.g. the epsilon decay rate, better data structures, hyperparameter tuning, etc.).
About working, well… not so well. I’ve made a lot of enhancements, especially around performance. I’m now approaching the training in a vectorized way and handling the Experience Replay without list operations like “pop()”, which gave me A LOT more performance, but I still can’t make it converge to a satisfactory result.
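For reference, by a “pop()-free” replay buffer I mean something roughly along these lines (a simplified sketch of the idea, not my exact code; the capacity, state dimension and field names are placeholders):

```python
import numpy as np

class ReplayBuffer:
    """Preallocated circular buffer; avoids pop()/append on Python lists."""

    def __init__(self, capacity, state_dim):
        self.capacity = capacity
        self.states = np.zeros((capacity, state_dim), dtype=np.float32)
        self.actions = np.zeros(capacity, dtype=np.int64)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.next_states = np.zeros((capacity, state_dim), dtype=np.float32)
        self.dones = np.zeros(capacity, dtype=np.float32)
        self.index = 0
        self.size = 0

    def store(self, state, action, reward, next_state, done):
        i = self.index
        self.states[i] = state
        self.actions[i] = action
        self.rewards[i] = reward
        self.next_states[i] = next_state
        self.dones[i] = float(done)
        self.index = (self.index + 1) % self.capacity  # overwrite the oldest entry when full
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        # Vectorized sampling: one indexing operation instead of a Python loop.
        idx = np.random.randint(0, self.size, size=batch_size)
        return (self.states[idx], self.actions[idx], self.rewards[idx],
                self.next_states[idx], self.dones[idx])
```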
Great progress so far! First of all, consider adding current_state = current_state.reshape(1, -1) at the beginning of your compute_action function so a single state gets the batch dimension the network expects.
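Something along these lines (a sketch only; I’m assuming a Keras-style model with predict() and an epsilon-greedy policy, so the model, epsilon and num_actions arguments are placeholders you’d adapt to your setup):

```python
import numpy as np

def compute_action(model, current_state, epsilon, num_actions):
    # A single state of shape (state_dim,) becomes (1, state_dim),
    # i.e. a batch of one, before being fed to the network.
    current_state = np.asarray(current_state).reshape(1, -1)
    if np.random.rand() < epsilon:
        return np.random.randint(num_actions)            # explore: random action
    q_values = model.predict(current_state, verbose=0)   # shape (1, num_actions)
    return int(np.argmax(q_values[0]))                   # exploit: greedy action
```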
For optimizing hyperparameters, pay attention to epsilon and its decay rate. You can experiment with various values to find the best combination. You might also explore alternative formulas or strategies for epsilon decay.
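For example, two common decay schedules look roughly like this (the constants are only starting points to tune for your environment):

```python
def exponential_epsilon(episode, eps_start=1.0, eps_min=0.01, decay_rate=0.995):
    """Multiplicative decay per episode, clamped so some exploration always remains."""
    return max(eps_min, eps_start * decay_rate ** episode)

def linear_epsilon(step, eps_start=1.0, eps_min=0.01, decay_steps=10_000):
    """Linear decay from eps_start to eps_min over a fixed number of steps."""
    fraction = min(1.0, step / decay_steps)
    return eps_start + fraction * (eps_min - eps_start)
```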
That’s all I can think of for now! Let me know if you need further assistance or feedback.
I’ve been tweaking some stuff and managed to get better performance after doubling the number of training episodes. Since then I’ve been trying to change the neural network architecture and play with more episodes to see what happens.
I will try your tip on the next run for sure! Thank you for the answers!!