UPDATE… So I finally built a model that was submitted for race qualification and it came in 3rd fastest. The top ten models were entered into the physical race which occurred yesterday. I was unable to attend the event due to geographical location so someone raced my car for me. My car was runner-up based on time trials so a single vehicle on the track at any one time with the fastest lap of 3 taken as the performance metric.
The feedback from the driver was the vehicle rode really smoothly.
During training, I tried experimenting with different reward functions, varying hyperparameters and different training timeframes. The model that I submitted for the race was trained in 1 hour using a single factor in the reward function and using default hyperparameter settings. (N.B. each entrant was allocated 50 hours compute time on AWS DeepRacer server).
Given the only sensor on the vehicle is the camera, all computer vision serves as input to the model’s reward function. There are 9 parameters generated from the computer vision that you can work with.
In our race, there were no other vehicles on the track so object detection/avoidance was not required, just an ability to detect the markings of the track and the x, y coordinates, distance from track centreline, heading, steering angle, speed of the vehicle, progress around the track, etc.
I wanted to dig deeper into the CNN architecture but that didn’t seem possible. I did adjust the learning rate but that didn’t seem to improve things too much within a 3 hour training window. With more time and several hundred hours of compute for training then I think I could have improved performance no doubt.
While my model achieved a fastest lap of 11 secs, the previous champion trained a model that went around in just over 7 secs. He competes at AWS DeepRacer events around teh world and trains his models on his own compute platform at home. Allegedly he consumed ~30hours compute training this model. This person didn’t enter the race but participated just to demonstrate the performance that can be achieved by a pro - impressive stuff!
One thing that seemed to detract from my model’s performance in the virtual environment was imposition of speed and steering angle reward/penalty. I devised a “reward surface” which in practice was quite hopeless. Also the temptation to devise an all singing, all dancing reward function can be counter productive in my view. The best approach I believe is to test individual factors in isolation, retain the best and then progressively augment that model with additional value-adding factors. Again, 50 hours compute timeframe isn’t much so thinking about what to test and executing against that plan whilst being able to pivot based on your discoveries worked for me, although I didn’t actually complete all of my intended experiments.
Hope this is of interest and useful for any future DeepRacer participants within this community.