AWS DeepRacer

Hi all,

I recently completed the CNN course, and my employer has partnered with AWS to run a corporate AWS DeepRacer contest. Does anyone have any experience with the CNN architecture used in DeepRacer?

From what I understand so far, the vehicle’s camera takes 8 images per second (4 megapixels), and the images are black and white (greyscale, perhaps), not colour. Each image is processed to determine the state of the vehicle on the track. There are actions the vehicle can take (steering and speed adjustment), a policy component (?), and a reward function (with a range of parameters that relate to the state of the vehicle on the track).

Model training/evaluation is done in a virtual environment and the final model is loaded onto the actual vehicle for a physical race.

The specific CNN architecture is not divulged, but the hyperparameters include: (i) gradient descent batch size, (ii) number of epochs, (iii) learning rate, (iv) entropy, (v) discount factor, (vi) loss type, and (vii) number of experience episodes between each policy-updating iteration.
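To make item (v) concrete, here is a minimal sketch of how a discount factor weights future rewards when a return is computed. This is the generic reinforcement learning definition, not DeepRacer's internal code; the function name `discounted_return` and the example reward values are my own assumptions for illustration.

```python
def discounted_return(rewards, discount_factor):
    """Sum rewards, weighting the reward at step t by discount_factor ** t.

    Generic RL definition; `discounted_return` is a hypothetical helper,
    not part of any DeepRacer API.
    """
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (discount_factor ** step) * reward
    return total


# Illustrative values only: three steps, each with a reward of 1.0.
short_sighted = discounted_return([1.0, 1.0, 1.0], 0.5)  # later steps count for little
far_sighted = discounted_return([1.0, 1.0, 1.0], 1.0)    # all steps count equally
```

A discount factor closer to 1 makes the agent value rewards further along the track more heavily, whereas a smaller value makes it focus on the next few steps.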

If anyone has any experience with the CNN used in DeepRacer then please let me know here.

Regards,
Chris

That sounds like a very interesting project, I’ll monitor this thread for any updates.

UPDATE… I finally built a model and submitted it for race qualification, where it came in 3rd fastest. The top ten models were entered into the physical race, which took place yesterday. I was unable to attend the event because of my location, so someone raced my car for me. My car was runner-up. The race was run as time trials, so there was a single vehicle on the track at any one time, with the fastest of 3 laps taken as the performance metric.

The feedback from the driver was that the vehicle rode really smoothly.

During training, I experimented with different reward functions, hyperparameter settings and training durations. The model that I submitted for the race was trained in 1 hour using a single factor in the reward function and the default hyperparameter settings. (N.B. each entrant was allocated 50 hours of compute time on the AWS DeepRacer server.)

Given the only sensor on the vehicle is the camera, all computer vision serves as input to the model’s reward function. There are 9 parameters generated from the computer vision that you can work with.

In our race, there were no other vehicles on the track so object detection/avoidance was not required, just an ability to detect the markings of the track and the x, y coordinates, distance from track centreline, heading, steering angle, speed of the vehicle, progress around the track, etc.
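For anyone who hasn't seen one, a single-factor reward function can be very short. The sketch below rewards staying near the centreline. Note that this is an illustration only: the choice of centreline distance as the factor, the linear fall-off and the small floor value are assumptions for the example. The `params` keys `track_width` and `distance_from_center` are the ones DeepRacer passes to the reward function.

```python
def reward_function(params):
    """Single-factor reward: the closer to the centreline, the higher the reward.

    Illustrative sketch; the linear fall-off and the 1e-3 floor are assumptions.
    """
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # 0.0 means on the centreline, 1.0 means at (or beyond) the track edge.
    normalised = min(distance_from_center / (0.5 * track_width), 1.0)

    # Linear fall-off, with a small floor so the reward is never exactly zero.
    reward = 1.0 - normalised
    return float(max(reward, 1e-3))
```

The function takes a dictionary describing the vehicle's current state and returns a single float, so it can be tested locally with hand-written dictionaries before spending any training hours.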

I wanted to dig deeper into the CNN architecture, but that didn’t seem possible. I did adjust the learning rate, but that didn’t seem to improve things much within a 3-hour training window. With more time and several hundred hours of compute for training, I think I could have improved performance.

While my model achieved a fastest lap of 11 secs, the previous champion trained a model that went around in just over 7 secs. He competes at AWS DeepRacer events around the world and trains his models on his own compute platform at home. Allegedly he used ~30 hours of compute training this model. This person didn’t enter the race but participated just to demonstrate the performance that can be achieved by a pro - impressive stuff!

One thing that seemed to detract from my model’s performance in the virtual environment was the imposition of speed and steering angle rewards/penalties. I devised a “reward surface” which in practice was quite hopeless. Also, the temptation to devise an all-singing, all-dancing reward function can be counterproductive in my view. The best approach, I believe, is to test individual factors in isolation, retain the best, and then progressively augment that model with additional value-adding factors. Again, 50 hours of compute isn’t much, so what worked for me was thinking about what to test, executing against that plan, and being able to pivot based on what I discovered, although I didn’t actually complete all of my intended experiments.
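One way to organise the "test factors in isolation, then augment" approach is to write each factor as its own small function and build the reward function from a list of them. This is only a sketch of the idea, not the code used for the race: the helper names (`centre_factor`, `on_track_factor`, `make_reward`) and the choice to multiply factors together are assumptions. The `params` keys used are ones DeepRacer supplies.

```python
def centre_factor(params):
    """Hypothetical factor: 1.0 on the centreline, falling to 0.0 at the track edge."""
    half_width = 0.5 * params["track_width"]
    return max(0.0, 1.0 - params["distance_from_center"] / half_width)


def on_track_factor(params):
    """Hypothetical factor: 1.0 if all wheels are on the track, else 0.0."""
    return 1.0 if params["all_wheels_on_track"] else 0.0


def make_reward(factors):
    """Build a reward function that multiplies the given factors together."""
    def reward_function(params):
        reward = 1.0
        for factor in factors:
            reward *= factor(params)
        # Small floor so the reward is never exactly zero (an assumption).
        return float(max(reward, 1e-3))
    return reward_function


# Experiment 1: a single factor in isolation.
baseline = make_reward([centre_factor])
# Experiment 2: keep the best factor and add one more.
augmented = make_reward([centre_factor, on_track_factor])

# The function submitted for training must be named reward_function.
reward_function = augmented
```

Each experiment then differs by one entry in the list, which makes it easier to attribute a change in lap time to a specific factor.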

Hope this is of interest and useful for any future DeepRacer participants within this community.

Quite interesting, thanks for the report.

Does anyone in these events use RNNs (recurrent NNs)? Since they’re intended to use history to predict the future, they seem well aligned with learning the sequence of actions required for a hot lap strategy (e.g. setting up the trajectory for the current turn based on what you know happens in the next few turns).

I’m not sure whether RNNs have been or are being used, but that would make sense to me and could be a superior approach.