Lunar lander training set generation

ljb1706 · November 11, 2023, 2:46pm

Hi,

I am guessing the training sets are computed from a simulation of the module. They are then only as good as this simulation, which might not accurately model the behavior of the lander.
What is the usual practice with real-life cases? Would the training sets be based on actual experience in the field?
If not, wouldn’t this be a major limitation to this approach?

Thanks,

Laurent.

rmwkwok · November 11, 2023, 11:54pm

Hi @ljb1706,

I believe this is a reasonable tradeoff.

Simulation is safe, cheap, and efficient. You can see why after the assignment, which shows how many times the lander crashes before it manages to land successfully.

Despite this limitation, we can also improve the simulation whenever a real-world run reveals a shortcoming. So if we want the benefits of a simulation environment, we should try to address the drawbacks that come with it. In other words, we want to leverage both simulation runs and real-world runs; it is unlikely to be a pick-one-out-of-two question.
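As a rough sketch of what leveraging both kinds of runs could look like, here is a toy replay buffer that stores simulated and real-world experience tuples separately and draws mixed mini-batches. Everything in it is an assumption for illustration: the `Experience` fields, the `MixedReplayBuffer` class, and the `real_fraction` parameter are hypothetical names, not something from the course assignment or from any particular library.

```python
import random
from collections import deque, namedtuple

# Hypothetical experience record; the "source" tag ("sim" or "real") is an
# assumption added here so the two kinds of runs can be told apart.
Experience = namedtuple(
    "Experience", ["state", "action", "reward", "next_state", "done", "source"]
)


class MixedReplayBuffer:
    """Toy buffer that keeps simulated and real-world experience separately."""

    def __init__(self, capacity=10_000, seed=None):
        self.sim = deque(maxlen=capacity)   # plentiful, cheap simulated data
        self.real = deque(maxlen=capacity)  # scarce, expensive real-world data
        self.rng = random.Random(seed)

    def add(self, exp):
        # Route each tuple to the buffer that matches its source tag.
        (self.real if exp.source == "real" else self.sim).append(exp)

    def sample(self, batch_size, real_fraction=0.25):
        # Aim for the requested share of real-world tuples, capped by what is
        # actually stored, and fill the rest of the batch from simulation.
        n_real = min(len(self.real), round(batch_size * real_fraction))
        n_sim = min(len(self.sim), batch_size - n_real)
        batch = self.rng.sample(list(self.real), n_real)
        batch += self.rng.sample(list(self.sim), n_sim)
        self.rng.shuffle(batch)
        return batch


# Usage: many simulated transitions, only a handful of real ones.
buffer = MixedReplayBuffer(seed=0)
for i in range(100):
    buffer.add(Experience(i, 0, 0.0, i + 1, False, "sim"))
for i in range(5):
    buffer.add(Experience(i, 1, 1.0, i + 1, False, "real"))

batch = buffer.sample(batch_size=8, real_fraction=0.25)
```

Oversampling the scarce real-world tuples this way is only one possible design choice; the right mixing ratio would depend on how much the simulation is trusted.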

In some cases, autonomous driving for example, we can alleviate the safety and cost issues without simulation by bringing in skillful drivers. Tesla, for example, was reported to have done that. While I am sure we understand that such an approach might not be applicable to a lunar lander, I doubt that Tesla can really live without any simulation - not just during their very early development, but probably also as a source of “wrong or catastrophic” experience, which, I suppose, you can’t get as much of from real-world drivers at any time. However, I have to say that I don’t know about Tesla’s model training program.

I am sure this is a topic worth researching further.

Cheers,
Raymond

ljb1706 · November 12, 2023, 6:33am

Thanks for the detailed feedback.

TMosh · November 14, 2023, 8:23am

The original self-driving vehicles from MIT had some very interesting issues with training their NN.

When they trained only using good drivers, the model did not learn how to avoid hazards, because good drivers avoid those situations.

When they tried to include evasive maneuvers (like avoiding hitting a tree), the model learned to swerve toward trees, so that it could then evade them.

They had to invent a lot of techniques to control the training environment.
