Input of the ANN for learning the state-value function


The input is a vector that includes both the state and the action. Regarding the action, it was noted that if the action is the first action, we may encode it as 1, 0, 0, 0, or if it is the second action, to fire the left thruster, we may encode it as 0, 1, 0, 0.
Wouldn’t it be better to encode the go-left, go-right, and main-engine actions as numbers between 0 and 1 rather than as binary values? We have to control how much we use the engines, right? So we would give a ratio that describes how much each engine is used in a given state, rather than a binary number.
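
To make the encoding concrete, here is a minimal sketch of how the state and a one-hot action could be joined into one input vector. The helper names, the 8-value placeholder state, and the assumption of 4 actions with "fire left thruster" at index 1 are only for illustration and are not taken from the assignment code.

```python
# Assumption: 4 discrete actions, with index 1 meaning "fire left thruster".
NUM_ACTIONS = 4

def one_hot(action_index, num_actions=NUM_ACTIONS):
    """Return a list with 1.0 at action_index and 0.0 everywhere else."""
    vec = [0.0] * num_actions
    vec[action_index] = 1.0
    return vec

def build_input(state, action_index):
    """Concatenate the state values with the one-hot encoded action."""
    return list(state) + one_hot(action_index)

# Placeholder state with 8 values (assumed size, all zeros for illustration).
state = [0.0] * 8
x = build_input(state, 1)

print(len(x))   # length of state plus number of actions
print(x[-4:])   # the one-hot part of the input
```

With a fractional encoding, the last four entries would instead hold values such as 0.36, which only makes sense if the environment itself accepts partial thrust.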

Hello @Bilel_Djemel,

Not all simulation environments support a continuous thrust value, but that’s an interesting idea!


The assignment used the default simulation environment, which expects binary thrust values. However, I took a quick look at the documentation, and it seems the environment can actually be configured to accept continuous thrust values.

If you want to try it out, please make all the changes needed so that your neural network works in that continuous environment.

If you plan to modify the assignment notebook for this, then I strongly recommend that you finish and pass the assignment first.


Note that in practice, it’s very common for maneuvering thrusters to be used in pulse-modulated modes, rather than in magnitude control modes.

It’s very complicated to make a thruster with a controllable throttle, due to the physics of combustion and efficiency.

It’s very easy to just turn the thruster on or off rapidly.
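
As a rough sketch of the pulse-modulation idea, the snippet below turns a desired average thrust level into a sequence of on/off commands. The function name `pulse_schedule`, the 25% duty cycle, and the 20-step window are made up for illustration and do not describe any particular thruster or simulator.

```python
def pulse_schedule(duty_cycle, num_steps):
    """Return a list of 0/1 commands whose average approximates duty_cycle.

    duty_cycle is the desired fraction of time the thruster is on (0.0 to 1.0).
    """
    commands = []
    accumulated = 0.0
    for _ in range(num_steps):
        accumulated += duty_cycle
        if accumulated >= 1.0:
            commands.append(1)   # fire the thruster for this step
            accumulated -= 1.0
        else:
            commands.append(0)   # keep the thruster off for this step
    return commands

schedule = pulse_schedule(0.25, 20)
print(schedule)
print(sum(schedule) / len(schedule))  # average thrust level: 0.25
```

The thruster is only ever fully on or fully off, but over the 20 steps the average thrust matches the requested fraction.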

Yes, but I think it would be better to use continuous values rather than binary ones. For example, it is better to run the main engine at 36% than at maximum power, since that consumes less energy.

It depends on how you define “better”. There are lots of factors involved in designing a spacecraft.