Hi Everyone,

I was watching videos about reinforcement learning. Specifically, I was watching the video "Learning the state-value function", where Andrew explains a basic algorithm that can be used to train the lunar lander. He mentioned that it MAY work and still needs improvement, yet I cannot wrap my head around it. My main question: how does the neural network in this algorithm know whether it is getting closer to the right answer or farther from it? What helps it improve over time?

Hello @RealOmarKhalil

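As explained in the video, the algorithm uses the Bellman equation to build the training targets. For each experience (s, a, R(s), s'), the target is y = R(s) + γ max over a' of Q(s', a'), i.e. the reward that was actually received plus the discounted value of the best action in the next state according to the network's current estimate. The network is then trained on these (state, action) → y pairs, and the retrained network is used to compute the next round of targets. Because every target contains a real reward, repeating this tends to pull the estimates toward values that satisfy the Bellman equation, so you should eventually land close to the correct state-action value function.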
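To make that concrete, here is a minimal sketch on a made-up toy problem (not Lunar Lander, and not the course's code). A small table `q` stands in for the neural network, and the states, actions, `GAMMA`, and `ALPHA` are all assumptions chosen for illustration. The point is only to show that the gap between the Bellman target and the current guess is the "am I getting closer?" signal, and that it shrinks as training repeats.

```python
import numpy as np

# Hypothetical toy problem: states 0..3 on a line, actions 0 = left, 1 = right.
# State 3 is terminal; stepping into it pays reward 1, everything else pays 0.
N_STATES, N_ACTIONS, GAMMA, ALPHA = 4, 2, 0.9, 0.5
TERMINAL = 3

def step(s, a):
    """Environment: returns (reward, next_state). The reward is given, not learned."""
    s_next = max(s - 1, 0) if a == 0 else s + 1
    reward = 1.0 if s_next == TERMINAL else 0.0
    return reward, s_next

def train(n_sweeps=200, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial guess, standing in for a randomly initialised network.
    q = rng.normal(size=(N_STATES, N_ACTIONS))
    q[TERMINAL] = 0.0  # no future reward once the episode has ended
    errors = []
    for _ in range(n_sweeps):
        total = 0.0
        for s in range(TERMINAL):
            for a in range(N_ACTIONS):
                r, s_next = step(s, a)
                # Bellman target: real reward plus discounted current guess.
                y = r + GAMMA * q[s_next].max()
                total += (y - q[s, a]) ** 2
                # Move the estimate part of the way toward the target.
                q[s, a] += ALPHA * (y - q[s, a])
        errors.append(total)  # summed squared gap between targets and guesses
    return q, errors

q, errors = train()
print("first / last error:", errors[0], errors[-1])
print("learned Q(2, right):", q[2, 1])
```

The initial guesses are random, but the rewards inside the targets are real, so each pass leaves the estimates a little less wrong. With a neural network the "move part of the way toward the target" step is a gradient step on the squared error instead of a table update, and convergence is no longer guaranteed, which is probably why the video says it MAY work.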

Hi,

I was also having trouble here, but I think the answer is that the reward function R(s) used in the Bellman equation is deterministic, i.e. you already know exactly how to calculate the reward. If the reward function is unclear, then you may need a neural network with training data to do supervised learning to discover that reward function.
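A rough sketch of what I mean, with made-up numbers: the rewards come straight from the environment, so the labels y can be computed without anyone supplying the "right answer". The function name `build_targets`, the `GAMMA` value, and the sample batch below are all my own assumptions for illustration, not the course's code.

```python
import numpy as np

GAMMA = 0.995  # assumed discount factor, chosen only for illustration

def build_targets(rewards, next_q_values, done):
    """Turn observed experience into supervised-learning labels y.

    rewards:       shape (batch,), rewards reported by the environment
    next_q_values: shape (batch, n_actions), current guesses for Q(s', a')
    done:          shape (batch,), 1.0 where the episode ended at s'
    """
    best_next = next_q_values.max(axis=1)
    # If the episode ended, there is no future value to add.
    return rewards + GAMMA * best_next * (1.0 - done)

# Two hypothetical experiences: an ordinary step and a final step.
rewards = np.array([-1.0, 100.0])
next_q = np.array([[2.0, 5.0, 3.0, 1.0],
                   [0.3, 0.2, 0.1, 0.0]])
done = np.array([0.0, 1.0])

y = build_targets(rewards, next_q, done)
print(y)
```

The only part of y that is a guess is the `next_q_values` term; the reward part is known exactly, which is what anchors the training.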

I hope that makes sense … 3 months later