Question on discounting

In reference to the video on The Return in reinforcement learning, if the rover starts at state 4 and then moves left to state 1, shouldn’t the return be 0 + (0.9)*0 + (0.9)^2*100 = 81, since the steps are 4 to 3, 3 to 2, and 2 to 1?

Hi Richard,

The reward at the current state is missing from your calculation.
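Following the course’s convention, the reward at the state you start in is the first term of the return, weighted by gamma^0 = 1. For the path 4 → 3 → 2 → 1 that gives:

0 + (0.9)*0 + (0.9)^2*0 + (0.9)^3*100 = 72.9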

Raymond

Hi Raymond,

But that’s my question. Maybe it’s a theoretical issue, but why should there be a reward for where you start? Aren’t you being rewarded for actions?

Richard

Hello Richard, in our Mars Rover example, although you won’t get more reward without making a move, rewards are bound to states, not actions, aren’t they?

You can design a system in which the reward is based on the change of state caused by the action, and/or on the action itself. Your design should help the robot learn to accomplish its goal well.

If you want to learn the return as a general formula, then one that takes into account the reward at the current state is, in general, the most inclusive, right?

This is perhaps less important if your focus is on how to use the return to make future decisions. But if we disregard the reward at the current state for our Mars Rover, we can never account for the full total reward, because the reward at the starting state is always left out.
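If it helps, here is a minimal Python sketch of that formula, assuming the state rewards from the lectures (100 at state 1, 0 at the intermediate states) and a discount factor of 0.9:

```python
def discounted_return(rewards, gamma=0.9):
    """Return R1 + gamma*R2 + gamma^2*R3 + ... over one trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Path 4 -> 3 -> 2 -> 1: the reward at the current state (state 4) is the
# first term, weighted by gamma^0 = 1.
print(discounted_return([0, 0, 0, 100]))  # ~72.9
```

Dropping the first element of the list is exactly what makes the hand calculation come out to 81 instead of 72.9.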

Hi Raymond,

OK, thanks.

By the way, I just finished the Specialization, with an overall grade of a bit over 99% (and I only retook one quiz).

What’s the next logical step?

Richard

Congratulations, Richard!!

Given that you have a bigger goal in mind (which I don’t know), how about doing a small project to find out what’s still missing from your skill set and experience?

Good idea, Raymond. Thanks.

How far beyond MLS does the Deep Learning Specialization go?

Richard

Hello Richard, you will find DLS Course 1 Week 2 pretty familiar; Week 3 begins the transition to something different; and from Week 4 you will be fully into deep neural networks, which are not covered in the MLS. :wink:

Raymond

Hello Raymond, thanks!

Richard