What are some examples of other return formulas used in real-world applications?

I’ve completed the first section on reinforcement learning, and in all the examples discussed (Mars rover, helicopter, quadruped robot) the return is defined the same way. I’m curious: when and why would you alter the return function?
This might be covered in a later video, so apologies if it is. I was quite curious and decided to ask as soon as the question came up.

Hey @jashm,
An interesting question indeed! Unfortunately, Prof Andrew doesn’t discuss much along these lines, primarily due to the time and depth constraints of the course itself. Every topic Prof Andrew teaches across the 3 courses could be taught in far greater depth, but there’s a stark trade-off between “how much content can be included” and “what the right choice of content is for a beginner”.

That being said, in any RL application we seek to maximize the expected return, where the return is defined as some specific function of the reward sequence. You will find that for the majority of RL applications, defining the return as the discounted sum of rewards works pretty well. However, that doesn’t mean the return can only be defined this way. For instance, check out this research work, in which the authors describe how defining the return as the maximum of the reward sequence can be beneficial for certain applications.
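To make the difference concrete, here is a minimal sketch comparing the course’s discounted return, R1 + γR2 + γ²R3 + …, with a max-of-rewards return. The function names `discounted_return` and `max_return` are hypothetical, the reward sequences and γ values are made up for illustration, and the max version shown here is a simplified, undiscounted one. It is not necessarily the exact formulation used in the research work mentioned above.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted return: R1 + gamma*R2 + gamma^2*R3 + ..."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r  # reward at step t is weighted by gamma^t
    return total


def max_return(rewards: Sequence[float]) -> float:
    """Hypothetical alternative: the return is the single largest reward.

    Simplified, undiscounted version assumed for illustration only.
    """
    if not rewards:
        raise ValueError("reward sequence must be non-empty")
    return max(rewards)


if __name__ == "__main__":
    # Mars-rover-style sequence: three zero rewards, then a terminal reward of 100.
    rover = [0.0, 0.0, 0.0, 100.0]
    print(discounted_return(rover, gamma=0.5))  # 12.5
    print(max_return(rover))  # 100.0

    # Two made-up sequences that the two definitions rank differently.
    steady = [1.0, 1.0, 1.0, 1.0, 1.0]  # small reward at every step
    late_peak = [0.0, 0.0, 0.0, 0.0, 3.0]  # nothing until one big reward
    # Discounted sum favours `steady`; max-of-rewards favours `late_peak`.
    print(discounted_return(steady, 0.9), discounted_return(late_peak, 0.9))
    print(max_return(steady), max_return(late_peak))
```

The second pair shows why the choice matters: an agent maximizing the discounted sum prefers collecting small rewards consistently, while one maximizing the max-of-rewards return only cares about reaching the single best outcome, which may suit tasks where hitting one high-value state is all that counts.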

Feel free to skim through some of the other research works along these lines on Google Scholar. I hope this helps.