Let's take an example where the reward for a state changes dynamically in the environment. How should an RL model be trained in that case? For instance, the reward for a state changes over time: at one point the reward for state 1 is +20, and at another point the reward for the same state 1 is -10. How can a model be trained when the reward for a given state changes dynamically like this?
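If the variation depends on time, then time (or the relevant phase of time) should be one of the input features of the Q-network, so the same state at different times becomes a distinguishable input.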
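A minimal sketch of the time-as-feature idea, using tabular Q-learning instead of a Q-network to keep it short. Everything here is a hypothetical toy: a one-step task where action 0 ("enter state 1") pays +20 on even time steps and -10 on odd ones, and action 1 pays 0. The `train` function either hides the time phase from the agent or includes it in the observation.

```python
import random
from collections import defaultdict

# Hypothetical toy reward: entering state 1 pays +20 in phase 0 and -10 in phase 1.
def reward(phase: int, action: int) -> float:
    if action == 0:  # action 0 = "enter state 1"
        return 20.0 if phase == 0 else -10.0
    return 0.0       # action 1 = "stay away", always 0


def greedy(q, obs):
    # Pick the action with the highest estimated value (ties go to action 0).
    return max([0, 1], key=lambda a: q[(obs, a)])


def train(use_time: bool, steps: int = 5000, alpha: float = 0.1,
          epsilon: float = 0.2, seed: int = 0):
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (observation, action) -> estimated value
    for t in range(steps):
        phase = t % 2
        # The observation either includes the time phase or hides it.
        obs = (1, phase) if use_time else (1,)
        if rng.random() < epsilon:
            action = rng.choice([0, 1])  # explore
        else:
            action = greedy(q, obs)      # exploit
        r = reward(phase, action)
        # Each step is a one-step episode, so the Q-learning target is just r.
        q[(obs, action)] += alpha * (r - q[(obs, action)])
    return q
```

With `use_time=True` the estimates for "enter" settle near +20 in phase 0 and -10 in phase 1, so the greedy policy enters only in phase 0. With `use_time=False` both phases share one observation and the estimate drifts between the two rewards, so no single action is right for both. With a Q-network, the same idea amounts to appending a time or phase feature to the input vector.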
If the variation depends on specific conditions, then the state space must be enlarged so that each of those conditions is represented as a distinct state.
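For the condition-based case, one hedged sketch is to treat each (base state, condition) pair as its own state in a Q-table. The three base states, the `"normal"`/`"penalty"` condition flag, and the helper names `build_index` and `q_update` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical setup: 3 base states and a condition flag that changes the reward.
BASE_STATES = [0, 1, 2]
CONDITIONS = ["normal", "penalty"]
N_ACTIONS = 2


def build_index(base_states, conditions):
    # Each (base_state, condition) pair becomes a distinct row of the Q-table.
    return {pair: i for i, pair in enumerate(product(base_states, conditions))}


def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    # Standard one-step tabular Q-learning update.
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


index = build_index(BASE_STATES, CONDITIONS)      # 3 x 2 = 6 augmented states
q = [[0.0] * N_ACTIONS for _ in range(len(index))]

# State 1 under "normal" and under "penalty" are now separate rows,
# so +20 and -10 no longer overwrite each other.
q_update(q, index[(1, "normal")], 0, 20.0, index[(0, "normal")])
q_update(q, index[(1, "penalty")], 0, -10.0, index[(0, "penalty")])
```

The cost of this approach is that the table (or the network's input space) grows multiplicatively with every condition added, which is why it only works when the conditions are known and few.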
If you can’t quantify what the reward is for every state, then you will have difficulty using reinforcement learning.