In reinforcement learning, are the sets of actions, rewards, and states always pre-defined? I assume that the rewards are set and pre-defined, and that the algorithm cannot change or add to those, and that the algorithm can learn new states depending on what action it has taken. How about new actions? Can it learn new actions?
In reinforcement learning there are no new actions, because you give the model all the possible parameters up front. For example, if you want to build a model for a robotic arm, you give the model the direction, the speed, and the angle of that direction; those are the possible actions for this arm. By training the model you adjust these variables to produce good actions.
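To make that concrete, here is a minimal sketch of what a pre-specified action set for such an arm could look like; the particular direction, speed, and angle values below are made-up assumptions, purely for illustration:

```python
from itertools import product

# Illustrative, hand-picked discretisations (assumed values, not from the course)
directions = ["left", "right", "up", "down"]
speeds = [0.1, 0.5, 1.0]          # e.g. metres per second
angles = [0, 15, 30, 45, 60, 90]  # degrees

# The full action set is fixed up front: every combination of the three variables.
ACTIONS = list(product(directions, speeds, angles))
print(f"{len(ACTIONS)} possible actions")  # 4 * 3 * 6 = 72
```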
Thank you!
Would you be able to answer all the questions please, and confirm my assumption?
Hey @Basira_Daqiq, thanks for your post.
In RL the sets of actions, rewards, and states are not always pre-defined. The flexibility and adaptability of RL come from the fact that the algorithm can learn and interact with its environment to discover new states, actions, and even learn from the feedback (rewards) it receives.
Just keep in mind that the degree to which RL can learn new states, actions, and rewards depends on various factors, including the complexity of the environment, the algorithm’s design, the amount of exploration, and the reward structure.
While there might be predefined elements like the initial set of actions or some baseline rewards, RL algorithms can indeed learn new actions, discover new states, and even adapt to rewards over time.
Best
Jamal
I’ll give this a try, since I also struggled with this myself.
In order for the model to know what to do in any given state, it needs to know Q(s,a). The important thing to note is that initially you don’t know what the Q(s,a) values are for a given state. They are learned by training the model (or you can start by just guessing the values, but in order to improve, you have to train it).
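As a rough illustration of “start by guessing, then improve through training”, here is a minimal tabular Q-learning sketch; the hyperparameter values are assumptions, and a neural network can stand in for the table when the state space is large:

```python
import random
from collections import defaultdict

# Q(s, a) starts as a guess (zero here) and is refined by training.
Q = defaultdict(float)

alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration (assumed values)

def choose_action(state, actions):
    # Explore sometimes, otherwise act greedily on the current Q estimates.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    # Q-learning update: nudge Q(s, a) toward reward + gamma * max_a' Q(s', a').
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```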
States are variables derived from the system: measurable or calculable values such as position, speed, temperature, pressure, status, etc.
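In code, a state is often just a small record or vector of those measured values; the fields below are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class State:
    # Illustrative, measurable quantities; the actual fields depend on your system.
    position: float
    speed: float
    temperature: float
    pressure: float
    status: str
```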
Actions can come from one of two places, depending on whether you are training or testing the model. If you are training the model, as in the case of the flying helicopter, the actions are inputs from a human controlling a joystick. If you are testing the model, the actions come from the policy derived from model training.
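A rough sketch of that split; `human_joystick_input` and `policy` here are hypothetical callables, just to show where the action comes from in each mode:

```python
def get_action(state, mode, human_joystick_input=None, policy=None):
    """Return the next action, depending on whether we are collecting
    human demonstrations (training) or running the learned policy (testing)."""
    if mode == "training":
        # During data collection, the action comes from the human pilot's joystick.
        return human_joystick_input(state)
    elif mode == "testing":
        # After training, the action comes from the learned policy.
        return policy(state)
    raise ValueError(f"unknown mode: {mode}")
```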
The rewards are predefined. You state constraints on the system: if you get here, you get these points; if you go there, you get those points; if you crash, you get these points; if your fuel rate is over this number, you get these points…
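In code, that usually ends up as a hand-written reward function; all of the point values and state fields below are made-up assumptions:

```python
def reward(state):
    # Hand-specified constraints; the point values are illustrative assumptions.
    if state["crashed"]:
        return -1000
    if state["out_of_bounds"]:
        return -100
    if state["at_goal"]:
        return +100
    if state["fuel_rate"] > 5.0:    # burning fuel too fast
        return -10
    return -1                       # small per-step cost to encourage finishing quickly
```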
The total return is the sum of the rewards collected over the series of actions taken from the beginning to the end of an “episode”, where an episode runs from the start of the task to its end. The end could be defined as a crash, going too far out of bounds, a successful execution, or some other criterion you define.
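Numerically, the return is usually computed as a discounted sum, where a discount factor gamma (assumed 0.9 here) weights early rewards more heavily than late ones:

```python
def total_return(rewards, gamma=0.9):
    """Discounted sum of the rewards collected during one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# One made-up episode's rewards, ending in a crash:
print(total_return([-1, -1, -1, -1000]))  # -1 - 0.9 - 0.81 - 729 = -731.71
```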
So if you think about training a deep reinforcement learning algorithm, every time you train the model, you are creating a policy. A policy is a prescription that says, for any particular state you are in, which action you should take (and therefore which state you move to next).
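In the Q(s,a) picture above, the learned policy is often just “pick the action with the highest estimated Q value in the current state”; a minimal sketch reusing the Q table from earlier:

```python
def greedy_policy(state, actions, Q):
    # The policy derived from Q: in this state, take the action with the best estimate.
    return max(actions, key=lambda a: Q[(state, a)])
```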
The goal is to train the model repeatedly, thereby creating multiple policies, each varying in total return. Every time you train the model with a new policy (or think of it as a state-path), using the NN training approach, you are finding the aggregation of policies that produces the greatest total return.
Before you even start training, you can randomly create a policy, just to initialize the NN. But every time you train the model through an exercise, it will aggregate the new policy with the previous ones through transfer learning.
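Putting those two ideas together, the outer loop could look roughly like this; `env`, `QNetwork`, `epsilon_greedy`, and `train_step` are hypothetical placeholders rather than a specific library’s API, so treat it as a structural sketch, not runnable code:

```python
# Structural sketch only: env, QNetwork, epsilon_greedy, and train_step are placeholders.
q_net = QNetwork()        # starts with random weights, i.e. effectively a random policy
replay_buffer = []
num_episodes = 500        # assumed value

for episode in range(num_episodes):
    state = env.reset()
    done = False
    while not done:
        action = epsilon_greedy(q_net, state)            # act on current Q estimates
        next_state, reward, done = env.step(action)
        replay_buffer.append((state, action, reward, next_state, done))
        state = next_state
    # Each episode's experience is folded into the same network's weights,
    # so later training builds on earlier training rather than starting over.
    train_step(q_net, replay_buffer)
```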
This is my thought process about it. I hope this helps, or at least provokes some discussion.