So, let’s say we have trained an RL model on data (X, y) that was collected only with the agent initialized at states 1, 2, 3, 4 and 6.

What does the agent do when it is initialized at state 5, for which we haven’t collected any data?

What obstacles does the agent face?

How does the RL model learn to tackle these obstacles or challenges?

In the training phase we already define all possible transitions and their rewards and punishments. So if the RL model finds itself at state 5 and goes left, the value Q(5, left) will decrease; if it goes right instead, the corresponding value will increase. In this way the model learns that, under conditions X, left is not a good step and it is better to go right, and so on throughout the training phase.
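The idea above can be sketched with a minimal tabular Q-learning loop. Everything here is an assumption for illustration: a 1-D chain of states 0..7, a goal at state 7 (+1 reward), a failure at state 0 (-1 reward), and training episodes that start only from states 1, 2, 3, 4 and 6 as in the question.

```python
import random

random.seed(0)

# Assumed 1-D chain: state 7 is the goal (+1), state 0 is a failure (-1).
ACTIONS = ("left", "right")
GOAL, FAIL = 7, 0
START_STATES = [1, 2, 3, 4, 6]  # training episodes never *start* at state 5

def step(s, a):
    """Deterministic transition with terminal rewards at the two ends."""
    s2 = s - 1 if a == "left" else s + 1
    if s2 == FAIL:
        return s2, -1.0, True
    if s2 == GOAL:
        return s2, 1.0, True
    return s2, 0.0, False

Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(2000):
    s = random.choice(START_STATES)
    done = False
    while not done:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# Episodes never start at 5, but they pass through it, so the agent still
# learns that going right from state 5 is better than going left.
print(round(Q[(5, "right")], 3), round(Q[(5, "left")], 3))
```

Note that in this toy setup state 5 is still *visited* during training (episodes starting at 4 pass through it), which is exactly why Q(5, ·) ends up populated.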

In addition to Abdelrahman’s answer: first, the model should still be able to suggest an action, but since it has never been trained in such a situation, there are at least 2 possibilities in terms of the quality of that action:

The other examples on which the model has been trained are sufficient for it to “interpolate” the missing piece about starting from state 5;

The model didn’t learn to predict well when starting from state 5.
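The first possibility can be illustrated with a toy function approximator. This is a hedged sketch whose numbers are all assumptions, not from the answer: the true values are taken to be V(s) = gamma**(7 - s), the discounted distance to a goal at state 7, and a straight line fit to states 1–4 and 6 still produces a sensible estimate for the unseen state 5 simply because it generalizes over the state index.

```python
# Assumed ground truth: V(s) = gamma**(7 - s) on a 1-D chain with a goal
# at state 7. We fit V(s) ~ w*s + b on the states seen in training only.
gamma = 0.9
train_states = [1, 2, 3, 4, 6]          # state 5 is deliberately missing
ys = [gamma ** (7 - s) for s in train_states]

# Ordinary least squares by hand (no external libraries needed).
n = len(train_states)
x_mean = sum(train_states) / n
y_mean = sum(ys) / n
w = sum((x - x_mean) * (y - y_mean) for x, y in zip(train_states, ys)) \
    / sum((x - x_mean) ** 2 for x in train_states)
b = y_mean - w * x_mean

v5_pred = w * 5 + b       # interpolated value for the unseen state 5
v5_true = gamma ** 2      # what the assumed true value actually is
print(round(v5_pred, 3), round(v5_true, 3))
```

Whether the second possibility (a poor prediction) occurs instead depends on how representative the trained states are of state 5; a tabular model with no visits to state 5 at all would have nothing to interpolate from.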

If it is an online RL model, that example (starting from state 5) is also queued for training, and your model can improve from there.
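A minimal sketch of that online case (the buffer, rewards, and state layout are assumptions for illustration, not part of the answer): transitions generated when the deployed agent starts from the unseen state 5 are appended to a queue and replayed to update the Q-table.

```python
from collections import deque

ACTIONS = ("left", "right")
Q = {(s, a): 0.0 for s in range(8) for a in ACTIONS}
alpha, gamma = 0.5, 0.9

# Queue of experience gathered at deployment time.
buffer = deque(maxlen=10_000)

# Suppose the deployed agent starts at the unseen state 5, goes right to 6,
# then right again into a goal state 7 with reward +1 (assumed rewards).
buffer.append((5, "right", 0.0, 6, False))
buffer.append((6, "right", 1.0, 7, True))

# Replay the queued transitions a few times: Q(5, right) starts to improve
# even though state 5 never appeared in the original training data.
for _ in range(3):
    for s, a, r, s2, done in buffer:
        target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

print(round(Q[(5, "right")], 3), round(Q[(6, "right")], 3))
```

Replaying more than once matters here: on the first pass Q(6, right) is still zero, so the reward only propagates back to state 5 on later passes.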