How can we compute the 'optimal behavior' in a state-action value function?

In the state-action value function part of Week 3, the professor said that how to 'compute the optimal behavior' would be explained later. However, I am still wondering how we can get the 'optimal behavior' before we use the Bellman equation to calculate Q, since computing Q already requires $\max_{a'} Q(s', a')$.
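If I write it out, the Bellman equation from the lecture looks something like

$$Q(s, a) = R(s) + \gamma \max_{a'} Q(s', a')$$

so evaluating the right-hand side already seems to require knowing $Q$ itself, which is what confuses me.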

Could someone help please? Thanks.

Hi @zhongli,

Could you share the video name and the time mark of the part you are asking about? The general idea, though, is this: we start from a random initial Q (which can be very wrong), then we train Q progressively until it becomes good, and only then do we use it to read off the optimal behavior.
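To make that concrete, here is a minimal tabular Q-learning sketch (not the course's notebook code). The toy corridor environment, the discount factor, the learning rate, and the exploration rate are all assumptions for illustration; the point is only the loop structure: Q starts random, gets nudged toward the Bellman target $r + \gamma \max_{a'} Q(s', a')$ many times, and only afterwards do we read the optimal behavior off it with an argmax over actions.

```python
import numpy as np

# Toy corridor environment (an assumption, loosely inspired by the lectures):
# 6 states; states 0 and 5 are terminal with rewards 100 and 40, others give 0.
# Actions: 0 = move left, 1 = move right.
n_states, n_actions = 6, 2
gamma = 0.5      # discount factor (assumed value)
alpha = 0.1      # learning rate (assumed value)
epsilon = 0.1    # exploration probability (assumed value)

rewards = np.array([100, 0, 0, 0, 0, 40])
terminal = {0, 5}

def step(s, a):
    """Apply action a in state s; return (next_state, reward, done)."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
    return s_next, rewards[s_next], s_next in terminal

# 1) Start from a random (very wrong) Q table.
rng = np.random.default_rng(0)
Q = rng.normal(size=(n_states, n_actions))

# 2) Train Q progressively with the Bellman-based update:
#    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
for episode in range(2000):
    s = rng.integers(1, n_states - 1)   # random non-terminal start state
    done = False
    while not done:
        # epsilon-greedy action choice using the current (imperfect) Q
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        target = r if done else r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

# 3) Only now do we extract the optimal behavior:
#    in each state, pick the action with the largest trained Q value.
policy = np.argmax(Q, axis=1)
print("greedy policy (0 = left, 1 = right):", policy)
```

With these assumed numbers, the learned greedy policy should send the states near the big reward to the left and the state next to the small reward to the right, and that argmax over the trained Q is exactly the 'optimal behavior' the lecture refers to.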

Cheers,
Raymond