What is the difference between "State action value function" and "Bellman Equation"?

Goran_Hrzenjak · February 7, 2023, 8:45am

Hi,

I’m trying to clean up my understanding concerning the:

state value function
Bellman equation
Markov decision process

Could you please help me with the difference between the State Value Function

and the Bellman Equation

Well, I see the Bellman Equation gives you the Return value (a number) for start in the state s, take action just once, behave optimally after that by recursively using the Q(s,a) which is the State Value Function.

I guess that is not right, but again, what is the the formal difference between the State Value Function and the Bellman Equation?

Thanks

Goran_Hrzenjak · February 7, 2023, 10:23am

In the learning algorithm: mentioned in the Learning the state-value function lection:

Is it right to say, that the part Set Q = Q_new, is the iteration part of the Bellman Equation in the Markov Decision Process?

rmwkwok · February 7, 2023, 9:31pm

What do you mean by “iteration part of the bellman equation”? What is a iteration part? Can you highlight the iteration part on the slide that you shared earlier?

Goran_Hrzenjak · February 12, 2023, 11:23am

Hm. Here are my thoughts:

On the left side there is a Q(s,a), on the right a max of (q,a).
In order to calculate the max of (q,a), the Q(s,a) should be used.
Probably I’m missing something.

rmwkwok · February 13, 2023, 3:44am

Hello @Goran_Hrzenjak,

Thanks for clarifying. Let’s start with a quote of the Video “Bellman equation”:

If you can compute the state action value function Q of S, A, then it gives you a way to pick a good action from every scene. Just pick the action A, that gives you the largest value of Q of S,A. The question is, how do you compute these values Q of S,A? In reinforcement learning, there’s a key equation called the Bellman equation that will help us to compute the state action value function.

Therefore, the bellman equation is a formulation of how you can compute the State Action Value Function. Consider the State Action Value Function as a bigger thing, that you can compute it in whatever way you want, and then somebody comes up and tell you “hey, why don’t you compute it using the Bellman equation?”. What confused you is perhaps when the lecture first talked about the State Action Value Function, it already discussed it in a way exactly like the Bellman equation. However, they are two concepts (as concluded by the above quote) though they can share a lot of similarity in actual computations.

This should have addressed the “difference between them” as asked in the title of this thread.

As for “Markov Decision Process”, I think your question is not concise enough. If you want to connect Reinforcement Learning to Markov Decision Process, you only need to say that, our State Action Value Function takes in only the current state as the input to determine the next action, without considering any other previous states.

Cheers,
Raymond

Goran_Hrzenjak · February 20, 2023, 10:44am

Hello @rmwkwok,

thank you very much for the clarification!

Let me think loud.

This is the “the narrative”, of the State-action value function, the definition of Q(s,a):

The Q(s,a) returns a single number, (the return), for which the below “story” is truth:

The first part:

start in state s.

take (first) action a once ← A. You have to decide what is the action to take. No problem, just take the action which gives you the highest return?.

The second part:

after you’ve taken the action once, you behave optimally after ← B. You have to decide what is optimal (B1: for all consecutive steps?)

Question1. How do you decide what action a you take in the beginning? Isn’t it that already for A you should take the action which gives you the highest return. Why don’t you behave optimal already for the first step, not after the first step.

Question2 about max Q (a’, s’) over a’. I can imagine the algorithm that calculates the maximum return when all the returns are known for sure. But this is not the case at the beginning, something must set the values for all returns for all possible actions of all states. (Adds to the question1: It says max over a’, not max over a.) OK, I know they are set randomly at the beginning. The process of updating the values leading the action toward highest reward is recursive.

I will stop thinking loud now. I see I’ll have to think it over quiet and take deeper look in the code. Maybe you have some advice .

I completed all three courses of this Machine Learning Specialization and earned the certificate, so my subscription is stopped per end of February. Do you know, will I still have access to this conversation and be able to discuss here, after my subscription is canceled?

Cheers,
Goran

rmwkwok · February 20, 2023, 12:14pm

@Goran_Hrzenjak

Because we want to determin Q(s, a), and the a here is a variable. Q(s, a) means “hey, if I am at state s and I take an action a, what is the Q?”, or “hey if I am at this state, and I turn left, what would be my Q? what if I turn right, is that Q larger?” So, we can ask about Q(s, a = left) and Q(s, a=right). In Q(s, a), s and a are variables, right?

Why not? In the slide, we know the reward at each state, and they are 100, 0, 0, 0, 0, and 40 respectively. Why is it not known to us?

That is not a bad idea.

Yes! We can discuss here after your subscription ends.

Cheers,
Raymond

Topic		Replies	Views
Can 't see how these are equal in the application of Bellman's eq'n in the Learning the state-value function lecture Unsupervised Learning, Recommenders, Reinforcement week-module-3	2	18	February 8, 2025
How can we compute the 'optimal way' in a state action value function? Unsupervised Learning, Recommenders, Reinforcement week-module-1	1	272	December 4, 2023
State action value function for terminal states Unsupervised Learning, Recommenders, Reinforcement week-module-3	9	425	September 21, 2024
Unsupervised Learning: Bellman Equation example looks incorrect Unsupervised Learning, Recommenders, Reinforcement week-module-3	4	85	September 22, 2024
Inconsistent definition for the Bellman equations Unsupervised Learning, Recommenders, Reinforcement week-module-3	10	732	August 18, 2022

What is the difference between "State action value function" and "Bellman Equation"?

Related topics