Is there a mistake in the "Learning the state-value function"?

Hi,

In the video "Learning the state-value function", at 10:30, the left side of the slide defines y as:

$$y^{(1)} = R(s^{(1)}) + \gamma \max Q(s'^{(1)}, a')$$
$$y^{(2)} = R(s^{(2)}) + \gamma \max Q(s'^{(2)}, a')$$

Why doesn't the $a'$ have an index?
I would expect:

$$y^{(1)} = R(s^{(1)}) + \gamma \max Q(s'^{(1)}, a'^{(1)})$$
$$y^{(2)} = R(s^{(2)}) + \gamma \max Q(s'^{(2)}, a'^{(2)})$$

Thanks

Hello @Goran_Hrzenjak,

No, it is not a mistake. That $a'$ has to be read together with the other $a'$, the one written under the max: $\max_{a'} Q(s'^{(1)}, a')$.

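Together, the whole expression means: over all possible actions $a'$, pick the one that maximizes $Q(s'^{(1)}, a')$ and use that maximum value. At the time we compute $y^{(1)}$, we have not committed to any particular next action; we are only determining what the best $a'$ would be. Therefore $a'$ is not tied to the first training example the way $s'^{(1)}$ is. It simply ranges over all actions, which is why it carries no index.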
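To make the notation concrete, here is a minimal Python sketch of how the targets $y^{(i)}$ could be computed. Everything in it is hypothetical and made up for illustration: the state and action names, the Q-values, and the helper functions `q` and `target`. In the course, $Q$ is a neural network rather than a lookup table. The point it shows is that $a'$ is just the variable the max loops over, and it is never read from a training example.

```python
# Hypothetical toy setup: action names and Q-values are invented for illustration.
ACTIONS = ["left", "right", "up", "down"]

Q_TABLE = {
    "s_a": {"left": 1.0, "right": 3.5, "up": 2.0, "down": 0.5},
    "s_b": {"left": 4.0, "right": 1.0, "up": 0.0, "down": 2.5},
}

def q(state, action):
    """Toy stand-in for the learned Q(s, a)."""
    return Q_TABLE[state][action]

def target(reward, next_state, gamma):
    """y = R(s) + gamma * max over a' of Q(s', a')."""
    # a_prime is only the loop variable of the max; it is not part of the example.
    best = max(q(next_state, a_prime) for a_prime in ACTIONS)
    return reward + gamma * best

# Training examples (s, a, R(s), s'): note that no a' is stored in them.
examples = [
    ("s_b", "left", 0.0, "s_a"),
    ("s_a", "up", 1.0, "s_b"),
]
gamma = 0.5

ys = [target(r, s_next, gamma) for (_s, _a, r, s_next) in examples]
# The maximizing a' is found separately for each example's s'.
best_next = [max(ACTIONS, key=lambda a_prime: q(s_next, a_prime))
             for (_s, _a, _r, s_next) in examples]
print(ys)         # [1.75, 3.0]
print(best_next)  # ['right', 'left']
```

The maximizing action happens to be different for the two examples, yet the formula does not need $a'^{(1)}$ or $a'^{(2)}$: the max works it out from $s'^{(i)}$ each time.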

Raymond

I understand. Thank you very much.

You are welcome, @Goran_Hrzenjak!