Can't see how these are equal in the application of Bellman's equation in the *Learning the state-value function* lecture

My question:

  • Why does the max over all actions a' of Q(s'^{(1)}, a') refer to the s'^{(2)} state from the next example?

Thanks!

I'm assuming he just meant s'^{(1)} in the first experience and s'^{(2)} in the second experience…

Hello, @lkj,

Andrew was going through the image at that time, so I think he meant to circle s'^{(1)} instead of s'^{(2)}. I will open a ticket for the course team to follow up on this.
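To make the indexing concrete, here is a minimal sketch of how the targets are built from experience tuples (s, a, R(s), s'): the target for experience i is y^{(i)} = R(s^{(i)}) + γ max_{a'} Q(s'^{(i)}, a'), so each experience uses its own next state. The state names, Q values, rewards, and the discount factor below are all made up for illustration and are not taken from the lecture.

```python
# Illustrative sketch only: every number and name here is an assumption.
GAMMA = 0.5  # discount factor (assumed value)

# Hypothetical current Q estimates: q_table[state][action]
q_table = {
    "s_next_1": {"left": 1.0, "right": 3.0},
    "s_next_2": {"left": 4.0, "right": 2.0},
}

# Experience tuples (s, a, R(s), s'), one per training example
experiences = [
    ("s_1", "right", 0.0, "s_next_1"),  # experience 1 -> next state s'^{(1)}
    ("s_2", "left", 1.0, "s_next_2"),   # experience 2 -> next state s'^{(2)}
]


def bellman_target(reward, next_state, q, gamma=GAMMA):
    """Return R(s) + gamma * max over a' of Q(s', a') for one experience."""
    return reward + gamma * max(q[next_state].values())


# Each target is computed from the next state of the SAME experience.
targets = [bellman_target(r, s_next, q_table) for (_, _, r, s_next) in experiences]
print(targets)
```

Here the first target only ever looks up `s_next_1` and the second only `s_next_2`, which is why circling s'^{(2)} in the first example looks like a slip.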

Cheers,
Raymond
