Algorithm refinement: Mini-batch and soft updates

I don’t understand how the parameters of Q became W and B when it was previously s and a. Can someone explain it?

Hello @yusufnzm,

I guess you are asking about the part around 8:45 of the video in question. (Next time, please share the timestamp.)

Actually, to describe Q more fully when it is a neural network, instead of calling it Q(s, a), we would better call it Q(s, a; W, B), or in English: the function Q of s and a, parametrized by W and B. This is because the neural network is parametrized by a set of W and B, without which we can't compute anything. Sometimes we still call it Q(s, a), either because it is NOT a neural network (so it does not carry any W and B at all), or because we want to emphasize that, given some fixed W and B, we are computing Q at some s and a.

However, in that part of the video, I think Andrew wants to highlight the most critical part of doing a soft update, which is adjusting W and B based on their previous values. This is why W and B are the center of the discussion in that part.

Therefore, for completeness, we can call it Q(s, a; W, B), where s and a are always there; but since the focus of that part of the video is on W and B, he highlighted them.
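
If it helps to see this in code, here is a minimal sketch (my own toy example, not the course's actual implementation) where Q is a tiny one-layer network: you simply cannot evaluate Q(s, a) without also supplying W and B, and the soft update then blends the new W and B with their previous values using the 0.01 / 0.99 mix Andrew writes in the video.

```python
import numpy as np

# Toy Q-network: Q(s, a; W, B) cannot be computed without W and B.
def Q(s, a, W, B):
    """Q(s, a; W, B): the network's output for state s and action a."""
    x = np.concatenate([s, a])   # network input built from s and a
    return float(W @ x + B)     # parametrized by W and B

rng = np.random.default_rng(0)
W, B = rng.normal(size=6), np.zeros(1)           # current (target) parameters
W_new, B_new = rng.normal(size=6), np.ones(1)    # freshly trained parameters

s, a = np.array([0.1, -0.2, 0.3, 0.0]), np.array([1.0, 0.0])
print(Q(s, a, W, B))   # Q at (s, a), given the current fixed W and B

# Soft update: adjust W and B based on their previous values,
# using the 0.01 / 0.99 blend from the video.
W = 0.01 * W_new + 0.99 * W
B = 0.01 * B_new + 0.99 * B
print(Q(s, a, W, B))   # same s and a, but Q has shifted slightly
```

Notice that s and a appear every time we evaluate Q, while the soft update touches only W and B, which is exactly why they are the ones written out in that slide.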

Cheers,
Raymond