Algorithm refinement: Mini-batch and soft updates

I don’t understand how the parameters of Q became W and B when it was previously s and a. Can someone explain it?

Hello @yusufnzm,

I guess you are asking about the part around 8:45 of the video in question. (Next time, please share the timestamp.)

Actually, to describe Q more fully when it is a neural network, instead of calling it Q(s, a), we would better call it Q(s, a; W, B), or in English: the function Q of s and a, parametrized by W and B. This is because the neural network is parametrized by a set of W and B, without which we can't compute anything. Sometimes we still call it Q(s, a), either because it is NOT a neural network (so it does not carry any W and B at all), or because we want to emphasize that, given some fixed W and B, we are computing Q at some s and a.

However, in that part of the video, I think Andrew wants to highlight the most critical part of doing a soft update, which is adjusting W and B based on their previous values. This is why W and B are the center of the discussion in that part.

Therefore, for completeness, we can call it Q(s, a; W, B), where s and a are always there; but since the focus of that part of the video is on W and B, he highlighted them.
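
If it helps to see this in code, here is a minimal sketch (my own toy example, not the course's actual implementation) where Q is a tiny one-layer network: you simply cannot evaluate Q(s, a) without also supplying W and B, and the soft update then blends the new W and B with their previous values using the 0.01 / 0.99 mix Andrew writes in the video.

```python
import numpy as np

# Toy Q-network: Q(s, a; W, B) cannot be computed without W and B.
def Q(s, a, W, B):
    """Q(s, a; W, B): the network's output for state s and action a."""
    x = np.concatenate([s, a])   # network input built from s and a
    return float(W @ x + B)     # parametrized by W and B

rng = np.random.default_rng(0)
W, B = rng.normal(size=6), np.zeros(1)           # current (target) parameters
W_new, B_new = rng.normal(size=6), np.ones(1)    # freshly trained parameters

s, a = np.array([0.1, -0.2, 0.3, 0.0]), np.array([1.0, 0.0])
print(Q(s, a, W, B))   # Q at (s, a), given the current fixed W and B

# Soft update: adjust W and B based on their previous values,
# using the 0.01 / 0.99 blend from the video.
W = 0.01 * W_new + 0.99 * W
B = 0.01 * B_new + 0.99 * B
print(Q(s, a, W, B))   # same s and a, but Q has shifted slightly
```

Notice that s and a appear every time we evaluate Q, while the soft update touches only W and B, which is exactly why they are the ones written out in that slide.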

Cheers,
Raymond