Course 2 week 2

Arjun_Bakshi · March 7, 2022, 5:45pm

RMSProp
In the RMSprop video, w and b are shown as two sets of parameters that can be separated: w moves towards the optimum, while b accounts for the vertical oscillations. I was wondering how we are supposed to choose these two sets.
When we implement it, would we not need a function that tells us which parameters go in the W part and which in the b part, as shown in the video?
Learning Rate (Alpha) Decay
As the iterations increase, won't dw and db also decrease? That is, wouldn't each update be smaller than the previous one after a certain point (especially when we are near the optimum)? This was mentioned in the ML course by Andrew Ng.
If so, do we even need to decay alpha? Or are we decaying it because mini-batch gradient descent tends not to converge?

Elemento · May 12, 2022, 8:46am

Hey @Arjun_Bakshi,
Apologies for the delayed response. Coming to your first query, I think there is a slight confusion about what W and b are. They simply represent the weights and biases of the network, whose gradients we compute on the current mini-batch. You never have to choose which parameters go into which set: RMSprop applies the same update to every parameter, element-wise. The example in which Prof Andrew has shown weights along the horizontal axis and biases along the vertical axis is simply an illustration for intuition and nothing else.
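
To make this concrete, here is a minimal NumPy sketch of an RMSprop step. The function name `rmsprop_update`, the dictionary layout, and the hyperparameter defaults are my own assumptions for illustration, not the assignment's code. The point is only that the same rule runs over every entry of W and b, so nothing has to be sorted into a "horizontal" or "vertical" group.

```python
import numpy as np

def rmsprop_update(params, grads, cache, learning_rate=0.001, beta=0.9, epsilon=1e-8):
    """Apply one RMSprop step to every parameter array (hypothetical helper)."""
    for key in params:
        # Exponentially weighted average of the squared gradients, per element.
        cache[key] = beta * cache[key] + (1 - beta) * grads[key] ** 2
        # Each element is scaled by its own running RMS, so elements with
        # large gradients are damped and elements with small gradients are not.
        params[key] = params[key] - learning_rate * grads[key] / (np.sqrt(cache[key]) + epsilon)
    return params, cache

# Toy values: one weight has a tiny gradient, the other a huge one.
params = {"W": np.array([[1.0, 1.0]]), "b": np.array([[1.0]])}
grads = {"W": np.array([[0.01, 10.0]]), "b": np.array([[0.5]])}
cache = {key: np.zeros_like(value) for key, value in params.items()}

params, cache = rmsprop_update(params, grads, cache)
print(params["W"], params["b"])
```

In this toy run the two weights move by almost the same amount even though their gradients differ by a factor of 1000, which is the per-parameter damping the video illustrates with its two axes.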

As for your second query, the statement that “as the iterations increase, dw and db will most likely decrease” is true. But when we are near the optimum, it is best to take small steps, and that is what learning rate decay might help us ensure. Also, as you have mentioned, with mini-batch gradient descent each update is influenced by the optimum of the current mini-batch, which might not align with the optimum over the entire training set. In this case, learning rate decay helps ensure that we don’t move further away from the desired optimum.
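
As a small illustration, here is one common decay schedule (inverse time decay). The function name `decayed_learning_rate` and the values of `alpha0` and `decay_rate` are assumptions I picked for the example; other schedules exist and may suit a given problem better.

```python
def decayed_learning_rate(alpha0, decay_rate, epoch_num):
    """Inverse time decay: alpha shrinks as the epoch number grows (illustrative)."""
    return alpha0 / (1.0 + decay_rate * epoch_num)

# Hypothetical settings: initial learning rate 0.2, decay rate 1.0.
schedule = [decayed_learning_rate(0.2, 1.0, epoch) for epoch in range(4)]
print(schedule)
```

With these settings alpha goes from 0.2 at epoch 0 to 0.05 at epoch 3, so the later mini-batch steps are smaller even if the mini-batch gradients stay noisy.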

Note the use of “might” here, because this is what Prof Andrew says as well: “One of the things that might help speed up your learning algorithm is to slowly reduce your learning rate over time”. It is just one of the strategies that may help in some cases and not in others. You can only try it and find out. I hope this helps.

Regards,
Elemento
