RMSprop: why is b on the y axis and w on the x axis?

ansonchantf · February 27, 2024, 9:48pm

Hi,

I understand that we need to smooth out the oscillations, but I am confused about why b is on the y axis and w is on the x axis. I am also not sure why dW needs to be small and db needs to be large, as described in the lecture.

TMosh · February 27, 2024, 10:17pm

w and b are on the two axes because it’s a plot of cost vs w and b.

ansonchantf · February 28, 2024, 10:23pm

Thank you. Could you please explain a bit more? I'm still confused.

TMosh · February 29, 2024, 12:25am

I guess I do not understand your question fully.

paulinpaloalto · February 29, 2024, 12:26am

If you examine the formulas, note that W and b are both handled in the same way. There is no significance to which one is depicted as the x or y axis in the picture. The bigger point is that it's essentially impossible to realistically show what is going on in just 2 or 3 dimensions. Prof Ng is doing the best he can to give some geometric intuition by drawing the picture in 2 dimensions. Real networks typically have thousands of parameters or more, and GPT-4 is claimed to have 1 trillion. How do you draw a graph in 1000-dimensional space, let alone trillion-dimensional space?
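For reference, the RMSprop update from the lecture can be written out per parameter as below. The second line is the first line with W swapped for b, which is the sense in which neither parameter is special. Here β is the decay rate of the running average, α is the learning rate, ε is a small constant for numerical stability, and squares and square roots are taken elementwise.

```latex
\begin{aligned}
S_{dW} &= \beta\, S_{dW} + (1-\beta)\, dW^{2}, & W &:= W - \alpha\,\frac{dW}{\sqrt{S_{dW}} + \varepsilon},\\
S_{db} &= \beta\, S_{db} + (1-\beta)\, db^{2}, & b &:= b - \alpha\,\frac{db}{\sqrt{S_{db}} + \varepsilon}.
\end{aligned}
```

Whichever parameter consistently has large gradients (the steep, oscillating direction in the picture) builds up a large S and therefore gets a smaller effective step α/(√S + ε); a parameter with small gradients gets a larger one. Which axis that parameter is drawn on does not enter the formulas.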

rmwkwok · February 29, 2024, 1:09am

Hello Anson @ansonchantf,

Let me expand a bit on Paul's answer. If we look for a reason in this lecture that supports the asymmetry between W and b, we won't find one, because everything written down there, except the graph, shows only the symmetric treatment of W and b.

As Paul pointed out, the purpose of the graph is to show the effect, which I think you already understand, since you said so in your first post. In other words, for the purpose of this lecture, there is no need to show any underlying reason for the asymmetric behavior displayed in the graph.

However, if we had to imagine some reasons that could support such an asymmetry, just out of curiosity, then perhaps it could be a problem with 1 feature and 1 label modeled by a single output layer, where the range of the feature is larger than the range of the label, so that the cost surface looks like a wide ellipse. Because the learning rate is too large for the b-dimension (but not for the w-dimension), the updates oscillate in the b-dimension. However, whatever cause I wrote in this paragraph is just a guess, and the cause (not the effect) behind this graph is not the key to explaining RMSProp.

Cheers,
Raymond
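To see the effect (not the cause) in numbers, here is a minimal sketch on a hypothetical toy cost: an axis-aligned quadratic bowl J(w, b) = 0.5·(A_W·w² + A_B·b²) that is deliberately made steep in b. The cost, the constants `A_W`/`A_B`, the starting point and both learning rates are illustrative assumptions, not the lecture's network and not a test of the guessed cause above. It compares plain gradient descent with the RMSprop update.

```python
import math

# Hypothetical toy cost (an assumption for illustration only):
#   J(w, b) = 0.5 * (A_W * w**2 + A_B * b**2), minimum at (0, 0).
# A_B >> A_W makes b the steep direction, like the vertical axis in the slides.
A_W, A_B = 1.0, 50.0

def grads(w, b):
    """Partial derivatives (dw, db) of the toy cost."""
    return A_W * w, A_B * b

def gradient_descent(w, b, lr, steps):
    """Plain gradient descent; returns the list of (w, b) visited."""
    path = [(w, b)]
    for _ in range(steps):
        dw, db = grads(w, b)
        w, b = w - lr * dw, b - lr * db
        path.append((w, b))
    return path

def rmsprop(w, b, lr, steps, beta=0.9, eps=1e-8):
    """RMSprop; returns the path and the final running averages S_dw, S_db."""
    s_dw = s_db = 0.0  # running averages of the squared gradients
    path = [(w, b)]
    for _ in range(steps):
        dw, db = grads(w, b)
        s_dw = beta * s_dw + (1 - beta) * dw ** 2
        s_db = beta * s_db + (1 - beta) * db ** 2
        # Same rule for both parameters: divide by the RMS of that parameter's gradient.
        w = w - lr * dw / (math.sqrt(s_dw) + eps)
        b = b - lr * db / (math.sqrt(s_db) + eps)
        path.append((w, b))
    return path, s_dw, s_db

gd_path = gradient_descent(1.0, 1.0, lr=0.035, steps=10)
rms_path, s_dw, s_db = rmsprop(1.0, 1.0, lr=0.1, steps=10)

print("GD b values:", [round(b, 3) for _, b in gd_path])  # sign flips every step
print("GD final w:", round(gd_path[-1][0], 3))            # w barely moves
print("RMSprop (w, b):", [(round(w, 3), round(b, 3)) for w, b in rms_path])
print(f"S_dw = {s_dw:.3g}, S_db = {s_db:.3g}")            # S_db is much larger
```

With these made-up numbers, plain gradient descent multiplies b by 1 − 0.035·50 = −0.75 each step (it overshoots and flips sign every time) while multiplying w by only 0.965, so it crawls along w. In RMSprop, db is 50 times larger than it would be with A_B = A_W, but so is √S_db, so the factor cancels (apart from ε): w and b follow the same normalized path, b stops flipping sign, and w makes much faster progress. That is the "S_dW small, S_db large" behavior from the lecture. The two learning rates are not meant to be compared directly.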

ansonchantf · February 29, 2024, 3:08am

Thank you all!
