Deep Learning Specialization

week-2

Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization

In the middle of the video 07_rmsprop, Andrew said:

Assume w is the horizontal axis and b is the vertical axis when describing the RMSprop algorithm. I don't actually get why that makes sense! w is just a parameter, and it has no particular similarity to a horizontal axis!

Can someone explain it to me?

Hello! Prof Andrew is saying that you can map the w and b parameters onto an xy-plane, because he is also using a 2D diagram to show how the RMSprop algorithm moves toward the optimum.

I believe you may have misunderstood my question.

I don't understand why he said the vertical axis is b and the horizontal axis is w in this image.

Hi @mhaydari81 ,

Professor Andrew says:

It could be w1 and w2, where some of the parameters were named b and w for the sake of intuition.

In the context of neural networks and optimization algorithms (e.g. RMSprop), the parameters w and b typically represent weights and biases. However, in some formulations the bias b is folded into the weight vector as w_0. When we use b as w_0, we set the corresponding input x_0 to 1 so that it acts as the bias term.
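A quick sketch of that b-as-w_0 trick (the numbers here are made up purely for illustration):

```python
# Bias kept as a separate parameter b
w = [0.5, -1.2]   # w_1, w_2
b = 0.3

x = [2.0, 1.0]
z = sum(wi * xi for wi, xi in zip(w, x)) + b

# Equivalent formulation: fold b into the weights as w_0
# and prepend a constant input x_0 = 1
w_aug = [b] + w          # [w_0, w_1, w_2]
x_aug = [1.0] + x        # [x_0, x_1, x_2]
z_aug = sum(wi * xi for wi, xi in zip(w_aug, x_aug))

assert abs(z - z_aug) < 1e-12   # both give the same pre-activation z
```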

In the context of RMSprop, the contour plot illustrates the optimization landscape based on the parameters being updated. Typically, only two parameters, such as w_1 and w_2 (or denoted as w and b in the video), are depicted in the plot. These parameters represent the dimensions along which the optimization algorithm is exploring.

Each point on the contour plot corresponds to a specific combination of w_1 and w_2, and the contour lines illustrate regions where the objective function (e.g., the loss function) has the same value.
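To make the picture concrete, here is a minimal two-parameter sketch of the RMSprop update rule itself. The names (w, b, beta, lr, eps) follow the lecture notation, but the quadratic loss is my own made-up example, chosen as an elongated bowl so that the b direction is much steeper than the w direction, just like in the contour plot:

```python
import math

def grads(w, b):
    """Gradient of the toy loss J(w, b) = w**2 + 25 * b**2."""
    return 2 * w, 50 * b

w, b = 5.0, 1.0          # start away from the minimum at (0, 0)
s_w, s_b = 0.0, 0.0      # running averages of squared gradients
beta, lr, eps = 0.9, 0.1, 1e-8

for _ in range(100):
    dw, db = grads(w, b)
    # Exponentially weighted averages of the squared gradients
    s_w = beta * s_w + (1 - beta) * dw ** 2
    s_b = beta * s_b + (1 - beta) * db ** 2
    # Dividing by sqrt(s) damps the consistently large steps along
    # the steep b axis and relatively boosts the small w steps
    w -= lr * dw / (math.sqrt(s_w) + eps)
    b -= lr * db / (math.sqrt(s_b) + eps)

print(w, b)  # both end up near 0, without wild oscillation along b
```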

Hope my explanation helps, feel free to ask if you have any questions!

It’s completely arbitrary, meaning it’s a choice he made; he could have flipped it. But this is all very unrealistic anyway, because we are confined to drawing pictures in 2 or 3 dimensions. In reality we are typically dealing with (at a minimum) hundreds of dimensions, and it’s not at all unusual for a DL model to have literally millions or even billions of parameters (e.g. any of the recent LLMs). Since we can only draw in 2 or 3 dimensions, he is just trying to give you some intuition for what is happening.

Think about what it means for a model to have literally only two parameters. That means it’s the equation of a straight line in the plane:

y = wx + b

That’s not very interesting compared to what NNs are doing.

Thank you for your time and consideration.