Gradient descent with momentum

user335 · March 8, 2022, 9:27pm

Hi,
In the video on gradient descent with momentum, Prof. Andrew suggested using this method to overcome the zig-zag movement of the steps. But didn’t we already solve this problem when we talked about normalization? We said that normalization centers the data, so the contours of the cost function look circular rather than elliptical; as a result we avoid the zig-zag movement and the algorithm gets faster. What benefit does gradient descent with momentum add if I have already centered the data using normalization?

paulinpaloalto · March 8, 2022, 11:56pm

You can still have zig-zag problems even after normalization. The 3D pictures they show of the cost surfaces are pretty unrealistic. We are dealing with very high-dimensional spaces here (as many dimensions as there are parameters, right?), so the surfaces can be pretty gnarly even with normalization and are completely impossible to visualize. Our human brains are just not set up to cope with more than 3 dimensions in terms of visualizing things. Here’s a paper from Yann LeCun’s group about visualizing cost surfaces.

Just to elaborate a bit: notice that we are plotting the cost function J which is a function of all the W^{[l]} and b^{[l]} values. J is a scalar output, of course. So when Prof Ng shows a plot in 3D, the z axis there is cost (J) and the input parameters are the x and y axes, which means that picture is showing you how it would work with literally 2 parameters. That’s right: just two. So you’ve got one layer with w and b. So while the pictures may help with the intuition, they are showing a radically simpler case than what we are actually dealing with.
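To make the zig-zag point concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption chosen for illustration, not something from the course: a two-parameter quadratic cost whose curvature differs by a factor of 100 between the two directions (standing in for a surface that is still badly conditioned in parameter space), a learning rate of 0.019, and the names `H` and `descend`. It uses the exponentially weighted average form of momentum from the lectures and counts how often the steep coordinate jumps across the minimum.

```python
import numpy as np

# Hypothetical toy cost: J(theta) = 0.5 * theta^T H theta.
# The second direction is 100x steeper than the first, so the contours
# are long, narrow ellipses in parameter space.
H = np.diag([1.0, 100.0])


def descend(beta, lr=0.019, steps=100):
    """Gradient descent with momentum; beta=0.0 reduces to plain gradient descent.

    Returns the final parameters and the number of times the steep
    coordinate changed sign (one sign change = one zig or zag).
    """
    theta = np.array([1.0, 1.0])
    v = np.zeros_like(theta)
    flips = 0
    for _ in range(steps):
        grad = H @ theta                    # gradient of the quadratic cost
        v = beta * v + (1.0 - beta) * grad  # exponentially weighted average
        new_theta = theta - lr * v          # parameter update
        if new_theta[1] * theta[1] < 0:     # steep coordinate crossed the minimum
            flips += 1
        theta = new_theta
    return theta, flips


_, flips_plain = descend(beta=0.0)
_, flips_momentum = descend(beta=0.9)
print("sign flips in steep direction, plain GD:", flips_plain)
print("sign flips in steep direction, momentum:", flips_momentum)
```

With these particular settings, plain gradient descent overshoots in the steep direction on every step, while the averaged gradient changes sign far less often. That damping is what lets you use a larger learning rate in practice, but the exact numbers here depend entirely on the toy values chosen above.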

user335 · March 9, 2022, 10:29am

deep learning
thank you sir

ctl · August 15, 2022, 8:37am

Hi, are momentum and its variations ever useful when the full set of examples is used?
