There is a question in Quiz 2 that says you are given three contour plots: one for gradient descent, one for gradient descent with momentum (beta = 0.5), and one for gradient descent with momentum (beta = 0.9). How do we identify which plot corresponds to which beta? Could someone please give me some hints? Thanks in advance.

What momentum does is use an exponentially weighted average (EWA) of the gradients to smooth out the updates. Prof Ng has a couple of lectures before the momentum lecture in which he explains the math of EWAs and how they behave. It's probably worth watching the lecture "Understanding Exponentially Weighted Averages" again to see his explanation. What he shows is that an EWA with parameter β averages over roughly the last 1/(1 - β) samples: with β = 0.9 you are averaging over about the last 10 samples, and with β = 0.98 over about the last 50. The closer β is to 1, the longer the averaging window. By the same rule, β = 0.5 averages over only about 2 samples, a much shorter window than with β = 0.9. So the β = 0.5 path will be choppier (less smooth) than the β = 0.9 path, but still somewhat smoother than the path with no momentum at all. Does it make more sense if you re-examine the three graphs with that set of ideas in mind?
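If it helps to see this numerically, here is a minimal sketch. It is not the quiz's actual setup: the bowl shape `f(x, y) = 0.5 * (x**2 + 20 * y**2)`, the learning rate, the starting point, and the "reversal count" metric are all hypothetical choices for illustration. It runs momentum in the EWA form (`v = beta * v + (1 - beta) * grad`, then `w = w - lr * v`) and counts how often the path flips direction along the steep axis, as a rough proxy for how zigzagged the contour plot would look. With `beta = 0` this is just plain gradient descent.

```python
import numpy as np

# Hypothetical elongated bowl f(w) = 0.5 * (1 * x**2 + 20 * y**2):
# shallow along x, steep along y, so plain gradient descent zigzags in y.
CURVATURE = np.array([1.0, 20.0])


def run_momentum(beta, lr=0.09, steps=50, start=(-4.0, 1.0)):
    """Gradient descent with momentum in EWA form:
    v = beta * v + (1 - beta) * grad,  w = w - lr * v.
    With beta = 0 this reduces to plain gradient descent."""
    w = np.array(start, dtype=float)
    v = np.zeros_like(w)
    path = [w.copy()]
    for _ in range(steps):
        grad = CURVATURE * w              # gradient of the quadratic bowl
        v = beta * v + (1 - beta) * grad  # exponentially weighted average of gradients
        w = w - lr * v
        path.append(w.copy())
    return np.array(path)


def vertical_reversals(path):
    """Count how often the step along the steep (y) axis changes sign."""
    dy = np.diff(path[:, 1])
    return int(np.sum(dy[1:] * dy[:-1] < 0))


for beta in (0.0, 0.5, 0.9):
    window = 1 / (1 - beta)  # approximate EWA window, ~1/(1 - beta) samples
    path = run_momentum(beta)
    print(f"beta={beta}: ~{window:.0f}-sample average, "
          f"{vertical_reversals(path)} direction reversals in y")
```

With these made-up settings, plain gradient descent reverses its vertical direction on every single step, and the count drops as β grows. That is the pattern to look for in the plots: the most zigzagged path is plain gradient descent and the smoothest is β = 0.9. The exact counts depend on the hypothetical bowl and learning rate chosen here.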

Thanks a lot, Sir, for your reply. I was thinking along similar lines but wasn't fully confident about it. This explanation cleared up all my doubts.