A small bug in the first week's test

{quiz solution removed by mentor as we can’t share it here}
In the first question, option D is also correct if alpha is zero, so it would be better to state the alpha value in the question.

Hello @Nitish_Satya_Sai_Ged,

When we train a neural network, we need a positive learning rate to make it work. The gradient, however, can be positive, negative, or zero.


Hi Raymond,

Thanks for your reply. I understood. The alpha should be between 0 and 1, but it can’t be either 0 or 1.

Alpha can be any positive number, including values larger than 1, but you need to verify which value actually benefits the training process. Setting it to zero means no update happens at all, right? Even if the program does not complain about a zero learning rate ($\alpha$), we would never set it that way.

$$w := w - \alpha\frac{\partial J}{\partial w}$$
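
To make the update rule concrete, here is a minimal sketch in Python. The toy cost J(w) = (w - 3)^2 and the names `gradient_step` and `dJ_dw` are my own assumptions for illustration, not something from the course material. It only shows that a zero learning rate leaves `w` unchanged, while a small positive one moves it toward the minimum.

```python
def gradient_step(w, alpha, grad):
    """One gradient descent update: w := w - alpha * dJ/dw."""
    return w - alpha * grad(w)


# Hypothetical toy cost J(w) = (w - 3)^2, so dJ/dw = 2 * (w - 3).
def dJ_dw(w):
    return 2.0 * (w - 3.0)


w0 = 0.0
w_zero_alpha = gradient_step(w0, 0.0, dJ_dw)   # alpha = 0: no update at all
w_small_alpha = gradient_step(w0, 0.1, dJ_dw)  # alpha > 0: moves toward w = 3

print("alpha = 0.0 ->", w_zero_alpha)
print("alpha = 0.1 ->", w_small_alpha)
```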

Hi Raymond,
But Andrew said it should range from 0 to 1 in the class. Of course, it shouldn’t be zero.

Can you share the source of that? Which video, and at what time? I think it is important to provide the source when quoting.


Hello @Nitish_Satya_Sai_Ged!

Initially, it is recommended to choose a small value for alpha, between 0 and 1, and then increase or decrease it as needed. As @rmwkwok also mentioned, alpha can be greater than one. Prof. Andrew explains this in the video below.


Thank you @saifkhanengr!

@Nitish_Satya_Sai_Ged, if we look at the above slide from the video @saifkhanengr has kindly shared with us, we see that Andrew is only suggesting that we try a sequence of values, each roughly “3x” the previous one.

Under proper feature normalization (covered in Course 1 Week 2), and when dealing with fully connected neural networks (which are covered in this MLS), you can usually find a good learning rate in this range. If you can’t, you will need to explore outside of it.
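
As a rough sketch of what such a search might look like in Python: the candidate values below are my assumption of a typical "3x" sequence, not copied from the slide, and the toy cost J(w) = (w - 3)^2 and the function name `run_gradient_descent` are hypothetical. The idea is simply to run the same number of steps for each candidate and compare the final cost.

```python
def run_gradient_descent(alpha, steps=50, w=0.0):
    """Minimise the toy cost J(w) = (w - 3)^2 and return the final cost."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # dJ/dw for the toy cost
        w = w - alpha * grad
    return (w - 3.0) ** 2


# Assumed candidates, each about 3x the previous one.
alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]
final_cost = {alpha: run_gradient_descent(alpha) for alpha in alphas}
best_alpha = min(final_cost, key=final_cost.get)

for alpha in alphas:
    print(f"alpha={alpha:<6} final cost={final_cost[alpha]:.3e}")
print("lowest final cost at alpha =", best_alpha)
```

On this particular toy cost the largest candidate just bounces back and forth without reducing the cost, which is the kind of behaviour you would look for in a learning curve. A different cost or dataset can favour a different value, including one outside this list.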

If this is not the video you were watching and you would like us to discuss another one, please feel free to share it with us. It is most effective to clear up a question at the point where it first arose.


Dear mentors,

I want to thank both of you. I was asking about precisely this slide, and I apologize for not getting back to you sooner. Thank you so much for your support.

I have one more follow-up question. As @saifkhanengr said, the learning rate can be greater than 1. In what situations would we set it greater than one?

I also have a question about contour plots, where my intuition is failing me. As Prof. Andrew said, gradient descent is like sliding from the top of a mountain down to the valley until we reach the global minimum. But in the contour plot, we look for the minimum cost at the smallest circle, which on a mountain would typically mark the top, where we start sliding. This seems to contradict my intuition. Please help me understand this better.

Hello @Nitish_Satya_Sai_Ged,

I think the purpose of showing the mountain is just to convey the idea about contours. Can you see the similarity in the concept of contour between a mountain and a cost surface?

In the case of a cost surface, we want to get to the lowest point. In the case of hiking up a mountain, we want to get to the highest point.

The idea of “contour” is the same. The target is not the same.
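
If it helps, here is a small Python sketch of that idea. The bowl-shaped cost and its minimum at (w, b) = (1, 2) are assumptions made up for illustration. It evaluates the cost at several points on circles of different radii around the minimum: every point on one circle has the same cost (that is what a contour is), and the smaller the circle, the lower the cost.

```python
import math


# Hypothetical bowl-shaped cost with its minimum at (w, b) = (1, 2).
def cost(w, b):
    return (w - 1.0) ** 2 + (b - 2.0) ** 2


def costs_on_circle(radius, n_points=8):
    """Cost at n_points evenly spaced on a circle around the minimum."""
    values = []
    for k in range(n_points):
        angle = 2.0 * math.pi * k / n_points
        w = 1.0 + radius * math.cos(angle)
        b = 2.0 + radius * math.sin(angle)
        values.append(round(cost(w, b), 6))
    return values


# Walk inward from the largest circle to the smallest.
for radius in (2.0, 1.0, 0.5):
    print("radius", radius, "costs", costs_on_circle(radius))
```

For this cost, each circle has a constant value equal to the radius squared, so the innermost circle sits at the bottom of the bowl, not at a peak.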

As for a learning rate greater than 1, I don’t have an example for it now. If we want an example, we usually try it out ourselves with some datasets and some architectures.
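
One way to try it out without any dataset is a toy cost that is very flat, so that the gradients are small. The Python sketch below is only that: the cost J(w) = 0.1 * (w - 3)^2 and the function name `distance_after` are assumptions for illustration, not a real training setup. On this cost, a learning rate of 2 gets much closer to the minimum in 20 steps than 0.5 does, while a much larger value such as 12 diverges.

```python
def distance_after(alpha, steps=20, w=0.0):
    """Run gradient descent on J(w) = 0.1 * (w - 3)^2; return |w - 3|."""
    for _ in range(steps):
        grad = 0.2 * (w - 3.0)  # dJ/dw for the flat toy cost
        w = w - alpha * grad
    return abs(w - 3.0)


results = {alpha: distance_after(alpha) for alpha in (0.5, 2.0, 12.0)}

for alpha, distance in results.items():
    print(f"alpha={alpha:<5} distance from minimum={distance:.3e}")
```

For this toy cost the updates stay stable as long as alpha is below 10, that is, 2 divided by the second derivative 0.2. Real networks do not come with such a simple bound, which is why trying values out is the usual approach.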


@rmwkwok , I understood. Thanks for your reply.