How is the learning rate parameter determined?

in the programming assignment in Week 3 of the Neural Networks and Deep Learning course, linked below:
Week 3 2. Planar Data Classification with One Hidden Layer

I see the learning rate is set to 1.2. I played around with smaller values (1.0, 1.1, etc.) and bigger values (2, 5, 10, etc.), and then I would see errors: assertion errors on W1. My questions are:

  1. How is the learning rate value determined, or what is the guideline for choosing a proper learning rate? I have seen the learning rate set to much smaller values, like 0.05, before.

  2. In this specific example, where the learning rate is set to 1.2, why do errors come up for other values?

Thank you!


Hello Peng Hu,

Welcome to the community!

Thank you for your question.

The learning rate (LR) is a critical factor in optimization. A larger learning rate leads to faster convergence than a smaller one, provided it is not so large that the updates overshoot the minimum.

Here, the LR has been set to match what the assignment expects: the test cells assert specific values of W1 that assume training with LR = 1.2, so choosing other values will cause those assertions to fail.


Hi, Paul.

It’s great that you tried some experiments with the learning rate. You always learn something interesting when you take the course materials and extend them like that. It turns out that there is no magic formula for determining the learning rate: as Rashmi says, it’s specific to the properties of the particular model you have chosen and the training data that you are using. So it is figured out by exactly the kind of experimentation you did: you try a range of different values and see which ones give you the best convergence.

If you choose a rate that is too small, then convergence either stalls or just takes way too long. If you choose too large a value, then you can get oscillation or even divergence, meaning that the cost and accuracy values swing around or actually get worse rather than better with more iterations. The goal is to find the “Goldilocks” value that is “just right”. But sometimes the situation is even complicated enough that you need an adaptive strategy that starts with a high learning rate and then reduces it later to avoid overshooting. That is a more sophisticated strategy that will be discussed in DLS Course 2.
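You can see all three regimes (too small, just right, too large) with a toy example that is much simpler than the assignment. This is just a sketch, not code from the course: gradient descent on f(w) = w², whose gradient is 2w, so each update is w := w - lr * 2w and the minimum is at w = 0.

```python
def descend(lr, steps=20, w=1.0):
    """Run `steps` gradient-descent updates on f(w) = w**2 and return |w|."""
    for _ in range(steps):
        w -= lr * 2 * w   # w := w - lr * df/dw, with df/dw = 2w
    return abs(w)

print(descend(0.05))  # too small: slow progress, still far from 0
print(descend(0.4))   # "Goldilocks": converges rapidly toward 0
print(descend(1.5))   # too large: each step overshoots, |w| blows up
```

Each update multiplies w by (1 - 2*lr), so with lr = 1.5 that factor is -2 and the iterate flips sign and doubles every step: that is exactly the oscillation-then-divergence behavior described above.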

The learning rate is what Prof Ng calls a “hyperparameter”, meaning a value that you as the system designer have to choose either through experience or experimentation. Here in Course 1, there are too many new things to discuss and he doesn’t have time to go into any real detail about that sort of issue. Choosing and tuning hyperparameters will be a major topic in DLS Course 2 and Prof Ng will describe how to approach that type of choice in a systematic and efficient way, so please “hold that thought” and stay tuned for Course 2. :nerd_face:

Hi Rashmi & Paul,

thank you for your explanations, which help me understand the parameter a bit more. As Paul mentioned, I hope I will get more insight into this topic in Course 2.

Great to finish Course 1! :firecracker: :firecracker: :firecracker:


You are most welcome Peng.

Happy Learning!