How is the learning rate parameter determined?

in the programming assignment in Week 3 of the Neural Networks and Deep Learning course, linked below:
Week 3 2. Planar Data Classification with One Hidden Layer

I see the learning rate is set to 1.2. I played around with smaller values (1.0, 1.1, etc.) and bigger values (2, 5, 10, etc.), and then I would see errors: assertion errors on W1. My questions are:

  1. How is the learning rate value determined, or what is the guideline for choosing a proper learning rate? I have seen the learning rate set to much smaller values, like 0.05, before.

  2. In this specific example, where the learning rate is set to 1.2, why do errors come up for other values?

Thank you!


Hello Peng Hu,

Welcome to the community!

Thank you for your question.

The learning rate (LR) is a critical factor in optimization. A larger learning rate leads to faster convergence than a smaller one, provided it is not so large that the updates overshoot the minimum.

Here, the LR has been set to match what the assignment expects: the test cells assert specific values of W1 that assume training with LR = 1.2, so choosing other values will cause those assertions to fail.


Hi, Paul.

It’s great that you tried some experiments with the learning rate. You always learn something interesting when you take the course materials and extend them like that. It turns out that there is no magic formula for determining the learning rate: as Rashmi says, it’s specific to the properties of the particular model you have chosen and the training data that you are using. So it is figured out by exactly the kind of experimentation you did: you try a range of different values and see which ones give you the best convergence.

If you choose a rate that is too small, then convergence either stalls or just takes way too long. If you choose too large a value, then you can get oscillation or even divergence, meaning that the cost and accuracy values swing around or actually get worse rather than better with more iterations. The goal is to find the “Goldilocks” value that is “just right”. But sometimes the situation is even complicated enough that you need an adaptive strategy that starts with a high learning rate and then reduces it later to avoid overshooting. That is a more sophisticated strategy that will be discussed in DLS Course 2.
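You can see all three regimes (too small, just right, too large) with a toy example that is much simpler than the assignment. This is just a sketch, not code from the course: gradient descent on f(w) = w², whose gradient is 2w, so each update is w := w - lr * 2w and the minimum is at w = 0.

```python
def descend(lr, steps=20, w=1.0):
    """Run `steps` gradient-descent updates on f(w) = w**2 and return |w|."""
    for _ in range(steps):
        w -= lr * 2 * w   # w := w - lr * df/dw, with df/dw = 2w
    return abs(w)

print(descend(0.05))  # too small: slow progress, still far from 0
print(descend(0.4))   # "Goldilocks": converges rapidly toward 0
print(descend(1.5))   # too large: each step overshoots, |w| blows up
```

Each update multiplies w by (1 - 2*lr), so with lr = 1.5 that factor is -2 and the iterate flips sign and doubles every step: that is exactly the oscillation-then-divergence behavior described above.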

The learning rate is what Prof Ng calls a “hyperparameter”, meaning a value that you as the system designer have to choose either through experience or experimentation. Here in Course 1, there are too many new things to discuss and he doesn’t have time to go into any real detail about that sort of issue. Choosing and tuning hyperparameters will be a major topic in DLS Course 2 and Prof Ng will describe how to approach that type of choice in a systematic and efficient way, so please “hold that thought” and stay tuned for Course 2. :nerd_face:

Hi Rashmi & Paul,

thank you for your explanations, which help me understand the parameter a bit more. As Paul mentioned, I hope I will get more insight into this topic in Course 2.

Great to finish Course 1! :firecracker: :firecracker: :firecracker:


You are most welcome Peng.

Happy Learning!