Optimization using gradient descent least squares with multiple observations

Aniket_Bankar · February 2, 2024, 4:02am

Can anyone explain to me why we take the average of the cost function in the advertising and sales problem of linear regression in the video “Optimization using gradient descent least squares with multiple observations”? Wouldn't the cost function just be the sum of all the costs?
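For reference, here are the two candidate costs written out. This sketch assumes n observations (x_i, y_i), a fitted line ŷ = m x + b, and the common 1/(2n) normalization for the averaged version. The exact constant used in the video may differ, but any positive constant leads to the same conclusion.

```latex
\begin{aligned}
S(m,b) &= \sum_{i=1}^{n} \bigl(m x_i + b - y_i\bigr)^2, \qquad
E(m,b) = \frac{1}{2n}\, S(m,b), \\
\frac{\partial E}{\partial m} &= \frac{1}{n}\sum_{i=1}^{n} \bigl(m x_i + b - y_i\bigr)\, x_i
  = \frac{1}{2n}\,\frac{\partial S}{\partial m}, \\
\frac{\partial E}{\partial b} &= \frac{1}{n}\sum_{i=1}^{n} \bigl(m x_i + b - y_i\bigr)
  = \frac{1}{2n}\,\frac{\partial S}{\partial b}.
\end{aligned}
```

Because the gradient of E is the gradient of S divided by 2n, both gradients vanish at the same (m, b), so the best-fit line is identical. What changes with n is the size of the gradient, and therefore the size of each gradient descent step for a fixed learning rate.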

TMosh · February 2, 2024, 4:43am

Can you provide a time mark within that video?

Monty · February 2, 2024, 4:46am

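Minimizing the sum or the average gives the same best-fit line, because dividing by the (positive) number of observations does not move the minimum. Taking the average still has advantages. The cost becomes more interpretable: it is the error per observation, so it does not grow just because more data points are added. It also keeps the size of the gradient, and hence of each gradient descent update, roughly the same regardless of how many observations there are. As a result, the learning rate, a key hyperparameter in gradient descent, has a similar effect whatever the size of the dataset, which makes it easier to choose and fine-tune during training.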
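A minimal NumPy sketch of the point above, under stated assumptions. The data are hypothetical synthetic (x, y) pairs standing in for the advertising and sales numbers. The averaged cost is assumed to be E = 1/(2n) · Σ(m·x + b − y)², and the summed cost S = Σ(m·x + b − y)². The helper names `make_data`, `grad` and `descend` are illustrative only. The sketch compares gradient sizes at the same starting point for a small and a large dataset, and runs gradient descent on the averaged cost with one fixed learning rate for both sizes.

```python
import numpy as np


def make_data(n, seed=0):
    # Hypothetical synthetic data (not from the course):
    # y = 3x + 2 plus small Gaussian noise, with x drawn from [0, 1].
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    y = 3.0 * x + 2.0 + rng.normal(0.0, 0.1, n)
    return x, y


def grad(x, y, m, b, average):
    """Gradient of the squared-error cost for the line y_hat = m*x + b.

    average=True : E(m, b) = 1/(2n) * sum((m*x + b - y)**2)
    average=False: S(m, b) = sum((m*x + b - y)**2)
    """
    r = m * x + b - y  # residuals for every observation
    if average:
        return np.mean(r * x), np.mean(r)
    return 2.0 * np.sum(r * x), 2.0 * np.sum(r)


def descend(x, y, lr, steps, average=True):
    # Plain batch gradient descent starting from m = b = 0.
    m, b = 0.0, 0.0
    for _ in range(steps):
        dm, db = grad(x, y, m, b, average)
        m, b = m - lr * dm, b - lr * db
    return m, b


for n in (100, 10_000):
    x, y = make_data(n)
    g_mean = np.hypot(*grad(x, y, 0.0, 0.0, average=True))
    g_sum = np.hypot(*grad(x, y, 0.0, 0.0, average=False))
    # The same learning rate is used for both dataset sizes.
    m, b = descend(x, y, lr=0.5, steps=2000, average=True)
    print(f"n={n:>6}  |grad of average|={g_mean:6.2f}  "
          f"|grad of sum|={g_sum:10.1f}  fitted m={m:.3f}, b={b:.3f}")
```

The gradient of the averaged cost stays about the same size as n grows 100-fold, while the gradient of the summed cost grows roughly in proportion to n. With the sum, a learning rate that works for 100 points would overshoot badly for 10,000, so it would have to be retuned for every dataset size.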

Aniket_Bankar · February 2, 2024, 10:16am

4:00
