MLS : Regression and Classification : Cost function with regularization

@ai_is_cool,

Without computing the partial derivatives, it may not be immediately clear how the optimization process explicitly reduces w_3 and w_4. In the video you mentioned, Prof. Ng relies on an intuitive understanding of how regularization works rather than deriving it mathematically at that point. When you minimize the cost function J(\vec{w}, b), i.e.

\min_{\vec{w}, b} \; \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\vec{w}, b}(\vec{x}^{(i)}) - y^{(i)} \right)^2 + 1000 \cdot w_3^2 + 1000 \cdot w_4^2,

the algorithm will prioritize keeping w_3 and w_4 small: because the coefficient 1000 in the regularization term is very large, the penalty for large w_3 and w_4 is severe, even if shrinking them means sacrificing some fit to the training data. The next video, "Regularized linear regression", gives a more detailed explanation.
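If it helps to see this concretely, here is a minimal NumPy sketch. The toy data, feature construction, learning rate, and iteration count are all made up for illustration (they are not from the course labs); it just runs plain gradient descent on the cost above so you can watch the 1000 \cdot w_3^2 + 1000 \cdot w_4^2 penalty push w_3 and w_4 toward zero.

```python
import numpy as np

# Toy data: a handful of points that a quadratic fits reasonably well
np.random.seed(0)
x = np.linspace(0, 2, 20)
y = 1.0 + 2.0 * x - 1.5 * x**2 + 0.1 * np.random.randn(20)

# Polynomial features x, x^2, x^3, x^4; w_1..w_4 live in w[0]..w[3], b is the bias
X = np.column_stack([x, x**2, x**3, x**4])
m = X.shape[0]

def gradients(w, b):
    err = X @ w + b - y        # f_wb(x^(i)) - y^(i)
    dj_dw = X.T @ err / m      # gradient of the squared-error term
    dj_db = err.mean()
    # The extra 1000*w_3^2 + 1000*w_4^2 penalty adds 2000*w_3 and 2000*w_4
    # to the gradients of the last two weights (indices 2 and 3 here)
    dj_dw[2] += 2000.0 * w[2]
    dj_dw[3] += 2000.0 * w[3]
    return dj_dw, dj_db

w, b, alpha = np.zeros(4), 0.0, 5e-4
for _ in range(50_000):
    dj_dw, dj_db = gradients(w, b)
    w -= alpha * dj_dw
    b -= alpha * dj_db

print(w)  # w[2] and w[3] (i.e. w_3 and w_4) end up very close to zero
```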

In machine learning, L_2 regularization is often referred to as weight decay because the update rule shrinks (decays) the magnitude of the weights a little at every optimization step.
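You can see where the name comes from by rearranging the gradient descent update for the standard L_2 penalty \frac{\lambda}{2m} \sum_{j} w_j^2 (which the course uses in place of the ad hoc 1000 coefficients):

w_j := w_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j \right] = w_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) x_j^{(i)}.

On every step each weight is first multiplied by the constant factor \left(1 - \alpha \frac{\lambda}{m}\right), slightly less than 1, before the usual gradient step is applied; that multiplicative shrinking is the "decay".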
You may also find the following thread helpful.