MLS : Regression and Classification : Cost function with regularization

@ai_is_cool,

Without computing the partial derivatives, it may not be immediately clear how the optimization process explicitly reduces w_3 and w_4. In the video you mentioned, Prof. Ng relies on an intuitive understanding of how regularization works rather than deriving it mathematically at that point. When you minimize the cost function J(\vec{w}, b), i.e.

\min_{\vec{w}, b} \; \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\vec{w}, b}(\vec{x}^{(i)}) - y^{(i)} \right)^2 + 1000 \cdot w_3^2 + 1000 \cdot w_4^2,

the algorithm will prioritize keeping w_3 and w_4 small: because the coefficient 1000 in the regularization term is very large, the penalty for large w_3 and w_4 is severe, even if shrinking them means sacrificing some fit to the training data. The next video, "Regularized linear regression", gives a more detailed explanation.
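If it helps to see this concretely, here is a minimal NumPy sketch. The toy data, feature construction, learning rate, and iteration count are all made up for illustration (they are not from the course labs); it just runs plain gradient descent on the cost above so you can watch the 1000 \cdot w_3^2 + 1000 \cdot w_4^2 penalty push w_3 and w_4 toward zero.

```python
import numpy as np

# Toy data: a handful of points that a quadratic fits reasonably well
np.random.seed(0)
x = np.linspace(0, 2, 20)
y = 1.0 + 2.0 * x - 1.5 * x**2 + 0.1 * np.random.randn(20)

# Polynomial features x, x^2, x^3, x^4; w_1..w_4 live in w[0]..w[3], b is the bias
X = np.column_stack([x, x**2, x**3, x**4])
m = X.shape[0]

def gradients(w, b):
    err = X @ w + b - y        # f_wb(x^(i)) - y^(i)
    dj_dw = X.T @ err / m      # gradient of the squared-error term
    dj_db = err.mean()
    # The extra 1000*w_3^2 + 1000*w_4^2 penalty adds 2000*w_3 and 2000*w_4
    # to the gradients of the last two weights (indices 2 and 3 here)
    dj_dw[2] += 2000.0 * w[2]
    dj_dw[3] += 2000.0 * w[3]
    return dj_dw, dj_db

w, b, alpha = np.zeros(4), 0.0, 5e-4
for _ in range(50_000):
    dj_dw, dj_db = gradients(w, b)
    w -= alpha * dj_dw
    b -= alpha * dj_db

print(w)  # w[2] and w[3] (i.e. w_3 and w_4) end up very close to zero
```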

In machine learning, L_2 regularization is often referred to as weight decay because the update rule shrinks (decays) the magnitude of the weights a little at every optimization step.
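You can see where the name comes from by rearranging the gradient descent update for the standard L_2 penalty \frac{\lambda}{2m} \sum_{j} w_j^2 (which the course uses in place of the ad hoc 1000 coefficients):

w_j := w_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j \right] = w_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) x_j^{(i)}.

On every step each weight is first multiplied by the constant factor \left(1 - \alpha \frac{\lambda}{m}\right), slightly less than 1, before the usual gradient step is applied; that multiplicative shrinking is the "decay".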
You may also find the following thread helpful.