Why is the regularization term in the cost function w_j^2 instead of just w_j?

ggrant005 · February 8, 2024, 7:10pm

Link to video HERE at about 4:00.

There was no clear explanation of where the w_j^2 regularization term came from in the cost function. Why is it w_j^2 instead of just w_j?

TMosh · February 9, 2024, 12:49am

Just like how we compute the cost from the squares of the errors (so that all errors become a positive value, and larger errors have a bigger impact), squaring the weights has the same reasoning.

The goal is to have an additional penalty for having large weight values. This helps the algorithm learn smaller weights, thereby reducing overfitting.

The “sum of the squares” method has some nice computational properties when we compute the gradients (i.e. partial derivatives).

ggrant005 · February 9, 2024, 3:32pm

Thanks for the explanation! As I understand it, using a squared weight is more of a choice for its positive characteristics than a term derived mathematically? I’m still wrapping my head around how much of the cost function is mathematically derived vs. found to produce positive results

TMosh · February 9, 2024, 4:08pm

Yes.

Cost functions are designed to suit a specific goal. The linear regression cost is based on the method of least squares. The logistic regression cost is based on maximizing the penalty for making wrong binary predictions (classifications).

Topic		Replies	Views
Logistic Regression Derivative of J(w,b) Supervised ML: Regression and Classification week-3	12	1040	May 16, 2023
Why was cost function for Logistic reg 1/m and not 1/2m? Supervised ML: Regression and Classification week-3	5	35	September 23, 2024
Cost Function Squared Error Machine Learning Specialization	1	95	July 19, 2022
What's the usage of J(w,b) for logistic regression? Supervised ML: Regression and Classification week-3	18	687	June 9, 2024
Mathematical proof for the cost function Supervised ML: Regression and Classification week-1	3	697	June 21, 2022

Why is the regularization term in the cost function w_j^2 instead of just w_j?

Related topics