In this image, when we fit the original J(w,b) using the regularized term, shouldn't we be dividing by -1/m (the total number of examples) rather than just -1/m_train, since we are already calculating the training error? I'm a bit confused.
We’re averaging the cost over only the members of the training set.
Is there any specific reason to do that? If we averaged over all the examples in the dataset, what difference would that make?
Thanks.
It would give a false indication of the training error.
Hello, @Subhan75, to give you an intuitive example, let’s say 10 apples are distributed to market A and market B. Market A gets 4 of them and sells them at a total price of 20 dollars, and B gets the remaining 6 and sells them at 24 dollars.
Now, if we want to compare the prices of apples in these two markets, the reasonable formula would be
\frac{\text{total price of apples sold in market A}}{\text{total number of apples sold in market A}}
In this ratio, both the numerator and the denominator account only for apples in market A. The same goes for market B, and we get averaged prices of 5 dollars and 4 dollars in markets A and B respectively, concluding that B sells them cheaper.
The same idea applies here when we compare the averaged errors of dataset A (training) and dataset B (test):
\frac{\text{total error of samples in training set}}{\text{total number of samples in training set}}
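To make this concrete in code, here is a minimal NumPy sketch; the individual error values are invented so that the totals mirror the apple numbers above:

```python
import numpy as np

# Hypothetical per-example errors, chosen so the totals echo the analogy:
# "market A" = training set (total 20 over 4 examples),
# "market B" = test set (total 24 over 6 examples).
train_errors = np.array([4.0, 5.0, 6.0, 5.0])
test_errors = np.array([3.0, 4.0, 5.0, 4.0, 4.0, 4.0])

# Average each set over its OWN size, like price per apple per market.
avg_train = train_errors.sum() / len(train_errors)  # 20 / 4 = 5.0
avg_test = test_errors.sum() / len(test_errors)     # 24 / 6 = 4.0

# Dividing the training total by the FULL dataset size instead would
# understate the training error -- a "false indication":
wrong_avg = train_errors.sum() / (len(train_errors) + len(test_errors))  # 2.0

print(avg_train, avg_test, wrong_avg)
```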
Cheers
When we write the cost function as an average over the training set (i.e., the sum of the individual losses divided by m_train), we also want the regularization term on the same per-example scale, i.e. divided by the same m_train. Scaling both terms the same way keeps the balance between the data-fit term and the penalty stable, so a value of lambda chosen for one training-set size is more likely to keep working as the training set grows or shrinks.
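As a minimal sketch of that scaling in NumPy (assuming the squared-error cost for linear regression; the function name and signature here are illustrative, not from the course):

```python
import numpy as np

def compute_cost_regularized(X, y, w, b, lambda_=1.0):
    """Squared-error cost with L2 regularization.

    Both terms are divided by the SAME m (the training-set size),
    so the balance between fit and penalty does not drift as the
    number of training examples changes.
    """
    m = X.shape[0]                                 # m_train
    preds = X @ w + b                              # model predictions
    loss = np.sum((preds - y) ** 2) / (2 * m)      # averaged data-fit term
    reg = (lambda_ / (2 * m)) * np.sum(w ** 2)     # penalty on the same scale
    return loss + reg

# Tiny usage example with random data (only the shapes matter here):
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
w = rng.normal(size=3)
print(compute_cost_regularized(X, y, w, b=0.0, lambda_=1.0))
```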
Thanks, this makes sense.
You are welcome, Subhan75.