Why don't we include regularization term in the training dataset?

syedaskarimuslim · December 6, 2023, 2:03pm

Refer to the Week 3 lecture on evaluating the model (check the snapshot below).

I am confused regarding why we don’t include the regularization term in the training dataset?

gent.spah · December 6, 2023, 2:18pm

You mean there is no regularization in the testing phase?

Regularization helps the model fit the data without overfitting by penalizing large weights. During testing you are not fitting anything; you just want to check whether what you came up with during training is good enough at predicting unseen data. Regularization is only used to train the model!

syedaskarimuslim · December 6, 2023, 2:27pm

No. I mean, why aren’t we using it on the training dataset? I get why we wouldn’t use it on the testing dataset, but the lecture says not to include it for the training dataset either (check the last line of the lecture snapshot I attached: there is no regularization term in the cost function equation for the training dataset).

rmwkwok · December 6, 2023, 2:43pm

Hello @syedaskarimuslim,

Because at evaluation, we only care about how good the predictions are, regardless of whether we are evaluating on a training set or a testing set.

Cheers,
Raymond

syedaskarimuslim · December 6, 2023, 4:36pm

Hello @rmwkwok
Regularization is an antidote to overfitting. I can understand why we wouldn’t be concerned about regularization during testing; however, I can’t understand why we wouldn’t use this antidote in training to make sure our predictions are not coming from an overfitted curve.

TMosh · December 6, 2023, 6:52pm

Regularization is used during training to control overfitting. For this you use the regularized cost.

Once you have fit the model, now you just want to measure how well it works.

For this you use the unregularized cost. This is because now you do not want to include the additional penalties based on the weight values.

rmwkwok · December 6, 2023, 9:25pm

Hello @syedaskarimuslim,

As Tom explained, we need to be aware that there are two stages - a training stage and an evaluation stage. Your question focused on the former while the slide on the latter - at least this is what I got from watching the lecture, but if the lecture said anything which made you think otherwise, please share the exact time mark in the video and I will watch it again.

Cheers,
Raymond

syedaskarimuslim · December 7, 2023, 4:51pm

Refer to Course 2, Week 3, video titled “Evaluating a model”, timestamp 5:35.

rmwkwok · December 8, 2023, 2:09am

Thanks, @syedaskarimuslim. I have watched from 5:35 to 6:35 again.

In the training stage, exactly when we are applying gradient descent, we use the cost function that includes the regularization term on the training set.

In the evaluation stage when we are not thinking about gradient descent at all, we use the cost function without the regularization on both the training and the test set.
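To make the two stages concrete, here is a minimal sketch for linear regression with a squared-error cost. The function names (`mse_cost`, `regularized_cost`), the toy data, and the parameter values are made up for illustration and are not taken from the lecture or the course labs; the penalty form `lambda_ / (2 * m) * sum(w**2)` is assumed.

```python
import numpy as np

def mse_cost(X, y, w, b):
    """Unregularized squared-error cost: what we report at evaluation time."""
    m = X.shape[0]
    err = X @ w + b - y
    return float(err @ err) / (2 * m)

def regularized_cost(X, y, w, b, lambda_):
    """Squared-error cost plus an L2 penalty on w: what gradient descent minimizes."""
    m = X.shape[0]
    return mse_cost(X, y, w, b) + lambda_ / (2 * m) * float(w @ w)

# Toy training set and some already-fitted parameters (all values hypothetical).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
w = np.array([0.9, -1.8, 0.4])
b = 0.0

# Training stage: the objective includes the penalty.
j_fit = regularized_cost(X_train, y_train, w, b, lambda_=1.0)

# Evaluation stage: same training data, but no penalty term.
j_train = mse_cost(X_train, y_train, w, b)
```

The gap between `j_fit` and `j_train` depends only on the size of the weights, not on how well the model predicts, which is why it is left out when measuring performance on either set.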

Do you disagree with, or find unclear, anything that I said above?

Raymond

Christian_Simonis · December 8, 2023, 6:59am

Hi there,

in addition to the excellent replies of my fellow mentors:

The training data set is just pure data. When training the model (=fitting parameters), we optimize:

the model fit to the training data so that a good performance is reached on training data
and steer the complexity of the model with regularization

Afterwards we are done with training and there is nothing more to regularise at this point.

Then we just test how good the training was with respect to reality and new data which the model has not seen before. Therefore, we provide the model with an unseen test set. Now we can evaluate how well the model performs on this new test set.

So simplified we can say:

if the model performs clearly worse on the new test set than on the training data, this indicates overfitting, which means that the model complexity (which we were steering in training with regularization) was potentially too high given the available data
if the performance of the model on the new test set is comparable with its performance on the training data, and this suits your business requirements, this is a good sign that regularization was effective and that you prevented overfitting by keeping the model complexity at a level where the model can generalise well (and does not fit the noise too much)

Here more info, which also touches upon the validation data set: https://community.deeplearning.ai/t/regular-math-s-vs-ml/250791/2

Hope that helps!

Best regards
Christian

AmMoPy · December 12, 2023, 2:24pm

I think the confusion is about when exactly we use regularization; let’s consider this from the point of view of the closed-form solution of Ordinary Least Squares (OLS) regression.

To calculate the weights (coefficients) in OLS we can use the Normal Equation as follows: w = XTX_inv.dot(X.T).dot(y), where:
X: training matrix of Features
y: training vector of Target values
XTX: X.T.dot(X), (a.k.a Gram Matrix, where T represents Matrix Transpose)
XTX_inv: Inverse of Gram Matrix

We can now use these coefficients to make predictions using the testing set: X_test.dot(w)

Now let’s introduce regularization. Everything from above applies, except that the Gram Matrix is now:
XTX: X.T.dot(X) + alpha * np.eye(X.shape[1]), where alpha controls the strength of regularization (alpha = 0 gives OLS)

So regularization takes place in the fitting stage (using either the closed form or GD), where the coefficients are calculated. Once we have the final weights, we can proceed with predictions on the testing set.
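The steps above can be sketched in NumPy as follows. This is a minimal illustration under some assumptions: the helper name `fit_closed_form` and the toy data are hypothetical, there is no intercept term, and `np.linalg.solve` is used in place of an explicit inverse of the Gram Matrix (it gives the same weights and is numerically safer).

```python
import numpy as np

def fit_closed_form(X, y, alpha=0.0):
    """Solve (X^T X + alpha * I) w = X^T y; alpha = 0 reduces to plain OLS."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)

# Toy data, made up for illustration.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
y_train = X_train @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)
X_test = rng.normal(size=(10, 3))

# Fitting stage: this is the only place alpha appears.
w_ols = fit_closed_form(X_train, y_train, alpha=0.0)
w_ridge = fit_closed_form(X_train, y_train, alpha=10.0)

# Prediction stage: just apply the fitted weights; no regularization involved.
y_pred = X_test @ w_ridge
```

Note that `alpha` only changes which weights come out of the fit; the prediction line is identical for `w_ols` and `w_ridge`.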
