Categorization error (training): regularized: 0.072, simple model: 0.062, complex model: 0.007

Categorization error (CV): regularized: 0.066, simple model: 0.087, complex model: 0.113

The simple model performs a bit better on the training set than the regularized model, but worse on the cross-validation set.

Shouldn't the regularized model have a lower training error than a simple model (equivalent to having an extremely high lambda) in general?


Hey @karra1729,

I guess we are clear on why the simple model performs worse on the CV set compared to the regularized model: since it is too simple a model, it underfits the data and thereby performs worse on the CV set.

Now, the question remains: "**Does the regularized model always perform better on the training set than the simple model?**", and the answer is "**not always**". This is because the simple model (*in the lab*) is a different model altogether (*one dense layer less*), so you don't know what high value of \lambda the simple model corresponds to. Perhaps the simple model corresponds only to \lambda = 0.05, which is still less than the \lambda = 0.1 used for the regularized model in the lab, and hence it overfits the training data more.

The idea that a simple model can be thought of as a model corresponding to a very high value of \lambda is definitely true, but firstly, you don't know what this "high value" is, since \lambda is bounded on only one side, i.e., at 0; on the other side, you can set it as high as you want. And secondly, since the simple model is an altogether different architecture, it may perform better on the training set in some cases.

However, if we keep the same structure for all 3 models and only vary the regularization value, do we get the same results as in the lab? This might be an interesting question! Give me some time to whip up a quick experiment and see whether it happens. Till then, let me know if this helps.

Cheers,

Elemento


Hey @karra1729,

Please check out Version 11 of this kernel. In this version, I have used the same NN architecture with 3 different \lambda values: 0 for the complex model, 0.1 for the regularized model, and 1 for the simple model. Here you can clearly see that the regularized model performs better on the training set than the simple model (*high value of \lambda*). However, this is just a single experiment; you can run multiple experiments like this to validate the hypothesis. I hope this helps.

Cheers,

Elemento
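The same-architecture point above can be illustrated in miniature with closed-form ridge regression, used here purely as a stand-in for a regularized NN (the synthetic dataset and \lambda values below are illustrative assumptions, not the lab's): for a fixed model, the training error can only stay the same or grow as \lambda grows.

```python
import numpy as np

def ridge_train_error(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^(-1) X^T y.
    Returns the training mean squared error for regularization strength lam."""
    n, d = X.shape
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    residual = X @ w - y
    return float(np.mean(residual ** 2))

# Illustrative synthetic data; any fixed dataset shows the same effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=50)

# With the SAME model, a larger lambda restricts the fit more, so the
# training error is non-decreasing in lambda.
errs = [ridge_train_error(X, y, lam) for lam in (0.0, 0.1, 1.0)]
print(errs)  # a non-decreasing sequence of training errors
```

This only holds when the architecture is fixed; once the "simple" model is a structurally different network, as in the lab, no such ordering of training errors is guaranteed.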


Yeah, I thought exactly the same, because it's a different architecture. I just wanted to confirm. Thanks for taking the effort to explain in such great detail.
