Generating Ideal Plot in C1_W3_Lab08_Overfitting_Soln

Dear Mentor,
I want to ask how we can achieve the ideal fit curve in the classification task.
I tried adding more data of both types to the plot, and then the curve fit the ideal one.
But generating data is not in an engineer's hands; developing the plot is what we can actually do.
Can you please guide me on how to achieve a plot that fits the exact “ideal plot”?
Really thankful.

Often, you can’t get an exact fit. But that isn’t the goal.
The goal is to create a model that gives “good enough” predictions for new examples.

Getting a perfect fit on the training set is usually a bad idea, because it means you have over-fit the training set and may not get good performance on new data.
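
To make the point about perfect training fits concrete, here is a minimal sketch. It is not the lab's code: the toy data, the 20% label noise, and the helper names `make_data` and `predict_1nn` are all assumptions for illustration. It swaps in a 1-nearest-neighbour classifier because that model memorizes its training set by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, flip=0.2):
    """2-D points labelled by the side of the line x1 + x2 = 0, with a
    fraction `flip` of labels inverted to imitate noise (assumed setup)."""
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    noisy = rng.random(n) < flip
    y[noisy] = 1 - y[noisy]
    return X, y

def predict_1nn(X_train, y_train, X_query):
    """Return the label of the closest training point for each query point."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[d.argmin(axis=1)]

X_train, y_train = make_data(200)
X_test, y_test = make_data(200)

# Each training point is its own nearest neighbour, so the training fit is perfect.
train_acc = np.mean(predict_1nn(X_train, y_train, X_train) == y_train)
# New points inherit the memorized noise, so accuracy on them is lower.
test_acc = np.mean(predict_1nn(X_train, y_train, X_test) == y_test)

print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

The training score here says nothing about how the model behaves on new examples, which is why a held-out set is the thing to watch.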

Hi @Ghulam_Mustafa,

Here is a view from a regression perspective:

Basically, you can describe this quite similarly for classification problems; see also this paper:

https://www.researchgate.net/publication/331452094/figure/fig2/AS:731693849784321@1551460816294/The-concept-of-overfitting-and-model-regularization.png
(Source).

If you see a tendency to overfit during training or when evaluating model performance, you can tackle it by:

  • increasing regularization (penalizing the complexity of your model more)
  • training a smaller model (fewer parameters)
  • applying dropout, or increasing your dropout rate, to bring some randomness into your training process, which should increase the robustness of your model
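
As a rough illustration of the first option, the sketch below fits a logistic regression on polynomial features twice, once without and once with an L2 penalty, and compares the size of the learned weights. Everything in it is an assumption for illustration (the toy data, degree 4, `lambda_ = 5.0`, the helper names); the penalty is written as `(lambda_ / (2m)) * ||w||^2` with the bias left unregularized, which is one common convention and not necessarily the one your code uses.

```python
import numpy as np

rng = np.random.default_rng(1)

def poly_features(X, degree):
    """All monomials x1^i * x2^j with 1 <= i + j <= degree."""
    cols = [X[:, 0] ** i * X[:, 1] ** (d - i)
            for d in range(1, degree + 1) for i in range(d + 1)]
    return np.column_stack(cols)

def fit_logistic(F, y, lambda_, lr=0.3, iters=5000):
    """Gradient descent on logistic loss + (lambda_ / (2m)) * ||w||^2."""
    m, n = F.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))          # sigmoid
        w -= lr * (F.T @ (p - y) / m + (lambda_ / m) * w)
        b -= lr * np.mean(p - y)                        # bias is not penalized
    return w, b

# Small noisy training set (assumed): true boundary x1 + x2 = 0, 20% flipped labels.
X = rng.uniform(-1.0, 1.0, size=(60, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
flip = rng.random(60) < 0.2
y[flip] = 1.0 - y[flip]

F = poly_features(X, degree=4)
w_plain, _ = fit_logistic(F, y, lambda_=0.0)
w_reg, _ = fit_logistic(F, y, lambda_=5.0)

norm_plain = np.linalg.norm(w_plain)
norm_reg = np.linalg.norm(w_reg)
print(f"||w|| without regularization: {norm_plain:.2f}")
print(f"||w|| with lambda_ = 5.0:     {norm_reg:.2f}")
```

Smaller weights on the high-order terms generally mean a smoother decision boundary that chases individual noisy points less; how much regularization is appropriate is something to tune against validation data.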

Please let me know if this helps!

Best regards
Christian

Hello @Ghulam_Mustafa,

Although it is our job to collect sufficient data, generating artificial data to change the fitted curve in the way we want, as you suggested, isn’t. The data has to be real so that it has some power to model the underlying process that you are trying to study.

Engineering features by hand, however, is our job, and it can make the model fit better. Christian has shared a topic about engineering polynomial features, but the other part of that topic is about avoiding overfitting. Overfitting is our enemy because an overfit model fits the training data very well but fails to predict new data. As Tom has said, good prediction on new data should be our goal instead.
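
A small sketch of what feature engineering can buy, without touching the data itself: the same logistic regression is trained on raw inputs and on inputs with squared terms added, then scored on held-out points. The circular toy dataset and the helpers `make_circle_data`, `add_squares`, `fit_logistic`, and `accuracy` are assumptions for illustration, not taken from the lab.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_circle_data(n):
    """Label is 1 inside the circle x1^2 + x2^2 = 0.5, else 0 (assumed toy data)."""
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(float)
    return X, y

def add_squares(X):
    """Engineered features: the raw inputs plus their squares."""
    return np.column_stack([X, X ** 2])

def fit_logistic(F, y, lr=1.0, iters=5000):
    """Plain gradient descent on the unregularized logistic loss."""
    m, n = F.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
        w -= lr * (F.T @ (p - y)) / m
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(F, y, w, b):
    """Fraction of points whose predicted class matches the label."""
    return np.mean(((F @ w + b) > 0) == (y == 1))

X_train, y_train = make_circle_data(500)
X_test, y_test = make_circle_data(500)

w_lin, b_lin = fit_logistic(X_train, y_train)
w_sq, b_sq = fit_logistic(add_squares(X_train), y_train)

acc_lin = accuracy(X_test, y_test, w_lin, b_lin)
acc_sq = accuracy(add_squares(X_test), y_test, w_sq, b_sq)
print(f"raw features:     {acc_lin:.2f}")
print(f"squared features: {acc_sq:.2f}")
```

A straight-line boundary cannot enclose a circular region, so the raw-feature model does little better than guessing the majority class, while the squared terms let the same algorithm draw a closed curve. Both scores are measured on points the model did not train on.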

What do you think? How are you “creating” data now? And how are you evaluating your goodness of fit now? Are you engineering new features to improve your model performance? Would you like to share with us those “plots” and tell us what they are?

Cheers,
Raymond

Thanks Raymond.
Actually, I wanted to fit my curve to the ideal curve using the equations.
What I couldn't work out is how it can fit without generating more data, which is not actually in our hands.
I would like to know any strategy to generate a curve that fits the ideal curve without generating more data.
Regards

Hello @Ghulam_Mustafa,

If your definition of ideal is “to train a model on training data and predict 100% correctly for validation data”, then as Tom pointed out earlier, you can’t get it, because data usually contains random noise, and random noise is unpredictable.
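
To see why noise rules out 100%, here is a minimal sketch under assumed conditions (a circular true boundary and a 15% chance that any observed label is flipped; neither comes from the lab). It scores the true rule itself, that is, the best curve anyone could draw, against the noisy labels.

```python
import numpy as np

rng = np.random.default_rng(2)
n, flip = 2000, 0.15  # sample size and label-noise rate (assumed values)

X = rng.uniform(-1.0, 1.0, size=(n, 2))
# The "ideal" rule that generated the data: 1 inside the circle, 0 outside.
true_label = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)

# What we actually observe: the true label, randomly flipped 15% of the time.
noisy = rng.random(n) < flip
observed = np.where(noisy, 1 - true_label, true_label)

# Accuracy of the ideal rule on the observed labels.
ideal_acc = np.mean(true_label == observed)
print(f"accuracy of the true rule on noisy labels: {ideal_acc:.3f}")
```

The result lands near 1 minus the noise rate, not at 1. Since the flips are random, no model trained on this data can be expected to beat that figure on new examples.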

So we can’t further the discussion if you want an ideally fitted curve. We can discuss how to get a “better” curve though. Would you clarify your goal?

Cheers,
Raymond

Thanks Raymond for making it clear.
It seems that, based on our calculations, we can't produce a curve that fits the ideal curve in the classification task, while we can do so in the regression task with the same code.
Really thankful to you.

Hello @Ghulam_Mustafa,

I have reservations about your comments. Are you talking about some real-world data, or are you talking about some ideal data?

It would be best if you can share some plots that show what you are seeing and what you are expecting to see.

Raymond