Overfitting - Week 3 other dataset

I am having trouble understanding overfitting while running this code, provided at the end of lab 3:

# parameters_2 = nn_model(X_2, Y_2, n_h=1, num_iterations=3000, learning_rate=1.2, print_cost=False)
# parameters_2 = nn_model(X_2, Y_2, n_h=2, num_iterations=3000, learning_rate=1.2, print_cost=False)
parameters_2 = nn_model(X_2, Y_2, n_h=15, num_iterations=3000, learning_rate=1.2, print_cost=False)

All three plots below seem to produce a good decision boundary, as does the original model, so how can you actually tell whether the model is overfitting?

Here are the three plots with the different layer sizes:

[decision-boundary plots for n_h = 1, 2, and 15]

So my conclusion is a little different from yours: I’d say that n_h = 1 is not a very good solution (“underfitting”), but the real point here is that there isn’t any substantive difference between the results for n_h = 2 and n_h = 15.

But training and executing the model with n_h = 15 is more expensive: it costs you more memory and CPU, yet buys you nothing in terms of the prediction accuracy of the model, so that extra effort is wasted (a quick way to check this is sketched below). So I guess you could call the n_h = 15 case “overfitting”, at least when you apply the model only to the specific data on which it was trained. Which brings us to the higher-level point here: this is not really a very good example of these concepts in any case.
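If you want to see that directly in the notebook, here is a minimal sketch. It assumes the lab’s `nn_model` and a `predict` helper; the exact names and signatures in your notebook may differ, so adjust accordingly.

```python
import time
import numpy as np

# Sketch only: compare training cost and training-set accuracy for two hidden
# layer sizes. Assumes the lab's nn_model(X, Y, n_h, ...) and a predict helper
# with signature predict(X, parameters); adjust to your notebook's versions.
for n_h in (2, 15):
    start = time.time()
    parameters = nn_model(X_2, Y_2, n_h=n_h, num_iterations=3000,
                          learning_rate=1.2, print_cost=False)
    elapsed = time.time() - start
    accuracy = np.mean(predict(X_2, parameters) == Y_2) * 100
    print(f"n_h={n_h:2d}: trained in {elapsed:.1f}s, training accuracy {accuracy:.1f}%")
```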

If you go ahead after this and take one of the specializations like DLS, where you actually learn about different types of models and how to choose the appropriate model for a given task, you’ll see that what they are doing here is very particular and limiting: they start with a set of data, train the model only on that data, and then predict only on that same data. That’s fundamentally not how this really works, right? What you actually do is gather your training data, which you believe covers (as best you can, given your cost and other constraints) the full range of the types of data you want your model to handle well. Then you train the model on that training set, and then you apply it to other data that the model was not trained on. That’s the case where the meaning of “overfitting” and “underfitting” becomes a lot clearer.
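To make that concrete, here is a minimal sketch of the usual workflow: hold out part of the data, train only on the rest, and compare accuracy on both sets. It again assumes the lab’s `nn_model` and `predict` helpers, and that `X_2` and `Y_2` have shapes `(n_features, m)` and `(1, m)`; treat it as an illustration, not the lab’s own code.

```python
import numpy as np

# Sketch of train/test evaluation (assumed shapes: X_2 is (n_features, m),
# Y_2 is (1, m); nn_model and predict come from the lab notebook).
rng = np.random.default_rng(0)
m = X_2.shape[1]
idx = rng.permutation(m)
split = int(0.8 * m)                          # 80% train, 20% held-out test
train_idx, test_idx = idx[:split], idx[split:]

X_train, Y_train = X_2[:, train_idx], Y_2[:, train_idx]
X_test,  Y_test  = X_2[:, test_idx],  Y_2[:, test_idx]

for n_h in (1, 2, 15):
    parameters = nn_model(X_train, Y_train, n_h=n_h, num_iterations=3000,
                          learning_rate=1.2, print_cost=False)
    train_acc = np.mean(predict(X_train, parameters) == Y_train) * 100
    test_acc  = np.mean(predict(X_test,  parameters) == Y_test)  * 100
    print(f"n_h={n_h:2d}: train {train_acc:.1f}%, test {test_acc:.1f}%")

# Rough reading of the output:
#   underfitting -> both accuracies are low (likely n_h = 1 here),
#   overfitting  -> training accuracy is much higher than test accuracy.
```

With a dataset this small and simple you may well see similar train and test numbers even at n_h = 15, which is exactly the point above: this toy example doesn’t show the effect very dramatically.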

My suggestion would be just to move on with this course, but “hold that thought” and then take DLS next. In DLS C1 and C2, you’ll get a much more complete presentation of these topics and concepts.


Okay, thank you. That was a really good and clear explanation!