C2W3 Lab Question - Model Evaluation and Selection

Hi

  1. In the first part of the linear regression section, y was not scaled. If we scale it as well, the training MSE drops from
    training MSE (using sklearn function): 406.19374192533155
    training MSE (for-loop implementation): 406.19374192533155
    to
    training MSE (using sklearn function): 0.020931674858814992
    training MSE (for-loop implementation): 0.020931674858814985

Is scaling advisable only for features?

Feature normalization is only applied to the features, not to the labels.

The purpose of normalizing the features is so that gradient descent works better. It’s not to lower the absolute cost.

We really don’t care about the absolute cost - only about finding the minimum of the cost curve. The scale of the cost doesn’t matter.
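As a quick numerical check (a sketch, not code from the lab): z-scoring the target divides the MSE by the variance of y, which explains why the training MSE above drops from ~406 to ~0.02. The model is neither better nor worse; the cost curve is simply rescaled, so its minimum stays in the same place.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(loc=100.0, scale=20.0, size=500)   # unscaled targets
y_pred = y_true + rng.normal(scale=5.0, size=500)      # imperfect predictions

mse = np.mean((y_pred - y_true) ** 2)

# z-score the targets, and apply the same transform to the predictions
mu, sigma = y_true.mean(), y_true.std()
y_true_s = (y_true - mu) / sigma
y_pred_s = (y_pred - mu) / sigma
mse_scaled = np.mean((y_pred_s - y_true_s) ** 2)

# The MSE shrinks by exactly 1/sigma^2 -- same model, rescaled cost.
assert np.isclose(mse_scaled, mse / sigma**2)
```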


I see. So does that mean that had we scaled the label, we would still have arrived at the same conclusion and picked the 4th-order polynomial?

I hope I can ask my remaining questions on the notebook here itself.

poly = PolynomialFeatures(degree=2, include_bias=False)
Why is bias turned off?

In the second section, on neural networks, here are the results I get when I compare polynomial degrees. Wouldn’t you say increasing the degree helps the neural network?

degree = 1
RESULTS:
Model 1: Training MSE: 406.19, CV MSE: 551.78
Model 2: Training MSE: 73.40, CV MSE: 112.09
Model 3: Training MSE: 73.40, CV MSE: 111.34

degree = 4
Model 1: Training MSE: 50.77, CV MSE: 78.37
Model 2: Training MSE: 49.60, CV MSE: 74.63
Model 3: Training MSE: 9702.85, CV MSE: 10231.34

On the neural network part: when I keep rerunning from the Neural Network section with degree = 1, I now see all models with roughly the same training MSE. Scratching my head. On my first try, Model 1 had an MSE on the order of 406.

RESULTS:
Model 1: Training MSE: 73.62, CV MSE: 107.61
Model 2: Training MSE: 73.36, CV MSE: 111.91
Model 3: Training MSE: 73.40, CV MSE: 112.29

Hope someone will respond. Or should I post these in separate threads?
Thanks

I just refreshed my access to that course, I’ll take a look at the expected results.

Did you change anything in the lab? Because here are the results I get:
[screenshot of results]

In the neural network section, there was a suggestion to increase the polynomial degree from 1 to 4, and the expected conclusion was that there is no benefit in cost. However, when I tried it, I noticed the difference posted above between degree = 1 and degree = 4.
I just tried it and here are the results:

Degree = 1
Model 1: Training MSE: 75.39, CV MSE: 98.93
Model 2: Training MSE: 73.40, CV MSE: 112.30
Model 3: Training MSE: 76.42, CV MSE: 120.22

Degree = 4
RESULTS:
Model 1: Training MSE: 50.71, CV MSE: 78.40
Model 2: Training MSE: 46.63, CV MSE: 72.14
Model 3: Training MSE: 44.37, CV MSE: 82.38

BTW, in the same post there is another question about why include_bias is set to False.

It would really help if you posted more specific information about what part of the lab your question applies to.

Regarding the NN part of this lab:

Here are the details about the three NN models that are used.

Since model 2 and 3 give just about as good results as Model 1, it shows that adding these additional Dense layers isn’t helpful for this set of data.

The other lesson taught in this assignment is specifically that NN’s don’t benefit from you creating additional polynomial terms. This is because the non-linear activations in the hidden layers are already making the model more complex - you don’t need to add polynomial terms at all.
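As a minimal illustration of that point (my own sketch, not the lab’s code): even two hand-built ReLU hidden units already compute a non-linear function of the input - here, |x| - which no linear model on degree-1 features could represent. That is why the hidden layers make added polynomial terms unnecessary.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Tiny hand-built network: a hidden layer with weights [1, -1] and a
# linear output layer that sums the hidden activations.
# relu(x) + relu(-x) == |x|, a non-linear function of x.
def tiny_net(x):
    hidden = relu(np.array([x, -x]))  # two ReLU units
    return hidden.sum()               # linear output layer

print(tiny_net(3.0))   # 3.0
print(tiny_net(-2.0))  # 2.0
```

A trained network does the same thing with learned weights: the ReLU activations carve the input space into pieces, building non-linear features internally.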

It’s discussed in this text in the notebook.

Regarding why the lab uses include_bias = False.

Since the dataset is normalized, it has a mean value of zero. That means the bias will be zero, so we don’t need a bias term.
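A quick sketch of what include_bias actually controls (my own example, not from the lab): with include_bias=True, PolynomialFeatures just prepends an all-ones column, and scikit-learn’s LinearRegression already fits its own intercept by default, so that column is redundant.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.arange(6, dtype=float).reshape(-1, 1)

with_bias = PolynomialFeatures(degree=2, include_bias=True).fit_transform(x)
no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)

print(with_bias.shape)  # (6, 3): columns [1, x, x^2]
print(no_bias.shape)    # (6, 2): columns [x, x^2]

# The only difference is the leading all-ones column, which is redundant
# because LinearRegression(fit_intercept=True) learns its own intercept.
assert np.allclose(with_bias[:, 0], 1.0)
assert np.allclose(with_bias[:, 1:], no_bias)
```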

I am referring to this part, in the "Prepare the Data" section of the neural network, where it sets degree = 1.
You also explained that changing the degree is moot since the NN learns the non-linearity automatically. However, my test with degree = 1 and degree = 4 shows a lower cost with degree = 4 for the NN.

Prepare the Data

You will use the same training, cross validation, and test sets you generated in the previous section. From earlier lectures in this course, you may have known that neural networks can learn non-linear relationships so you can opt to skip adding polynomial features. The code is still included below in case you want to try later and see what effect it will have on your results. The default degree is set to 1 to indicate that it will just use x_train, x_cv, and x_test as is (i.e. without any additional polynomial features).
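To confirm what the quoted text says about the default (a small sketch with a toy array standing in for the lab’s x_train): with degree=1 and include_bias=False, PolynomialFeatures returns the inputs unchanged, so the training, CV, and test sets are used as-is.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x_train = np.array([[1.0], [2.5], [4.0]])  # toy stand-in for the lab's x_train
poly = PolynomialFeatures(degree=1, include_bias=False)
x_mapped = poly.fit_transform(x_train)

# degree=1 adds no new columns: the "mapped" features equal the originals.
assert np.array_equal(x_mapped, x_train)
```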

The differences in the numbers you posted aren’t really significant. There may be some small influence from the degree.
