In the first part, on linear regression, y was not scaled. If we scale it as well, the training MSE drops from
training MSE (using sklearn function): 406.19374192533155
training MSE (for-loop implementation): 406.19374192533155
to
training MSE (using sklearn function): 0.020931674858814992
training MSE (for-loop implementation): 0.020931674858814985

I see. So does that mean that, had we scaled the label as well, we would still have arrived at the same conclusion and picked the 4th-order polynomial?
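
On the label-scaling question, a small sketch may help. It uses made-up data with a single feature and plain least squares; `fit_line` and `mse` are hypothetical helpers, not functions from the notebook. For ordinary least squares with an intercept, standardizing y by its standard deviation sigma divides the MSE by sigma squared, so every such model's MSE shrinks by the same factor and the ranking between models should not change.

```python
from statistics import mean, pstdev

def fit_line(x, y):
    """Closed-form least squares for y ~ w*x + b with one feature."""
    mx, my = mean(x), mean(y)
    w = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    b = my - w * mx
    return w, b

def mse(x, y, w, b):
    """Mean squared error of the line w*x + b on (x, y)."""
    return mean((w * xi + b - yi) ** 2 for xi, yi in zip(x, y))

# Made-up data, for illustration only (not the course data set).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [120.0, 260.0, 310.0, 480.0, 520.0, 700.0]

# Standardize the label: subtract the mean, divide by the standard deviation.
mu, sigma = mean(y), pstdev(y)
y_scaled = [(yi - mu) / sigma for yi in y]

mse_raw = mse(x, y, *fit_line(x, y))
mse_scaled = mse(x, y_scaled, *fit_line(x, y_scaled))

print(mse_raw, mse_scaled, mse_raw / sigma ** 2)
```

The last two printed values should agree up to floating-point error, which is the sense in which scaling the label changes the size of the MSE but not the comparison between models.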

I hope I can ask my remaining questions about the notebook here as well.

poly = PolynomialFeatures(degree=2, include_bias=False)
Why is the bias turned off?
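
A minimal sketch of what the flag changes, assuming scikit-learn and NumPy are installed; the toy input is made up. With `include_bias=True` the transformer prepends a constant column of ones. One plausible reason to turn it off is that `LinearRegression` fits its own intercept by default (`fit_intercept=True`), so the column of ones would be redundant. That reading is an assumption, not something the notebook states.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy input with a single feature (made up for illustration).
x = np.array([[2.0], [3.0]])

# Without the bias column: features are x and x^2.
without_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)

# With the bias column: features are 1, x and x^2.
with_bias = PolynomialFeatures(degree=2, include_bias=True).fit_transform(x)

print(without_bias)
print(with_bias)
```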

In the second section, on neural networks, I compared polynomial degrees; here are the results. Wouldn’t you say that increasing the degree helps the neural network?

degree =1
RESULTS:
Model 1: Training MSE: 406.19, CV MSE: 551.78
Model 2: Training MSE: 73.40, CV MSE: 112.09
Model 3: Training MSE: 73.40, CV MSE: 111.34

degree =4
Model 1: Training MSE: 50.77, CV MSE: 78.37
Model 2: Training MSE: 49.60, CV MSE: 74.63
Model 3: Training MSE: 9702.85, CV MSE: 10231.34

In the neural network section, when I keep rerunning from the start of that section with degree = 1, all models now show nearly the same training MSE. I’m scratching my head: on my first try, Model 1 had an MSE on the order of 406.

RESULTS:
Model 1: Training MSE: 73.62, CV MSE: 107.61
Model 2: Training MSE: 73.36, CV MSE: 111.91
Model 3: Training MSE: 73.40, CV MSE: 112.29

In the neural network section, there was a suggestion to increase the polynomial degree from 1 to 4, and the expected conclusion was that it brings no benefit in cost. However, when I tried it, I saw the difference I posted between degree = 1 and degree = 4.
I just tried it and here are the results:

Degree = 1
Model 1: Training MSE: 75.39, CV MSE: 98.93
Model 2: Training MSE: 73.40, CV MSE: 112.30
Model 3: Training MSE: 76.42, CV MSE: 120.22

Degree = 4
RESULTS:
Model 1: Training MSE: 50.71, CV MSE: 78.40
Model 2: Training MSE: 46.63, CV MSE: 72.14
Model 3: Training MSE: 44.37, CV MSE: 82.38

BTW, in the same post there is another question: why is include_bias set to False?

Since Models 2 and 3 give results just about as good as Model 1, it shows that adding these additional Dense layers isn’t helpful for this data set.

The other lesson taught in this assignment is specifically that NNs don’t benefit from you creating additional polynomial terms. This is because the non-linear activations in the hidden layers are already making the model more complex, so you don’t need to add polynomial terms at all.

I am referring to this notebook: in the Prepare the Data section of the neural network part, it sets degree = 1.
You also explained that changing the degree is moot because the NN learns the non-linearity automatically. However, my test with degree = 1 and degree = 4 shows a lower cost with degree = 4 for the NN.

Prepare the Data

You will use the same training, cross validation, and test sets you generated in the previous section. From earlier lectures in this course, you may have known that neural networks can learn non-linear relationships so you can opt to skip adding polynomial features. The code is still included below in case you want to try later and see what effect it will have on your results. The default degree is set to 1 to indicate that it will just use x_train, x_cv, and x_test as is (i.e. without any additional polynomial features).