DLS, Course 2, Week 1, Question about Normalizing Inputs

Hello,

After watching this video, I tried the technique on my own little project.
The dataset has around 3000 examples with 29 features, which I split 80/20 into a train and a test set.

In this project I use a relatively deep MLP with 11 layers (10 hidden layers of 512 ReLU units each, plus a single linear output unit). I am just experimenting here and deliberately wanted a big NN.
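
In code, the architecture is essentially the following (a minimal Keras sketch; the optimizer and loss shown are just illustrative choices, not necessarily exactly what I used):

```python
import tensorflow as tf

def build_model(n_features=29, n_hidden_layers=10, units=512):
    """Big MLP: 10 hidden layers of 512 ReLU units, one linear output for regression."""
    layers = [tf.keras.Input(shape=(n_features,))]
    layers += [tf.keras.layers.Dense(units, activation="relu") for _ in range(n_hidden_layers)]
    layers += [tf.keras.layers.Dense(1)]          # single linear output unit
    model = tf.keras.Sequential(layers)
    model.compile(optimizer="adam", loss="mse",   # optimizer/loss are illustrative
                  metrics=[tf.keras.metrics.RootMeanSquaredError()])
    return model

model = build_model()
model.summary()
```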

What I found is that without normalizing, my RMSE on both the train and the test set converged very quickly. And no matter how big I made my NN, I could not get the model to overfit. So my first question is: how come my model could not overfit the training data even when I deliberately increased the size of the NN?

When I scaled my input features, the train RMSE suddenly improved a lot, but my test-set score stayed the same.
The image shows all the scores I mention.

My second question: what is going on here? How come scaling suddenly improved my train RMSE so much (no other changes)? According to the video, normalizing should only speed up training, but as we can see the unscaled model was already converging, so even if I trained it for much longer without scaling, it would never reach the “new” RMSE level I get with normalizing.
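
For concreteness, by “scaling” I mean the kind of per-feature mean/variance normalization from the video, with the statistics computed on the training set only (a small numpy sketch with stand-in data in place of my actual features):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 29)) * rng.uniform(1, 1000, size=29)  # stand-in for my data
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]

# Statistics from the training set only, then the same transform applied to the test set
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0) + 1e-8   # small epsilon in case a feature is constant

X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma
```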

Apart from that (third question :slight_smile: ): is it safe to say that after normalizing, my model is clearly overfitting (which I should then try to tackle with L2 regularization or dropout)?
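
If it is overfitting, I imagine tackling it along these lines (an untested Keras sketch; the regularization strength and dropout rate are just guesses that would need tuning):

```python
import tensorflow as tf

l2 = tf.keras.regularizers.l2(1e-4)      # L2 strength is a guess, needs tuning

layers = [tf.keras.Input(shape=(29,))]
for _ in range(10):
    layers.append(tf.keras.layers.Dense(512, activation="relu", kernel_regularizer=l2))
    layers.append(tf.keras.layers.Dropout(0.2))   # and/or dropout (keep_prob ~ 0.8)
layers.append(tf.keras.layers.Dense(1))           # linear output unit

model = tf.keras.Sequential(layers)
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
```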

Thanks a lot :slight_smile:

I think:

On the first question: your model is simply doing poorly on all sets.

On the second question: after you scaled the features, the model is training well but validating poorly (overfitting).

For the third I think you are right.

My question is exactly that: why, on the first point, is my model doing poorly when the only difference is normalizing, which according to the video should only increase training speed?

And even if the model is doing poorly, I was under the assumption that a big enough NN would eventually always be able to overfit.

Normalizing also helps convergence, that's for sure.

Regarding your model, those are my personal thoughts.

Yes, it helps convergence. But my model without normalizing appeared to have converged already, so I do not really understand that. Faster convergence (with normalizing) does not mean it converges to a better value, does it? Just that it reaches the same end value faster (or does it?)
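
To make my question concrete, here is the kind of toy picture I have in mind (a small numpy sketch with made-up data, nothing to do with my actual project): with a fixed step budget, gradient descent on badly scaled features can look flat while still being far from the optimum, whereas the same budget on scaled features actually gets close to it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Two features on wildly different scales, like raw unnormalized inputs
X = np.column_stack([rng.normal(0, 1, n),       # feature 1: scale ~1
                     rng.normal(0, 1000, n)])   # feature 2: scale ~1000
y = X @ np.array([3.0, 0.002]) + rng.normal(0, 0.1, n)

def final_mse(X, y, steps=500):
    """Batch gradient descent on MSE; step size limited by the steepest direction."""
    w = np.zeros(X.shape[1])
    H = 2 * X.T @ X / n                     # Hessian of the MSE
    lr = 1.0 / np.linalg.eigvalsh(H).max()  # stable learning rate for this problem
    for _ in range(steps):
        w -= lr * 2 * X.T @ (X @ w - y) / n
    return np.mean((X @ w - y) ** 2)

print("raw features:   ", final_mse(X, y))       # stalls at a much higher MSE
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print("scaled features:", final_mse(X_std, y))   # gets close to the optimum
```

Is something like that what is happening in my case, i.e. the unscaled model only looked converged?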

Normalizing also helps convergence, that's for sure.

Regarding your model, those are my personal thoughts.

@ABHINAV_KAUSHIK_RA20 This is just a quote of someone else’s answer?

11 hidden layers is quite a lot to debug.
What do you get if you try using one or two hidden layers?
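
For example, something like this (a quick Keras sketch of what I mean, with the details assumed):

```python
import tensorflow as tf

small_model = tf.keras.Sequential([
    tf.keras.Input(shape=(29,)),
    tf.keras.layers.Dense(64, activation="relu"),   # start with one modest hidden layer
    tf.keras.layers.Dense(1),                        # linear output for the regression target
])
small_model.compile(optimizer="adam", loss="mse",
                    metrics=[tf.keras.metrics.RootMeanSquaredError()])
```

If a small model like that already fits reasonably, you can grow it gradually and watch where the train/test gap starts to open up.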