Hello,
after seeing this video, I tried it with my own little project.
The dataset has around 3000 examples with 29 features each, which I split 80/20 into train and test sets.
For this project I built a relatively deep MLP with 11 layers (10 hidden layers of 512 ReLU units each, plus a single linear output unit). I am just experimenting here and deliberately wanted a big NN.
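Roughly, the model looks like this (sketched in tf.keras just for illustration, my actual code may differ in the details):

```python
from tensorflow import keras
from tensorflow.keras import layers

# 29 input features, 10 hidden layers of 512 ReLU units, 1 linear output unit
model = keras.Sequential(
    [keras.Input(shape=(29,))]
    + [layers.Dense(512, activation="relu") for _ in range(10)]
    + [layers.Dense(1)]  # linear activation for the regression output
)
model.compile(optimizer="adam", loss="mse",
              metrics=[keras.metrics.RootMeanSquaredError()])
```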
What I found is that without normalizing, the RMSE on the train and test sets converged very quickly, and no matter how big I made the network I could not get the model to overfit. So my first question: why can my model not overfit the training data on purpose, even when I increase the size of the network?
When I scaled my input features, the train RMSE suddenly improved a lot, while the test RMSE stayed roughly the same.
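(Just for context, the scaling I mean is something like this, fit on the train split only; variable names are placeholders:)

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the train split only
X_test_scaled = scaler.transform(X_test)        # apply the same mean/std to the test split
```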
In the image you can see all the mentioned scores.
My second question: what is going on here? Why did scaling suddenly improve my train RMSE (with no other changes)? According to the video, normalizing should increase training speed, but as we can see the model was already converging, so even if I let the unscaled model train for much longer, it would never reach the "new" RMSE level that normalizing gives.
Apart from that (third question): is it safe to say that after normalizing, my model is clearly overfitting (which I should then try to tackle with L2 regularization or dropout)?
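If so, I would add them roughly like this (again tf.keras just as an illustration; the L2 strength and dropout rate are placeholder values):

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential(
    [keras.Input(shape=(29,))]
    + [layer
       for _ in range(10)
       for layer in (
           layers.Dense(512, activation="relu",
                        kernel_regularizer=regularizers.l2(1e-4)),  # L2 penalty on the weights
           layers.Dropout(0.3),                                     # randomly drop 30% of units
       )]
    + [layers.Dense(1)]
)
```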
Thanks a lot