Why are we normalizing BOTH features by the same mean in section 1? Shouldn't each feature be normalized by its own mean and standard deviation for better results?
adv_norm = (adv - np.mean(adv))/np.std(adv)
Yes, and that is exactly how it works. Check the NumPy documentation for how those functions operate.
I downloaded the Jupyter Notebook as a Python file into PyCharm.
What the notebook is doing is taking the mean of the WHOLE DataFrame: one number, a float, which is then subtracted from BOTH columns. Since the TV numbers are an order of magnitude larger than the sales numbers, that single mean is dominated by the TV column (with equal row counts it is exactly the average of the two column means, so roughly half the TV mean). The single standard deviation is likewise dominated by the TV numbers.
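To illustrate the pooled-statistics point, here is a minimal NumPy sketch on a small made-up array (hypothetical values, not the course data; column 0 stands in for TV and column 1 for sales). Calling np.mean/np.std with no axis on a plain ndarray collapses every entry into one scalar, so both columns get shifted and scaled by the same two numbers.

```python
import numpy as np

# Made-up illustrative values (NOT the course dataset): column 0 plays "TV",
# column 1 plays "sales", roughly an order of magnitude apart.
data = np.array([
    [200.0, 20.0],
    [50.0, 10.0],
    [20.0, 8.0],
    [150.0, 18.0],
    [180.0, 14.0],
])

# No axis argument on a plain ndarray: every entry is pooled into ONE scalar.
pooled_mean = np.mean(data)
pooled_std = np.std(data)

# Per-column means for comparison (axis=0 reduces down each column).
col_means = data.mean(axis=0)

# With equal row counts the pooled mean is exactly the average of the column
# means, so it is dominated by the much larger TV column.
assert np.isclose(pooled_mean, col_means.mean())

# Normalizing with the pooled statistics shifts and scales both columns by the
# same two numbers, so neither column ends up centered on 0.
pooled_norm = (data - pooled_mean) / pooled_std

print("column means:", col_means)
print("pooled mean:", pooled_mean)
print("column means after pooled normalization:", pooled_norm.mean(axis=0))
```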
The proper technique is to take the mean of the TV numbers, subtract it from that column, and divide by the standard deviation of JUST the TV numbers. Then do likewise for the sales numbers: take the mean of JUST that column, subtract it, and scale by the standard deviation of JUST that column.
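A minimal pandas sketch of that per-column technique, assuming the data sits in a DataFrame with hypothetical columns "TV" and "Sales" (the values are made up). Passing axis=0 makes the per-column intent explicit, and ddof=0 is chosen to match np.std's default (population standard deviation); pandas' own .std() defaults to ddof=1.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the notebook's `adv` data; the column
# names "TV" and "Sales" are assumptions for illustration.
adv = pd.DataFrame({
    "TV": [200.0, 50.0, 20.0, 150.0, 180.0],
    "Sales": [20.0, 10.0, 8.0, 18.0, 14.0],
})

# Explicit per-column statistics: axis=0 reduces down each column and returns
# a Series indexed by column name. ddof=0 matches np.std's default.
col_mean = adv.mean(axis=0)
col_std = adv.std(axis=0, ddof=0)

# Subtracting/dividing by a Series aligns on column labels, so every column
# is centered and scaled by its OWN mean and standard deviation.
adv_norm = (adv - col_mean) / col_std

print(adv_norm.mean())        # each column's mean is ~0
print(adv_norm.std(ddof=0))   # each column's population std is ~1
```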
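Run scikit-learn's StandardScaler (which standardizes each column separately; its normalize function instead rescales each row to unit norm) and you'll see that the result is different.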
Thanks for the details. My reply should have said “that is exactly how it should work”.
I just got access to this course’s materials, so I’ll take a look and file a support ticket as necessary.
I did a little testing, and it appears that’s not actually what’s happening.
The trick is that np.mean(adv) and np.std(adv) both return vectors (one value per column), not scalars.
So the subtraction and division are broadcast column by column: each column of the DataFrame is shifted and scaled by its own mean and standard deviation.
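The column-by-column broadcasting described above can be checked with a small sketch on made-up data (the frame and its "TV"/"Sales" columns are hypothetical). One caveat, stated as an assumption to verify against your installed version: on a plain NumPy array, np.mean with no axis returns a scalar, and on a DataFrame it defers to DataFrame.mean, whose axis=None behavior has changed across pandas releases (as we understand it, pandas 2.0 and later reduce over the whole frame). Passing axis explicitly, as below, sidesteps that question.

```python
import numpy as np
import pandas as pd

# Hypothetical data; not the course dataset.
adv = pd.DataFrame({
    "TV": [200.0, 50.0, 20.0, 150.0, 180.0],
    "Sales": [20.0, 10.0, 8.0, 18.0, 14.0],
})

# NumPy route: a 1-D vector of shape (2,) broadcasts against a (5, 2) array
# along the last axis, i.e. one entry per column.
values = adv.to_numpy()
means = values.mean(axis=0)   # shape (2,)
stds = values.std(axis=0)     # NumPy's default ddof=0
numpy_norm = (values - means) / stds

# pandas route: a Series indexed by column name aligns with the columns.
pandas_norm = (adv - adv.mean(axis=0)) / adv.std(axis=0, ddof=0)

# Both routes give the same per-column standardization.
assert np.allclose(numpy_norm, pandas_norm.to_numpy())

# A single pooled scalar, by contrast, is applied identically to every entry.
pooled_norm = (values - values.mean()) / values.std()
print(np.allclose(numpy_norm, pooled_norm))  # False here: the column means differ
```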
So I think it’s all OK.