Regression with flattened statistics

Good day,

recently I have been experimenting with logistic regression after normalizing the data set. I understand that normalization makes choosing the learning rate a lot easier, since the gradient is not extremely large in some components and extremely small in others.

The normalization defined in this course divides the difference from the column mean by the column standard deviation. Recently I accidentally computed the statistics of the flattened data set, that is, the mean and standard deviation across all entries in the data frame. To my surprise, this normalization performed much better in both regularized and unregularized logistic regression.

Am I just lucky, or is there any study showing that flattened statistics may work better? I am happy to provide a link to the experiment I wrote up on kaggle.com once it is polished.

Thanks,
chi-yu

Hello chi-yu,

Did you mean to

  1. take the mean and the standard deviation across all samples and all features,

  2. so that you get one mean value and one standard deviation value,

  3. and then subtract that mean from every feature value and divide the difference by that standard deviation (as in the sketch below)?
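
In code, I imagine the two schemes look something like this minimal numpy sketch (the array X here is just a made-up stand-in for your data frame's values):

import numpy as np

X = np.random.rand(100, 5)  # stand-in data set: 100 samples, 5 features

# columnwise normalization (as defined in the course): one mean/std per feature
X_colwise = (X - X.mean(axis=0)) / X.std(axis=0)

# "flattened" normalization: a single mean/std over all entries
X_flattened = (X - X.mean()) / X.std()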

Can you provide a table of performance metrics for comparison? It would be great to see how much better it is, and in what aspects.

Thanks,
Raymond

Thank you for your reply. Your interpretation is right. I am still working on the code and figures. Once it is ready I will share the link. Thank you for the quick response.

Okay! Looking forward to your update!

Hi there! I finally finished the write-up. In conclusion, I think it was luck that flattened normalization produced much better predictions.

I did logistic regression on two features x_1 and x_2. Not only that, I was trying to produce a quadratic decision boundary. Instead of including linear terms, the polynomial I used was homogeneous, like w_1 x_1^2 + w_2 x_1 x_2 + w_3 x_2^2 + b. In this case the flattened normalization had a much better prediction score (provided in the link). Nevertheless, when I added the linear terms back, as in w_1 x_1^2 + w_2 x_1 x_2 + w_3 x_2^2 + w_4 x_1 + w_5 x_2 + b, the columnwise normalization had a comparable score (and converged much faster). When I also added linear terms to the flattened normalization model, the performance did not improve. Therefore it is reasonable to conclude that flattened normalization is not necessarily better. The two feature maps are sketched below.
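
For concreteness, here is a minimal sketch of the two feature maps I compared (the function names are my own; the bias b lives in the model, not in the map):

import numpy as np

def map_homogeneous(X):
    # homogeneous quadratic terms only: x1^2, x1*x2, x2^2
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([x1**2, x1 * x2, x2**2], axis=1)

def map_with_linear(X):
    # the same quadratic terms plus the linear terms x1 and x2
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([x1**2, x1 * x2, x2**2, x1, x2], axis=1)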

chi-yu

Hello chi-yu @u5470152 ,

It is a very detailed notebook! I agree with you that it could just be luck. Normalization helps each feature share a similar range so that we can use one learning rate for every feature. “Flattened” normalization won’t make them share similar ranges.

May I make a suggestion for your notebook? Would you consider adding a “summary” section at the beginning or at the end? The summary can have a table so that your readers can quickly compare the major metrics among the models and see how they evolve. There could also be a table that compares some key figures, such as how the decision boundary evolved.

Cheers,
Raymond

Hi Raymond,

those are constructive suggestions. I struggled with labeling the matplotlib output figures. I think in R Markdown there is a way to cross-reference output figures. Do you know of a source that explains how to do it in a Jupyter notebook? Or do I have to save the output figures and re-post them in the table you suggested?

Thank you for your time and encouragements,
chi-yu

Hey chi-yu @u5470152,

This is an interesting question! I have never tried it myself, but this page looks like it could do the job?

Raymond


Hi Raymond @rmwkwok,

that was the link I checked. I will have to dig deeper later. But I will definitely try to provide a summary in some way. The current version of the note can be tedious to read through all at once.

chi-yu

Oh, I think I missed a point. Yes, I think you have to save the output figures (at least this is the only way I know :stuck_out_tongue_winking_eye:).
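
For example, a minimal sketch (the file name is just for illustration):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
# write the figure to a file next to the notebook
fig.savefig("decision_boundary.png", dpi=150, bbox_inches="tight")

A markdown cell can then embed (and, in effect, cross-reference) it with ![decision boundary](decision_boundary.png).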

By the way, please let me know if you update it with a summary table, and I would definitely like to check it out!

Cheers,
Raymond

I have updated the note with a summary! Enjoy (and if not, let me know). Thanks for the support along the way.

Hey! I want to try to help make your article better! I found 2 things we can discuss:

  1. Did you mean to refer to “this” or to “that”? I think it should be “that”, but the symbol implies “this”.

  2. I think the hidden logic behind the highlighted sentence is:

    • too many features → too many weights
    • insufficient data + too many weights → overfitting
    • to counter overfitting → reduce the number of weights → reduce the number of features
    • Agree? If so, it might be worth expanding the sentence a little. Your call! (A quick illustration of the first bullet follows this list.)
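
To put a number on the first bullet, here is a quick sketch (the degrees are arbitrary) counting how fast the number of weights grows with polynomial degree for just two features:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.rand(10, 2)  # two features, as in your notebook
for degree in (2, 4, 6):
    n_features = PolynomialFeatures(degree=degree, include_bias=False).fit_transform(X).shape[1]
    print(degree, n_features)  # 5, 14, and 27 columns -> that many weights to fit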

Lastly, I stopped in the middle of reading it because I found that none of the images are displayed…


I am not sure if it is just my problem, but I want to continue reading after your feedback. If you don’t have that problem, then it is probably just me and I will continue anyway. Let me know.

Cheers,
Raymond

Hi Raymond,

you really looked carefully. You spotted my errors.

You are right: the pictures only display in my browser. If I use incognito mode the pictures disappear… It might be because I linked the figures to my Google album, so only I can see them in my browser.

I never figured out a good way to insert pictures in a Kaggle notebook. If you know a way, please let me know.

There seem to be similar issues discussed here.

Best,
chi-yu

Hello chi-yu,

I can see a lot of effort in your work too!

Have you tried the methods by KSavleen?

I don’t publish notebooks on Kaggle, so I don’t know where the challenge is, but if the above doesn’t work, let me know and I can look into it later!

Cheers,
Raymond

What about now? Your information was very useful. Make sure you are viewing version 8 or beyond.

I removed the confusing sentence you pointed out earlier, the part about the decision boundary. As for the overfitting part, the note is mostly for myself at this point, and I understood it the way you elaborated (I do not know how many people actually read it as closely as you do), so I am leaving it that way. But your elaboration was definitely on point!

Further comments are welcome, as I am totally new to ML.

chi-yu

No problem! I am just here to provide my perspective, and it is important that you make the call!

Here are some things I want to draw your attention to.

  1. It’s important that the features are normalized before being sent to training. The mapped features are not yet normalized.
    [screenshot of the original cell]
    If you change that cell to the following, you will see something pretty different :exploding_head::
X_train_mapped = map_feature(X_train[['LotFrontage', 'YearBuilt']].to_numpy())

# compute the statistics of the *mapped* features, column by column
mu = X_train_mapped.mean(axis=0, keepdims=True)
std = X_train_mapped.std(axis=0, keepdims=True)

# normalize after mapping, so every polynomial term shares a similar range
X_train_norm_mapped = normalize(X_train_mapped, mu, std)
  2. Cell 26 built a lr_model, but the weights and bias learnt there were not used in cell 28 for prediction. Also, sklearn’s LogisticRegression enables regularization by default, so you may want to disable it there (see the sketch after this list)?

  3. If you think it is worthwhile to reinvestigate after seeing the difference made by point 1, then one possible addition is a “learning rate” column in your summary table. An analysis of any relation between the learning rate and feature normalization would greatly enrich your notebook.
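
For point 2, here is a minimal sketch (reusing the names from the snippet in point 1, with y_train standing in for your labels; note that the penalty spelling differs across scikit-learn versions):

from sklearn.linear_model import LogisticRegression

# penalty=None disables regularization on scikit-learn >= 1.2
# (older versions use penalty='none' instead)
lr_model = LogisticRegression(penalty=None)
lr_model.fit(X_train_norm_mapped, y_train)

# reuse the learnt parameters in your own prediction code
w = lr_model.coef_.ravel()
b = lr_model.intercept_[0]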

Cheers,
Raymond

Hi Raymond,

I thought about your point on day one, lol, but I did not want to deal with writing out the transformations that draw the original decision boundary. I figured it is much easier to normalize first and then do the mapping. If I map first and normalize afterwards, I do not see a straightforward way to plot the decision boundary. On second thought, I might be able to modify the contour coordinates: instead of (u^2, uv, v^2, u, v), I could probably use ((u^2-A)/B, (uv-C)/D, (v^2-E)/F, (u-G)/H, (v-I)/J), where A, …, J are the relevant means and standard deviations.

About your point two, I always wished the course told us more about how to use the scikit-learn packages. Maybe they dive deeper later? Thanks for letting me know. I will look into scikit-learn’s logistic regression soon.

Finally, I agree with your point on the learning rate. I can add that later as well.

chi-yu

Hello chi-yu,

This is my idea:

u = np.linspace(X_train['LotFrontage'].min(), X_train['LotFrontage'].max(), 100)
v = np.linspace(X_train['YearBuilt'].min(), X_train['YearBuilt'].max(), 100)
u, v = np.meshgrid(u, v)  # all combinations of u and v; this replaces the two for-loops when you generate z
uv = np.vstack([u.flatten(), v.flatten()]).T

# Now we can treat uv as if it were an X_test, and then do map_feature and normalize
uv_mapped = map_feature(uv)
uv_norm_mapped = normalize(uv_mapped, mu, std)
z = (np.dot(uv_norm_mapped, w) + b).reshape((100, 100))

fig, ax = plt.subplots(1, 3, sharey=True, figsize=(15, 4))
ax[0].scatter(X_train.loc[train_rows_below_med, 'LotFrontage'],
              X_train.loc[train_rows_below_med, 'YearBuilt'],
              facecolors='none', edgecolors='r', marker='^')
ax[1].scatter(X_train.loc[train_rows_above_med, 'LotFrontage'],
              X_train.loc[train_rows_above_med, 'YearBuilt'],
              facecolors='none', edgecolors='b')
ax[2].scatter(X_train.loc[train_rows_below_med, 'LotFrontage'],
              X_train.loc[train_rows_below_med, 'YearBuilt'],
              facecolors='none', edgecolors='r', marker='^')
ax[2].scatter(X_train.loc[train_rows_above_med, 'LotFrontage'],
              X_train.loc[train_rows_above_med, 'YearBuilt'],
              facecolors='none', edgecolors='b')
ax[0].contour(u, v, z, levels=[0], colors='g')
ax[1].contour(u, v, z, levels=[0], colors='g')
ax[2].contour(u, v, z, levels=[0], colors='g')
ax[0].set_ylabel("Scaled YearBuilt")
ax[1].set_xlabel("Scaled LotFrontage")
fig.suptitle("Columnwise Normalization Decision Boundary Against Training set "
             "Using Only Homogeneous Quadratic Form", fontsize=14)
plt.show()

If you change X_train_norm_mapped as in my previous post, then you can use the code here to do the plot. This code is mostly based on yours.

Cheers,
Raymond

PS1: it looks more handsome, doesn’t it?

PS2: the axes are indeed not scaled YearBuilt and LotFrontage. The axis labels need to be changed.

Thank you for your participation. I will take a look in the next few days!

You are welcome, chi-yu :slight_smile: