Regression with flattened statistics

Good day,

recently I have been experimenting with logistic regression after normalizing the data set. I understand that normalization makes choosing the learning rate a lot easier, since the gradient is not extremely large in some components and extremely small in others.

The normalization defined in this course divides the difference from the column mean by the column standard deviation. Recently I accidentally computed the statistics of the flattened data set, that is, the mean and standard deviation across all entries in the data frame. To my surprise, this normalization performed much better in both regularized and unregularized logistic regression.

Am I just lucky, or is there any study showing that flattened statistics may work better? I am happy to provide a link to the experiment I wrote up on kaggle.com once it is polished.

Thanks,
chi-yu

Hello chi-yu,

Did you mean to

  1. take the mean and the standard deviation across all samples and all features,

  2. so that you get one mean value and one standard deviation value,

  3. and then subtract that mean from every feature value and divide the difference by that standard deviation (as in the sketch below)?
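
In code, I imagine the two schemes look something like this minimal numpy sketch (the array X here is just a made-up stand-in for your data frame's values):

import numpy as np

X = np.random.rand(100, 5)  # stand-in data set: 100 samples, 5 features

# columnwise normalization (as defined in the course): one mean/std per feature
X_colwise = (X - X.mean(axis=0)) / X.std(axis=0)

# "flattened" normalization: a single mean/std over all entries
X_flattened = (X - X.mean()) / X.std()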

Can you provide a table of performance metrics for comparison? It would be great to see how much better it is, and in what aspects.

Thanks,
Raymond

Thank you for your reply. Your interpretation is right. I am still working on the code and figures. Once it is ready I will share the link. Thank you for the quick response.

Okay! Looking forward to your update!

Hi there! I finally finished the write-up. In conclusion, I think it was luck that flattened normalization produced much better predictions.

I did logistic regression on two features x_1 and x_2. Not only that, I was trying to produce a quadratic decision boundary. Instead of including linear terms, the polynomial I used was homogeneous, like w_1 x_1^2 + w_2 x_1 x_2 + w_3 x_2^2 + b. In this case the flattened normalization had a much better prediction score (provided in the link). Nevertheless, when I added the linear terms back, as in w_1 x_1^2 + w_2 x_1 x_2 + w_3 x_2^2 + w_4 x_1 + w_5 x_2 + b, the columnwise normalization had a comparable score (and converged much faster). When I also added linear terms to the flattened normalization model, the performance did not improve. Therefore it is reasonable to conclude that flattened normalization is not necessarily better. The two feature maps are sketched below.
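
For concreteness, here is a minimal sketch of the two feature maps I compared (the function names are my own; the bias b lives in the model, not in the map):

import numpy as np

def map_homogeneous(X):
    # homogeneous quadratic terms only: x1^2, x1*x2, x2^2
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([x1**2, x1 * x2, x2**2], axis=1)

def map_with_linear(X):
    # the same quadratic terms plus the linear terms x1 and x2
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([x1**2, x1 * x2, x2**2, x1, x2], axis=1)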

chi-yu

Hello chi-yu @u5470152 ,

It is a very detailed notebook! I agree with you that it could just be luck. Normalization helps each feature share a similar range so that we can use one learning rate for every feature. “Flattened” normalization won’t make them share similar ranges.

May I make a suggestion for your notebook? Would you consider adding a “summary” section at the beginning or at the end? The summary can have a table so that your readers can quickly compare the major metrics among the models and see how they evolve. There could also be a table that compares some key figures, such as how the decision boundary evolved.

Cheers,
Raymond

Hi Raymond,

those are constructive suggestions. I struggled with labeling the matplotlib output figures. I think in R Markdown there is a way to cross-reference output figures. Do you know of a source that explains how to do it in a Jupyter notebook? Or do I have to save the output figures and re-post them in the table you suggested?

Thank you for your time and encouragements,
chi-yu

Hey chi-yu @u5470152,

This is an interesting question! I have never tried it myself, but this page looks like it could do the job?

Raymond


Hi Raymond @rmwkwok,

that was the link I checked. I will have to dig deeper later. But I will definitely try to provide a summary in some way. The current version of the note can be tedious to read through all at once.

chi-yu

Oh, I think I missed a point. Yes, I think you have to save the output figures (at least this is the only way I know :stuck_out_tongue_winking_eye:).
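
For example, a minimal sketch (the file name is just for illustration):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
# write the figure to a file next to the notebook
fig.savefig("decision_boundary.png", dpi=150, bbox_inches="tight")

A markdown cell can then embed (and, in effect, cross-reference) it with ![decision boundary](decision_boundary.png).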

By the way, please let me know if you update it with a summary table, and I would definitely like to check it out!

Cheers,
Raymond

I have updated the note with a summary! Enjoy (and if not, let me know). Thanks for the support along the way.

Hey! I want to try to help make your article better! I found 2 things we can discuss:

  1. Did you mean to refer to “this” or to “that”? I think it should be “that”, but the symbol implies “this”.

  2. I think the hidden logic behind the highlighted sentence is:

    • too many features → too many weights
    • insufficient data + too many weights → overfitting
    • to counter overfitting → reduce the number of weights → reduce the number of features
    • Agree? If so, it might be worth expanding the sentence a little. Your call! (A quick illustration of the first bullet follows this list.)
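
To put a number on the first bullet, here is a quick sketch (the degrees are arbitrary) counting how fast the number of weights grows with polynomial degree for just two features:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.rand(10, 2)  # two features, as in your notebook
for degree in (2, 4, 6):
    n_features = PolynomialFeatures(degree=degree, include_bias=False).fit_transform(X).shape[1]
    print(degree, n_features)  # 5, 14, and 27 columns -> that many weights to fit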

Lastly, I stopped in the middle of reading it because I found that none of the images are displayed…


I am not sure if it is just my problem, but I want to continue reading after your feedback. If you don’t have that problem, then it is probably just me and I will continue anyway. Let me know.

Cheers,
Raymond

Hi Raymond,

you really looked carefully. You spotted my errors.

You are right: the pictures only display in my browser. If I use incognito mode the pictures disappear… It might be because I linked the figures to my Google album, so only I can see them in my browser.

I never figured out a good way to insert pictures in a Kaggle notebook. If you know a way, please let me know.

There seem to be similar issues discussed here.

Best,
chi-yu

Hello chi-yu,

I can see a lot of effort in your work too!

Have you tried the methods by KSavleen?

I don’t publish notebooks on Kaggle, so I don’t know where the challenge is, but if the above doesn’t work, let me know and I can look into it later!

Cheers,
Raymond

What about now? Your information was very useful. Make sure you are viewing version 8 or beyond.

I removed the confusing sentence you pointed out earlier, the part about the decision boundary. As for the overfitting part, the note is mostly for myself at this point, and I understood it the way you elaborated (I do not know how many people actually read it as closely as you do), so I am leaving it that way. But your elaboration was definitely on point!

Further comments are welcome, as I am totally new to ML.

chi-yu

No problem! I am just here to provide my perspective, and it is important that you make the call!

Here are some things I want to draw your attention to.

  1. It’s important that the features are normalized before being sent to training. The mapped features are not yet normalized.
    [screenshot of the original cell]
    If you change that cell to the following, you will see something pretty different :exploding_head::
X_train_mapped = map_feature(X_train[['LotFrontage', 'YearBuilt']].to_numpy())

# compute the statistics of the *mapped* features, column by column
mu = X_train_mapped.mean(axis=0, keepdims=True)
std = X_train_mapped.std(axis=0, keepdims=True)

# normalize after mapping, so every polynomial term shares a similar range
X_train_norm_mapped = normalize(X_train_mapped, mu, std)
  2. Cell 26 built a lr_model, but the weights and bias learnt there were not used in cell 28 for prediction. Also, sklearn’s LogisticRegression enables regularization by default, so you may want to disable it there (see the sketch after this list)?

  3. If you think it is worthwhile to reinvestigate after seeing the difference made by point 1, then one possible addition is a “learning rate” column in your summary table. An analysis of any relation between the learning rate and feature normalization would greatly enrich your notebook.
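
For point 2, here is a minimal sketch (reusing the names from the snippet in point 1, with y_train standing in for your labels; note that the penalty spelling differs across scikit-learn versions):

from sklearn.linear_model import LogisticRegression

# penalty=None disables regularization on scikit-learn >= 1.2
# (older versions use penalty='none' instead)
lr_model = LogisticRegression(penalty=None)
lr_model.fit(X_train_norm_mapped, y_train)

# reuse the learnt parameters in your own prediction code
w = lr_model.coef_.ravel()
b = lr_model.intercept_[0]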

Cheers,
Raymond

Hi Raymond,

I thought about your point on day one, lol, but I did not want to deal with writing out the transformations that draw the original decision boundary. I figured it is much easier to normalize first and then do the mapping. If I map first and normalize afterwards, I do not see a straightforward way to plot the decision boundary. On second thought, I might be able to modify the contour coordinates: instead of (u^2, uv, v^2, u, v), I could probably use ((u^2-A)/B, (uv-C)/D, (v^2-E)/F, (u-G)/H, (v-I)/J), where A, …, J are the relevant means and standard deviations.

About your point two, I always wished the course told us more about how to use the scikit-learn packages. Maybe they dive deeper later? Thanks for letting me know. I will look into scikit-learn’s logistic regression soon.

Finally, I agree with your point on the learning rate. I can add that later as well.

chi-yu

Hello chi-yu,

This is my idea:

u = np.linspace(X_train['LotFrontage'].min(), X_train['LotFrontage'].max(), 100)
v = np.linspace(X_train['YearBuilt'].min(), X_train['YearBuilt'].max(), 100)
u, v = np.meshgrid(u, v)  # all combinations of u and v; this replaces the two for-loops when you generate z
uv = np.vstack([u.flatten(), v.flatten()]).T

# Now we can treat uv as if it were an X_test, and then do map_feature and normalize
uv_mapped = map_feature(uv)
uv_norm_mapped = normalize(uv_mapped, mu, std)
z = (np.dot(uv_norm_mapped, w) + b).reshape((100, 100))

fig, ax = plt.subplots(1, 3, sharey=True, figsize=(15, 4))
ax[0].scatter(X_train.loc[train_rows_below_med, 'LotFrontage'],
              X_train.loc[train_rows_below_med, 'YearBuilt'],
              facecolors='none', edgecolors='r', marker='^')
ax[1].scatter(X_train.loc[train_rows_above_med, 'LotFrontage'],
              X_train.loc[train_rows_above_med, 'YearBuilt'],
              facecolors='none', edgecolors='b')
ax[2].scatter(X_train.loc[train_rows_below_med, 'LotFrontage'],
              X_train.loc[train_rows_below_med, 'YearBuilt'],
              facecolors='none', edgecolors='r', marker='^')
ax[2].scatter(X_train.loc[train_rows_above_med, 'LotFrontage'],
              X_train.loc[train_rows_above_med, 'YearBuilt'],
              facecolors='none', edgecolors='b')
ax[0].contour(u, v, z, levels=[0], colors='g')
ax[1].contour(u, v, z, levels=[0], colors='g')
ax[2].contour(u, v, z, levels=[0], colors='g')
ax[0].set_ylabel("Scaled YearBuilt")
ax[1].set_xlabel("Scaled LotFrontage")
fig.suptitle("Columnwise Normalization Decision Boundary Against Training set "
             "Using Only Homogeneous Quadratic Form", fontsize=14)
plt.show()

If you change X_train_norm_mapped as in my previous post, then you can use the code here to do the plot. This code is mostly based on yours.

Cheers,
Raymond

PS1: it looks more handsome, doesn’t it?

PS2: the axes are indeed not scaled YearBuilt and LotFrontage. The axis labels need to be changed.

Thank you for your participation. I will take a look in the next few days!

You are welcome, chi-yu :slight_smile: