Week 2 Lab - Confusing/Misleading kbins strategy

lystme · May 1, 2025, 7:16am

Hello,
In the lab, we are asked to bin the customers by their tenure, with the bins defined by:
bin_edges = [0, 1, 3, 5, float('inf')]

However, the binning strategy is 'uniform', which is a strategy where all bins have equal width:
kbins = KBinsDiscretizer(n_bins=len(bin_edges) - 1, encode='onehot-dense', strategy='uniform', subsample=None)

The labels are then defined using the original bin edges:
bin_labels = ['0-1 years', '1-3 years', '3-5 years', '5+ years']

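In the end, the labels describe the original bin edges rather than the equal-width bins that are actually computed, so the resulting data is confusing and possibly misleading. Is there a solution to this?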

Georgios · May 1, 2025, 12:12pm

Hello @lystme,

You are correct. It seems that pd.cut with the custom bin_edges and right=False (so each bin includes its left edge) would be a better choice than KBinsDiscretizer with strategy='uniform', which computes its own equal-width edges instead of using the predefined ones. However, I would get the same one-hot encoded dataframe from both approaches if I used pd.cut with right=True and the data were uniformly distributed. Hope it helps.
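
A minimal sketch of the pd.cut approach, assuming a made-up `years` Series (the values and the Series name are illustrative, not the lab's data). pd.get_dummies is just one way to get the one-hot columns:

```python
import pandas as pd

# Hypothetical tenure values in years; not the lab's data.
years = pd.Series([0.5, 1.0, 2.0, 4.0, 5.0, 20.0], name='years')

bin_edges = [0, 1, 3, 5, float('inf')]
bin_labels = ['0-1 years', '1-3 years', '3-5 years', '5+ years']

# right=False makes each bin left-closed: [0, 1), [1, 3), [3, 5), [5, inf)
binned = pd.cut(years, bins=bin_edges, labels=bin_labels, right=False)

# One column per label, in the same order as bin_labels.
one_hot = pd.get_dummies(binned)
print(pd.concat([years, binned.rename('bin')], axis=1))
print(one_hot)
```

With right=False, a customer at exactly 1.0 years falls in '1-3 years' and one at exactly 5.0 years falls in '5+ years', so the labels match the bins.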
