Week 2 Lab - Confusing/Misleading kbins strategy

Hello,
In the lab, we are asked to bin the customers by their antiquity, with the bin edges given by:
bin_edges = [0, 1, 3, 5, float('inf')]

However, the binning strategy is 'uniform', which produces bins of equal width and never uses these edges:
kbins = KBinsDiscretizer(n_bins=len(bin_edges) - 1, encode='onehot-dense', strategy='uniform', subsample=None)

The labels are then defined using the original bin edges:
bin_labels = ['0-1 years', '1-3 years', '3-5 years', '5+ years']

In the end, the labels do not describe the bins the discretizer actually produces, so the resulting data is confusing and possibly misleading. Is there a solution to this?
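
To make the mismatch concrete, here is a minimal sketch in plain Python of how equal-width edges come out, assuming they span the minimum to the maximum of the data (which is how I understand strategy='uniform' to behave). The tenure values and the helpers uniform_edges and bin_index are made up for illustration and are not from the lab.

```python
# Hypothetical tenure values in years (not the lab's data).
tenure_years = [0, 2, 4, 6, 12, 20]
bin_labels = ['0-1 years', '1-3 years', '3-5 years', '5+ years']


def uniform_edges(values, n_bins):
    """Equal-width edges spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_index(value, edges):
    """Index of the half-open bin [edges[i], edges[i+1]); the maximum falls in the last bin."""
    n_bins = len(edges) - 1
    for i in range(n_bins):
        if value < edges[i + 1]:
            return i
    return n_bins - 1


edges = uniform_edges(tenure_years, len(bin_labels))
print(edges)  # [0.0, 5.0, 10.0, 15.0, 20.0]

# Attach the lab's labels to the equal-width bins, as the lab code effectively does.
assigned = [bin_labels[bin_index(v, edges)] for v in tenure_years]
for years, label in zip(tenure_years, assigned):
    print(years, '->', label)
```

With these made-up values, a customer with 6 years ends up labeled '1-3 years', because the equal-width edges have nothing to do with [0, 1, 3, 5, inf].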

Hello @lystme,

You are correct. It seems that pd.cut with the custom bin_edges and right=False (so that each bin includes its left edge) would be a better choice than KBinsDiscretizer with strategy='uniform', which uses its own equal-width edges instead of the custom ones. However, I would get the same one-hot encoded DataFrame from both approaches if I use pd.cut with right=True and the data are uniformly distributed. Hope it helps.
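
For reference, a minimal sketch of the pd.cut approach, assuming pandas is available. The tenure values are made up for illustration and the variable names are my own, not the lab's; only bin_edges and bin_labels come from the lab.

```python
import pandas as pd

bin_edges = [0, 1, 3, 5, float('inf')]
bin_labels = ['0-1 years', '1-3 years', '3-5 years', '5+ years']

# Hypothetical tenure values in years (not the lab's data).
tenure_years = pd.Series([0.5, 1.0, 2.0, 3.0, 4.0, 6.0])

# right=False makes each bin closed on the left: [0, 1), [1, 3), [3, 5), [5, inf).
binned = pd.cut(tenure_years, bins=bin_edges, labels=bin_labels, right=False)

# One column per label, in the same order as bin_labels.
one_hot = pd.get_dummies(binned, dtype=int)
print(one_hot)
```

Here the labels match the bins by construction, since the same edges define both.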