Machine learning model for numeric columns

I’m trying to build a model to recognize fields, based on their values.

As you can see in the picture, I built a model to recognize which field is Rubrique, and that works just fine. I took a list of the words it contains that are specific to this column, and based on that I built the model.

Now as you can see I have other fields which are numeric and somehow similar. I don’t know what approach should be used in order to classify them.

If those numeric fields have only a few unique values, let's say 20, then I would think a categorical classification might work. If there are many unique values, then an NLP-based model could be helpful.
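A minimal sketch of that first check, assuming a column is just a list of values. The function name `looks_categorical`, the sample columns, and the default threshold are made up for illustration; the 20 simply mirrors the number mentioned above and is not a recommended cut-off.

```python
def looks_categorical(values, max_unique=20):
    """Return True when a column has at most max_unique distinct values."""
    distinct = set(values)
    return len(distinct) <= max_unique


# Hypothetical columns, for illustration only.
status_codes = [10, 20, 10, 30, 20, 10]       # few distinct values
amounts = [i * 1.37 for i in range(1000)]     # many distinct values

print(looks_categorical(status_codes))  # True
print(looks_categorical(amounts))       # False
```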

Some kind of embedding encoding could also be used, I think.


Hi there,

In addition to @gent.spah's hints, here is some input:

If I understand correctly you are interested in the category as an output (label).

Based on the attributes (e.g. numerical values), you can start feature engineering using your domain knowledge. Here are some ideas on how to transform the data: Transforming Your Data: Check Your Understanding | Data Preparation and Feature Engineering for Machine Learning | Google Developers

E.g. bucketing / binning might be worth a try if this is meaningful for your features.
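As a rough illustration of bucketing, here is a small sketch using only the standard library. The helper `bucketize` and the bucket boundaries are assumptions chosen for the example; in practice the edges would come from your domain knowledge or from quantiles of the data.

```python
from bisect import bisect_right


def bucketize(value, edges):
    """Return the index of the bucket that value falls into.

    edges must be sorted. Bucket 0 holds everything below edges[0];
    bucket len(edges) holds everything at or above edges[-1].
    """
    return bisect_right(edges, value)


edges = [0, 100, 1000]  # hypothetical bucket boundaries
values = [-5, 0, 42, 100, 999, 1000, 25000]

buckets = [bucketize(v, edges) for v in values]
print(buckets)  # [0, 1, 1, 2, 2, 3, 3]
```

The bucket index can then be treated as a categorical feature instead of the raw number.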

So I guess some features are more informative than others. You should focus on these. (If you want to quantify the importance, calculating a feature ranking might be helpful, too.)
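One simple way to get such a ranking is sketched below. It is only one of many possible criteria: a variance-ratio score, similar in spirit to an ANOVA F-statistic, that compares how far apart the class means are with how spread out each class is. The function `separation_score`, the feature names, and the data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def separation_score(feature_values, labels):
    """Variance of the class means divided by the average within-class variance.

    A higher score means the feature separates the labels better.
    """
    groups = defaultdict(list)
    for x, y in zip(feature_values, labels):
        groups[y].append(x)
    class_means = [mean(g) for g in groups.values()]
    between = pvariance(class_means)
    within = mean([pvariance(g) for g in groups.values()])
    return between / within if within > 0 else float("inf")


# Hypothetical per-column features and the field each column belongs to.
labels = ["price", "price", "price", "quantity", "quantity", "quantity"]
features = {
    "column_mean": [105.0, 98.0, 110.0, 3.0, 5.0, 4.0],
    "fraction_missing": [0.1, 0.3, 0.2, 0.2, 0.1, 0.3],
}

scores = {name: separation_score(vals, labels) for name, vals in features.items()}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(name, round(score, 2))
```

In this made-up data `column_mean` ranks first because the two fields have very different typical values, while `fraction_missing` looks the same for both.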

You can also incorporate other categorical features, e.g. using one-hot encoding.
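For reference, a minimal one-hot encoding sketch in plain Python. The helper `one_hot` and the `currencies` column are invented for the example; libraries such as pandas or scikit-learn offer their own encoders, which would normally be the better choice.

```python
def one_hot(values):
    """Map each value to a 0/1 vector with one position per distinct category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


currencies = ["EUR", "USD", "EUR", "GBP"]  # hypothetical categorical column
categories, encoded = one_hot(currencies)

print(categories)  # ['EUR', 'GBP', 'USD']
print(encoded)     # [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```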

When playing around with features, I would suggest visualising your data to:

  • understand your data
  • check assumptions
  • evaluate how feature engineering is going
  • prepare and accompany your modelling activity after normalising
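The normalising step mentioned in the last point could look like the sketch below, assuming z-score standardisation is what is meant (min-max scaling would be another option). The helper `standardise` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardise(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


amounts = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical numeric column
scaled = standardise(amounts)
print([round(x, 3) for x in scaled])
```

After this step, columns measured on very different scales become comparable, which also makes plots of several features side by side easier to read.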

Thanks a lot.
I’ll give it a try.

Thank you very much.
So I’m going to work on it
