Distances suitable for mixed variables!

In many applications, my data has both quantitative and qualitative features: for instance, numeric features that come from a text embedding and categorical variables that come from other sources. When I want to run a nearest-neighbour search (or an approximate version of it), I need a distance that can compare records containing both numerical and categorical data. I know one such distance is the Gower distance. Is the Gower distance appropriate for this problem? Also, what other distances can be used?

Thanks!

Hi @Mauricio_Toro

I’m not sure you posted in the right section (NLP Specialization); this question probably belongs in the Machine Learning Specialization.

To answer your question: the choice of distance/similarity metric depends on your application, so pick whichever leads to the best results/performance. You can one-hot encode or embed the categorical data and then use the usual ML algorithms, such as k-nearest neighbors.
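For comparison with the one-hot route, here is a minimal sketch of the Gower distance mentioned in the question, in plain Python with a brute-force neighbour search. It assumes equal feature weights, numeric features scaled by their range in the dataset, and a simple match/mismatch score for categorical features. The names `gower_distance`, `numeric_ranges`, `nearest_neighbours` and the toy data are made up for illustration and are not from any library.

```python
from typing import List, Sequence


def numeric_ranges(rows: Sequence[Sequence], is_categorical: Sequence[bool]) -> List[float]:
    """Range (max - min) of each numeric column; 0.0 as a placeholder for categorical ones."""
    ranges = []
    for k, cat in enumerate(is_categorical):
        if cat:
            ranges.append(0.0)
        else:
            column = [row[k] for row in rows]
            ranges.append(max(column) - min(column))
    return ranges


def gower_distance(a: Sequence, b: Sequence,
                   is_categorical: Sequence[bool],
                   ranges: Sequence[float]) -> float:
    """Gower distance with equal feature weights (assumption of this sketch)."""
    total = 0.0
    for x, y, cat, rng in zip(a, b, is_categorical, ranges):
        if cat:
            # Categorical feature: 0 if the categories match, 1 otherwise.
            total += 0.0 if x == y else 1.0
        else:
            # Numeric feature: absolute difference scaled by the column range.
            # A constant column (range 0) contributes nothing.
            total += abs(x - y) / rng if rng > 0 else 0.0
    return total / len(a)


def nearest_neighbours(query: Sequence, rows: Sequence[Sequence],
                       is_categorical: Sequence[bool], k: int = 1) -> List[int]:
    """Indices of the k rows closest to `query` (brute force, exact)."""
    ranges = numeric_ranges(rows, is_categorical)
    order = sorted(range(len(rows)),
                   key=lambda i: gower_distance(query, rows[i], is_categorical, ranges))
    return order[:k]


# Toy data: two numeric features (e.g. embedding coordinates) and one categorical.
rows = [
    (0.0, 1.0, "red"),
    (1.0, 0.0, "blue"),
    (0.5, 0.5, "red"),
]
is_categorical = (False, False, True)
query = (0.1, 0.9, "red")

print(nearest_neighbours(query, rows, is_categorical, k=2))
```

Note that with a high-dimensional embedding and only a few categorical variables, equal weights let the numeric part dominate the distance, so per-feature weights may be worth tuning. This brute-force search is exact; whether a given approximate nearest-neighbour library accepts a custom distance like this one depends on the library.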

Cheers