This week I finished the Machine Learning Specialization on Coursera offered by deeplearning.ai & Stanford University, and I want to start practising on an e-commerce Cannabis dataset I have.
I am currently working with a Cannabis dataset that includes various features related to product strains, effects, and user orders.
I’m interested in applying clustering techniques to this dataset to uncover patterns or groupings that might not be immediately apparent.
Could you please provide guidance on how to effectively employ clustering algorithms, such as K-means, in this context? Specifically, how do I choose the features from the dataset that are most relevant for clustering?
Sounds to me like you might benefit from a deep dive into Exploratory Data Analysis on your dataset before you try building unsupervised or supervised ML models.
Here’s an example from the interweb done on the ubiquitous King County Housing Data.
Note: I am not endorsing this as the best example. It is one of hundreds out there. Maybe take a read through and steal, eh, reuse ideas that look helpful. Once you have a good idea of the data, picking an ML approach should be more straightforward.
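To make the EDA suggestion a bit more concrete, here is a minimal first-pass sketch in pandas. The DataFrame is a made-up stand-in, and the column names (strain_type, thc_pct, price, order_qty) are assumptions, not the columns of the actual dataset; swap in your own.

```python
import pandas as pd

# Hypothetical stand-in for the cannabis orders data.
# All column names and values below are invented for illustration.
df = pd.DataFrame({
    "strain_type": ["indica", "sativa", "hybrid", "indica", "sativa", "hybrid"],
    "thc_pct": [18.5, 22.0, 20.1, None, 24.3, 19.0],
    "price": [35.0, 42.0, 38.5, 33.0, 45.0, 37.0],
    "order_qty": [2, 1, 3, 2, 1, 4],
})

# Which columns are numeric and which are categorical?
print(df.dtypes)

# How many values are missing in each column?
missing = df.isna().sum()
print(missing)

# Ranges, means and spread of the numeric columns.
print(df.describe())

# How balanced are the categories?
print(df["strain_type"].value_counts())

# Pairwise correlations between numeric columns; strongly correlated
# features carry overlapping information.
corr = df.select_dtypes("number").corr()
print(corr.round(2))
```

Checks like these usually show which columns need cleaning, encoding, or rescaling before any model sees them.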
Initially, I recommend you include all of the features. You might get useful results without the risk of throwing away features that could be vitally important.
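As a rough sketch of what "include all of the features" could look like with K-means in scikit-learn: the data below is synthetic and the column names are hypothetical. Scaling the numeric columns, one-hot encoding the categorical one, and comparing cluster counts by silhouette score are assumptions added here as one common way to do it, not the only one.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 150
df = pd.DataFrame({
    "strain_type": rng.choice(["indica", "sativa", "hybrid"], size=n),
    "thc_pct": rng.normal(20, 3, size=n),
    "price": rng.normal(40, 5, size=n),
    "order_qty": rng.integers(1, 6, size=n),
})

numeric = ["thc_pct", "price", "order_qty"]
categorical = ["strain_type"]

# Keep every feature: scale the numeric columns so no single one dominates
# the Euclidean distances K-means relies on, and one-hot encode categories.
prep = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    sparse_threshold=0,  # always return a dense array
)
X = prep.fit_transform(df)

# Try several cluster counts and record the silhouette score for each
# (higher means tighter, better separated clusters).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k by silhouette:", best_k)
```

On real data, the next step would be to inspect the cluster centres and per-cluster summaries to see whether the groups mean anything, and only then think about dropping features.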