How to apply Clustering Techniques to a Cannabis Dataset for Analysis

Hi everyone,

This week I finished the Machine Learning Specialization on Coursera, offered by deeplearning.ai & Stanford University, and I want to start practising on an e-commerce Cannabis dataset I have.

I am currently working with a Cannabis dataset that includes various features related to product strains, effects, and user orders.

I’m interested in applying clustering techniques to this dataset to uncover patterns or groupings that might not be immediately apparent.

Could you please provide guidance on how to effectively employ clustering algorithms, such as K-means, in this context? Specifically, how do I choose the features from the dataset that are most relevant for clustering?

Thanks in advance.

@Musab_Bin_Gulfam I don’t know if this was covered in the ML course, but you might try PCA to help rank the features by relevance.
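As a rough sketch of what that could look like with scikit-learn: strictly speaking, PCA ranks directions of variance rather than individual features, but the loadings give a hint about which features drive each component. The DataFrame below is randomly generated and the column names (`thc_pct`, `cbd_pct`, `price`, `rating`) are made up for illustration, since I don't know what your real columns are.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real dataset; these columns are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "thc_pct": rng.normal(18, 4, 200),
    "cbd_pct": rng.normal(2, 1, 200),
    "price": rng.normal(40, 10, 200),
    "rating": rng.uniform(1, 5, 200),
})

# PCA is sensitive to scale, so standardize the features first.
X = StandardScaler().fit_transform(df)

pca = PCA()
pca.fit(X)

# Share of total variance captured by each principal component.
pc_names = [f"PC{i + 1}" for i in range(X.shape[1])]
explained = pd.Series(pca.explained_variance_ratio_, index=pc_names)

# Loadings: how strongly each original feature contributes to each component.
loadings = pd.DataFrame(pca.components_.T, index=df.columns, columns=pc_names)

print(explained.round(3))
print(loadings.round(2))
```

Features with large absolute loadings on the first few components are the ones contributing most to the variance in the data, which is not necessarily the same thing as being the most useful for clustering.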

What is less clear from your description, though, is what your dependent variable is. What are you predicting?

Sounds to me like you might benefit from a deep dive into Exploratory Data Analysis on your dataset before you try building unsupervised or supervised ML models.

Here’s an example from the interweb done on the ubiquitous King County housing data

Note: I am not endorsing this as the best example. It is one of hundreds out there. Maybe take a read through and steal, eh, reuse ideas that look helpful. Once you have a good idea of the data, picking an ML approach should be more straightforward.
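For a first pass over your own data, something along these lines in pandas is usually enough to get started. This is only a sketch: the DataFrame is synthetic, and the column names (`thc_pct`, `price`, `rating`, `strain_type`) are placeholders I invented, not anything from your dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the real dataset; columns are assumptions.
rng = np.random.default_rng(1)
n = 100
df = pd.DataFrame({
    "thc_pct": rng.normal(18, 4, n),
    "price": rng.normal(40, 10, n),
    "rating": rng.uniform(1, 5, n),
    "strain_type": rng.choice(["indica", "sativa", "hybrid"], n),
})
# Knock out a few prices to mimic missing values in real data.
df.loc[rng.choice(n, 5, replace=False), "price"] = np.nan

# Basic summary statistics for the numeric columns.
summary = df.describe()

# Count of missing values per column.
missing = df.isna().sum()

# Pairwise correlations between the numeric columns.
corr = df.select_dtypes("number").corr()

# Frequency of each category in a categorical column.
counts = df["strain_type"].value_counts()

print(summary, missing, corr, counts, sep="\n\n")
```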

Initially, I recommend you include all of the features. You might get useful results without the risk of throwing away features that could be vitally important.
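A minimal sketch of that approach with scikit-learn's K-means, assuming a mix of numeric and categorical columns. The data here is randomly generated and the column names are hypothetical. Using the silhouette score to pick the number of clusters is just one common heuristic, not the only option.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical data; replace with the real dataset and its actual columns.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "thc_pct": rng.normal(18, 4, n),
    "cbd_pct": rng.normal(2, 1, n),
    "price": rng.normal(40, 10, n),
    "strain_type": rng.choice(["indica", "sativa", "hybrid"], n),
})

# Keep every feature: one-hot encode the categorical column,
# then standardize so no single feature dominates the distance metric.
features = pd.get_dummies(df, columns=["strain_type"], dtype=float)
X = StandardScaler().fit_transform(features)

# Try several values of k and record the silhouette score for each.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Refit with the best-scoring k and attach the cluster labels to the data.
best_k = max(scores, key=scores.get)
df["cluster"] = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

print(scores)
print(df.groupby("cluster")[["thc_pct", "cbd_pct", "price"]].mean())
```

Comparing the per-cluster means at the end is a quick way to see whether the groups differ in any way that makes sense for the business.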
