Clustering with DBSCAN

Mangonzalez12 · April 27, 2024, 7:14pm

I have a data matrix with patients in rows and gene expression in columns. The patients have distinct stages of liver disease. However, I want to focus on a subset of genes from the matrix that represents the gene expression of a particular liver cell type.

I want to use unsupervised learning to cluster the patients using DBSCAN in scikit-learn.

The input matrix has 200 patients and 150 genes. I first scale my data with a 2logr transformation (commonly done in bioinformatics) and then run a grid search over DBSCAN parameters (various epsilon and min_samples combinations). However, even the best parameters label all of my samples as -1 (noise). I also tried normalizing the input with StandardScaler from scikit-learn, with the same result.
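For readers who want to reproduce this kind of search, here is a minimal sketch of a DBSCAN parameter sweep. It is not the poster's actual code: the data is random Gaussian noise standing in for the 200 x 150 matrix, and the helper names `kth_neighbor_distances` and `dbscan_grid` are made up for illustration. The one idea it adds is drawing candidate `eps` values from the distribution of k-th nearest neighbor distances, so the grid is guaranteed to cover the scale at which points actually have neighbors.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Assumption: random data standing in for the 200 patients x 150 genes matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 150))
X_scaled = StandardScaler().fit_transform(X)


def kth_neighbor_distances(data, k):
    """Distance from each sample to its k-th nearest neighbor, excluding itself."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(data)
    distances, _ = nn.kneighbors(data)  # column 0 is the sample itself
    return distances[:, -1]


def dbscan_grid(data, eps_values, min_samples_values):
    """Run DBSCAN for every parameter pair and count clusters and noise samples."""
    results = []
    for min_samples in min_samples_values:
        for eps in eps_values:
            labels = DBSCAN(eps=float(eps), min_samples=min_samples).fit_predict(data)
            n_clusters = len(set(labels) - {-1})
            n_noise = int(np.sum(labels == -1))
            results.append((float(eps), min_samples, n_clusters, n_noise))
    return results


# Candidate eps values taken from quantiles of the 5th-neighbor distances.
k_dist = kth_neighbor_distances(X_scaled, k=5)
eps_values = np.quantile(k_dist, [0.1, 0.5, 0.9, 1.0])

results = dbscan_grid(X_scaled, eps_values, min_samples_values=[5])
for eps, min_samples, n_clusters, n_noise in results:
    print(f"eps={eps:.2f} min_samples={min_samples} "
          f"clusters={n_clusters} noise={n_noise}")
```

If every row of such a sweep reports either all noise or a single cluster covering everything, that suggests the neighbor distances are too uniform for a density-based method to separate groups, rather than that the grid was too coarse.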

I then ran UMAP to reduce the dimensionality of my data, ran DBSCAN on the result, and found 4 clusters with only 9 samples labeled as noise (-1). My question is whether clustering on the 2 dimensions obtained from UMAP is acceptable, since I don't find clusters in the original data (150 dimensions).
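One way to sanity-check clusters found in a 2D embedding is to score the resulting labels back in the original feature space. The sketch below is only an illustration under several assumptions: the data comes from `make_blobs` rather than real patients, and PCA is swapped in for UMAP so the example depends only on scikit-learn. Any 2D embedding array, including one produced by UMAP, could be passed in its place.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data with planted groups, not real expression data.
X, _ = make_blobs(n_samples=200, n_features=150, centers=4, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Stand-in for the UMAP step: PCA down to 2 components.
embedding = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Pick eps from the 5th-nearest-neighbor distances in the embedding.
nn = NearestNeighbors(n_neighbors=6).fit(embedding)
distances, _ = nn.kneighbors(embedding)  # column 0 is the sample itself
eps = float(np.quantile(distances[:, -1], 0.9))

labels = DBSCAN(eps=eps, min_samples=5).fit_predict(embedding)

# Score the embedding-derived labels in the ORIGINAL 150-dimensional space,
# ignoring noise samples. The silhouette needs at least two clusters.
mask = labels != -1
n_clusters = len(set(labels[mask]))
score = None
if n_clusters >= 2:
    score = silhouette_score(X_scaled[mask], labels[mask])

print(f"clusters={n_clusters} noise={int(np.sum(~mask))} silhouette={score}")
```

A silhouette near zero or negative in the original space would suggest the groups exist mainly in the embedding. Repeating the embedding with different seeds and comparing the label assignments is another check worth considering.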

TMosh · April 27, 2024, 8:38pm

That’s a problem. You do not have nearly enough examples given that number of features.
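One possible reading of this point is the distance concentration effect: with few samples in many dimensions, all pairwise distances become similar, so a density threshold like DBSCAN's eps cannot separate dense regions from sparse ones. The sketch below illustrates the effect on independent Gaussian features, which is an assumption. Real gene expression data has correlated features, so the effect there may be weaker. The function name `distance_contrast` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def distance_contrast(n_samples, n_features):
    """Ratio of the smallest to the largest pairwise Euclidean distance.

    Values near 0 mean some pairs are much closer than others.
    Values near 1 mean all pairs are roughly equally far apart.
    """
    X = rng.normal(size=(n_samples, n_features))
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep each unordered pair once and drop the zero diagonal.
    upper = dists[np.triu_indices(n_samples, k=1)]
    return float(upper.min() / upper.max())


low_dim = distance_contrast(n_samples=200, n_features=2)
high_dim = distance_contrast(n_samples=200, n_features=150)

print(f"2 features:   min/max distance ratio = {low_dim:.3f}")
print(f"150 features: min/max distance ratio = {high_dim:.3f}")
```

The ratio is much closer to 1 with 150 features than with 2, which is consistent with DBSCAN switching abruptly from "everything is noise" to "everything is one cluster" as eps grows.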

Mangonzalez12 · April 27, 2024, 9:33pm

Yes, that's the input size.

TMosh · April 27, 2024, 9:55pm

You do not have enough data to do the task you’re attempting.
