Hi @everyone, I am working on a topic modelling task using BERT and Llama 2, with UMAP for dimensionality reduction and HDBSCAN for clustering. How should I go about hyperparameter tuning for UMAP and HDBSCAN? The code should be completely dynamic, meaning no human intervention should be required.
Thanks in advance
@Preeti_Rani2
Just a suggestion: why don't you move your query to AI project, so that more people can see it and offer better suggestions?
Also, when you post a question like this, please briefly describe the kind of data you are working with.
Regards
DP
Hi, I could not find your question on AI project, so I am replying here.
UMAP and HDBSCAN results depend on how your data points are distributed in the embedding space. Unless you are sure what that distribution will look like every time the topic modelling task is run, it will be difficult to fix the parameters for UMAP and HDBSCAN.
I would recommend providing a slider/selector in the UI so that users can try different settings and see the resulting clusters for themselves. Clustering results are mostly approximations anyway; since we do not know the final data distribution, it is better to let the user play around with the parameters for both UMAP and HDBSCAN.
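If a fully automated run is still required, one possible approach (a rough sketch, not a definitive recipe) is a random search that scores each UMAP + HDBSCAN setting and keeps the best one. The sketch below assumes the umap-learn and hdbscan packages are installed and uses hdbscan's relative_validity_ (a fast approximation of the DBCV score) as the objective. The names random_search, make_score_fn and SEARCH_SPACE, as well as the value ranges, are made up for illustration and would need adjusting to your data. Note that scores computed in reduced spaces of different dimensionality are not strictly comparable, so treat the winner as a reasonable starting point rather than a guaranteed optimum.

```python
import random

# Hypothetical search space; the value ranges are illustrative assumptions.
SEARCH_SPACE = {
    "n_neighbors": [5, 15, 30, 50],       # UMAP
    "n_components": [2, 5, 10],           # UMAP
    "min_dist": [0.0, 0.05, 0.1],         # UMAP
    "min_cluster_size": [5, 10, 20, 40],  # HDBSCAN
    "min_samples": [1, 5, 10],            # HDBSCAN
}


def random_search(score_fn, space, n_trials=30, seed=0):
    """Draw n_trials random settings from space; return (best_params, best_score)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(params)
        # A NaN score never compares greater, so it is skipped automatically.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def make_score_fn(embeddings):
    """Build a scorer that runs UMAP then HDBSCAN on the given embeddings."""
    # Imported lazily so the search helper above works without these packages.
    import hdbscan
    import umap

    def score(params):
        reduced = umap.UMAP(
            n_neighbors=params["n_neighbors"],
            n_components=params["n_components"],
            min_dist=params["min_dist"],
            metric="cosine",
            random_state=42,  # fixed seed so trials are reproducible
        ).fit_transform(embeddings)
        clusterer = hdbscan.HDBSCAN(
            min_cluster_size=params["min_cluster_size"],
            min_samples=params["min_samples"],
            gen_min_span_tree=True,  # required for relative_validity_
        ).fit(reduced)
        if clusterer.labels_.max() < 0:
            # Every point was labelled noise: treat as the worst outcome.
            return float("-inf")
        return clusterer.relative_validity_

    return score
```

With a 2-D array of document embeddings, this would be called as random_search(make_score_fn(embeddings), SEARCH_SPACE). The same loop could use a different objective, for example topic coherence, by swapping the scorer.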
I’m also working on topic modelling. I have a question about evaluation metrics, specifically the c_v coherence value: what c_v value is considered good enough? Is there a minimum threshold?