How to find a good value for the threshold?

In a real problem, what are the techniques for finding good values for the similarity threshold?

Obviously, in the real world it depends on at least two factors:

  1. Dataset:
  • how clean it is (deduplicated, no missing values, few typos, etc.) - generally, the cleaner the dataset, the narrower the margin can be;
  • how long and wide it is (how many samples vs. how many features) - generally, the more samples and the fewer features, the narrower the margin can be;
  2. Time/money (“All models are wrong, but some are useful” - a cliché, but…):
  • how soon you need the model - generally, the world won’t wait years for the best model (hyperparameters), because things can change fast;
  • what benefit / profit does every percent of accuracy bring?

Having said that, some common approaches to hyperparameter optimization that also fit a similarity threshold are:

  • Grid search
  • Random search
  • Bayesian optimization
  • Gradient-based optimization
  • Evolutionary optimization
  • Population-based

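To make the first approach concrete, here is a minimal grid-search sketch under an entirely hypothetical setup: a small labeled validation set of string pairs, `difflib.SequenceMatcher` as the similarity measure, and F1 as the selection metric (the pairs, the `f1_at` helper, and the grid range are all made up for illustration):

```python
# Grid search over a similarity threshold (hypothetical example):
# pick the threshold that maximizes F1 on a labeled validation set.
from difflib import SequenceMatcher

# Hypothetical labeled pairs: (string_a, string_b, is_duplicate)
pairs = [
    ("john smith", "jon smith", 1),
    ("john smith", "jane doe", 0),
    ("acme inc", "acme incorporated", 1),
    ("acme inc", "zenith ltd", 0),
    ("data set", "dataset", 1),
    ("data set", "database", 0),
]

def f1_at(threshold):
    """F1 score when pairs with similarity >= threshold are called duplicates."""
    tp = fp = fn = 0
    for a, b, label in pairs:
        pred = SequenceMatcher(None, a, b).ratio() >= threshold
        if pred and label:
            tp += 1
        elif pred and not label:
            fp += 1
        elif not pred and label:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# The "grid": every candidate threshold from 0.50 to 0.95 in steps of 0.05.
grid = [t / 100 for t in range(50, 100, 5)]
best = max(grid, key=f1_at)
print(best, round(f1_at(best), 3))
```

One thing this toy example already shows: the metric is often flat over whole ranges of thresholds (several grid points tie), so the "best" value is really a plateau, and a finer grid mostly costs compute without changing the decision.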
I mostly use Random Search and Grid Search (a simple tutorial).
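For comparison, a random-search sketch of the same idea (again a hypothetical setup: precomputed similarity scores with labels, a made-up `accuracy` helper, and thresholds sampled uniformly instead of enumerated on a grid):

```python
# Random search over a similarity threshold (hypothetical example):
# sample candidate thresholds and keep the best-scoring one.
import random

# Hypothetical precomputed validation data: (similarity_score, is_duplicate)
scored = [(0.95, 1), (0.91, 1), (0.88, 1), (0.72, 0),
          (0.64, 1), (0.55, 0), (0.40, 0), (0.31, 0)]

def accuracy(threshold):
    """Fraction of pairs classified correctly at this threshold."""
    correct = sum((score >= threshold) == bool(label) for score, label in scored)
    return correct / len(scored)

random.seed(42)  # fixed seed so the run is reproducible
trials = [random.uniform(0.0, 1.0) for _ in range(30)]
best = max(trials, key=accuracy)
print(round(best, 3), accuracy(best))
```

Random search shines when you tune several hyperparameters at once: with the same budget of trials it covers each individual dimension more densely than a grid does.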

But all in all, in my experience, a “data-centric” approach beats a “model-centric” one. On the other hand, in other areas of ML, like Reinforcement Learning, hyperparameter search is essential.

Just my thoughts 🙂
Cheers