Zero-inflated data

Hello everyone,

I’ve been working with zero-inflated data, where about 90% of the target values are zero. Is there a model that can predict both the zeros and the non-zero values well?

I already tried upsampling and downsampling, which didn’t give good results. Applying a log transformation to the target variable didn’t help much either. Could anyone suggest statistical models that suit this problem?

Hi @ajaykumar3456

I think there are three possible solutions.
First, when you use logistic regression or a support vector classifier, you can set the decision threshold below 0.5, for example 0.3 or 0.2, which increases sensitivity to the rare non-zero class.

Second, you can use a KNN model, which predicts based on the nearest neighbours.

Third, you can use an anomaly detection model, which can flag the rare cases as anomalous.
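
To make the first suggestion concrete, here is a minimal sketch of how lowering the threshold changes sensitivity. It assumes you already have predicted probabilities that the target is non-zero (for example from a logistic regression). The probabilities, labels, and helper names below are made up for illustration and are not from any library.

```python
def predict_with_threshold(probabilities, threshold=0.5):
    """Label a sample 1 (non-zero) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def sensitivity(y_true, y_pred):
    """Fraction of truly non-zero samples that were predicted as non-zero."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    return true_positives / positives if positives else 0.0


# Hypothetical probabilities of "target is non-zero" for ten samples.
probs = [0.05, 0.10, 0.15, 0.22, 0.08, 0.35, 0.45, 0.60, 0.02, 0.28]
y_true = [0, 0, 0, 1, 0, 1, 1, 1, 0, 0]

for threshold in (0.5, 0.3, 0.2):
    y_pred = predict_with_threshold(probs, threshold)
    print(threshold, sensitivity(y_true, y_pred))
```

On this toy data the sensitivity goes from 0.25 to 0.75 to 1.0 as the threshold drops, but at 0.2 one true zero (the sample with probability 0.28) is also flagged as non-zero, so the lower threshold trades false positives for recall.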

I hope these help.
Please feel free to ask any questions,
Thanks,
Abdelrahman

I think your title is a pretty good search keyword to start with, and asking for a good statistical model is a pretty good direction to search in. You may want to google “zero-inflated model”, and perhaps plot the distribution of your targets.

It seems to me you know what you need, which is good; you just need to start doing the research. You will find more than just log transformation.
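
As a rough idea of what a “zero-inflated model” looks like, here is a minimal sketch of a zero-inflated Poisson fitted by EM. It assumes the target is a non-negative count and uses no features (intercept only), which is simpler than what you would use in practice. The function names and the toy data are hypothetical. The model says a value is a "structural" zero with probability pi, and otherwise comes from a Poisson distribution with mean lam.

```python
import math


def fit_zip(counts, n_iter=500):
    """Fit an intercept-only zero-inflated Poisson by EM; returns (pi, lam)."""
    n = len(counts)
    n_zeros = sum(1 for y in counts if y == 0)
    total = sum(counts)
    if total == 0:
        raise ValueError("all counts are zero; the Poisson part cannot be fitted")
    # Starting guesses: half the zeros are structural, lam is the non-zero mean.
    pi = 0.5 * n_zeros / n
    lam = total / (n - n_zeros)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        w = pi / (pi + (1 - pi) * math.exp(-lam)) if pi > 0 else 0.0
        structural = n_zeros * w
        # M-step: update the mixing weight and the Poisson mean.
        pi = structural / n
        lam = total / (n - structural)
    return pi, lam


def zip_pmf(k, pi, lam):
    """P(Y = k) under the zero-inflated Poisson model."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


# Toy data: 90% zeros plus a handful of small counts (illustrative only).
counts = [0] * 90 + [1, 2, 3, 2, 1, 4, 2, 3, 2, 2]
pi_hat, lam_hat = fit_zip(counts)
print("structural-zero probability:", round(pi_hat, 3))
print("Poisson mean:", round(lam_hat, 3))
print("fitted P(Y = 0):", round(zip_pmf(0, pi_hat, lam_hat), 3))
```

With real features you would model both pi and lam as functions of the inputs; statsmodels, for example, includes zero-inflated count models such as ZeroInflatedPoisson. If your non-zero targets are continuous rather than counts, a two-part (hurdle) approach, with a classifier for zero vs. non-zero and a regressor for the non-zero values, is a related idea worth searching for.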