General methodology for handling missing data in training examples

jeffreydhy · July 11, 2022, 4:41pm

Hi All,

I'd like to know the most common ways to handle missing data in training examples for machine learning and deep learning algorithms. For example, suppose we are predicting a house price from 5 features: sqft, bedrooms, bathrooms, floors and year-built. Some training examples may be missing the year-built value, and others may be missing the bedrooms value. When we apply a machine learning algorithm to predict the house price, how should we handle this missing data?

One way I can think of is to prepare the training data by filling in each missing value with an estimate, such as the average of that feature over the examples where it is present. Would this be a good way to handle missing data? Or is there anything we can do at run time to let the algorithm handle the missing data automatically for us?

Thanks,

SamReiswig · July 11, 2022, 8:46pm

Hi!

Yes! To handle missing data we can drop the rows that have empty values, replace the empty values with the mean of that feature, or even generate the missing values using another ML algorithm.
Check out this Kaggle tutorial on how to handle missing values in a dataset: Missing Values | Kaggle.
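
As a rough illustration of the first two options (dropping rows and filling with the mean), here is a minimal NumPy sketch. The toy array, its column order, and the variable names are made up for this example, and `np.nan` is assumed to be the marker for a missing entry.

```python
import numpy as np

# Hypothetical toy data; columns are sqft, bedrooms, bathrooms, floors, year_built.
# np.nan marks a missing entry (an assumption of this sketch).
X = np.array([
    [1500.0, 3.0,    2.0, 1.0, 1995.0],
    [2100.0, np.nan, 3.0, 2.0, 2005.0],
    [1200.0, 2.0,    1.0, 1.0, np.nan],
    [1800.0, 4.0,    2.0, 2.0, 1985.0],
])

# Option 1: drop every row that has at least one missing value.
complete_rows = ~np.isnan(X).any(axis=1)
X_dropped = X[complete_rows]

# Option 2: replace each missing value with the mean of the observed
# values in the same column (np.nanmean ignores the NaN entries).
col_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_means, X)

print(X_dropped.shape)
print(X_imputed)
```

If you use mean filling, the column means would normally be computed on the training set only and then reused to fill gaps in validation, test, and prediction-time inputs.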

I don’t know of a general way for algorithms to handle missing data automatically. In Natural Language Processing there are ways for those algorithms to handle words they’ve never “seen” before, but that’s about all I know.

Hope this helps!

TMosh · July 12, 2022, 2:16am

The collaborative filtering material covers a way to mask missing values so that they are excluded from the cost and gradient calculations.

It may be applicable here.

It’s covered later in the course.
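
A minimal sketch of that masking idea, assuming a squared-error cost and `np.nan` as the marker for a missing entry. The function name and the toy arrays are hypothetical and not taken from the course. Note that in collaborative filtering the masked entries are missing target values (ratings) rather than missing input features, so treat this as an illustration of the mechanism only.

```python
import numpy as np

def masked_squared_error(pred, target):
    """Return (cost, gradient w.r.t. pred), ignoring entries where target is NaN."""
    mask = ~np.isnan(target)                   # True where a value was observed
    diff = np.where(mask, pred - target, 0.0)  # missing entries contribute zero error
    cost = 0.5 * np.sum(diff ** 2)
    grad = diff                                # d(cost)/d(pred); zero at masked entries
    return cost, grad

# Hypothetical predictions and partially observed targets.
pred = np.array([[4.0, 2.0],
                 [1.0, 5.0]])
target = np.array([[5.0, np.nan],
                   [np.nan, 3.0]])

cost, grad = masked_squared_error(pred, target)
print(cost)
print(grad)
```

Because the gradient is zero wherever the target is missing, those entries do not influence the parameter updates.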
