Hi All,
I am curious about the most common ways to handle missing data in training examples for machine learning and deep learning algorithms. For example, suppose we are predicting a house price from 5 features: sqft, bedrooms, bathrooms, floors and year-built. Some training examples may be missing the year-built value, and others may be missing the bedrooms value. When we apply a machine learning algorithm to predict the house price, how should we handle these missing values?
One way I can think of is to preprocess the training data and fill in each missing value with a predicted value, such as the average of that feature over the examples where it is present. Would this be a good way to handle missing data? Or is there anything we can do at run time to let the algorithm handle the missing data automatically?
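To make the averaging idea concrete, here is a minimal sketch of what I have in mind. It uses only the Python standard library; the data values and the function name `impute_column_means` are made up for illustration, and I am assuming missing entries are marked with NaN.

```python
import math
from statistics import mean

# Hypothetical toy data: each row is [sqft, bedrooms, bathrooms, floors, year_built].
# math.nan marks a missing value; the numbers are invented for this example.
rows = [
    [1500.0, 3.0, 2.0, 1.0, 1995.0],
    [2100.0, math.nan, 3.0, 2.0, 2005.0],  # bedrooms missing
    [1200.0, 2.0, 1.0, 1.0, math.nan],     # year_built missing
    [1800.0, 4.0, 2.0, 2.0, 1985.0],
]


def impute_column_means(data):
    """Return a copy of data with each NaN replaced by its column's mean.

    The mean of a column is computed only from its observed (non-NaN) values.
    A column with no observed values at all would raise statistics.StatisticsError.
    """
    n_cols = len(data[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if not math.isnan(row[j])]
        col_means.append(mean(observed))
    # Build new rows so the original data is left unchanged.
    return [
        [col_means[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in data
    ]


filled = impute_column_means(rows)
```

In this sketch the means come from the training rows only, so I assume the same stored means would be reused to fill missing values at prediction time.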
Thanks,