Question regarding handling missing data in features

Yuhan_Chen · May 20, 2024, 1:10pm

Hello guys,

I am working on a time-series prediction project, where we use about 20 features at each time step to predict future data with a neural network of a few layers.

The problem I am facing is that for certain steps, some features are not observable due to data availability (for instance, only 15 of the 20 are available), and simply dropping every feature set with missing data would greatly reduce the size of the training set.

I have done some research and found two seemingly promising methods:

1. Filling the missing data with the mean value or simply with 0s. But in the case of filling with 0s, I do not know how to tell TensorFlow to treat them as missing data.
2. Using padding and masking. However, based on DLS, padding is mainly used in convolutional networks, not in simple neural networks. I wonder whether there is a viable way to apply padding and masking in my case?
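
For the first option, one common workaround for a plain dense network is to fill the gaps with a placeholder and pass a 0/1 "observed" mask alongside the features, so the network can tell a filled-in value from a real one. Below is a minimal NumPy sketch of that idea, assuming missing entries are stored as NaN. The function name `impute_with_mask` and the toy array are hypothetical, and this is only one possible approach, not a built-in TensorFlow feature.

```python
import numpy as np

def impute_with_mask(x, fill_values=None):
    """Replace NaNs with per-feature fill values and append a 0/1 observed mask."""
    x = np.asarray(x, dtype=np.float32)
    observed = ~np.isnan(x)  # True where a value was actually observed
    if fill_values is None:
        # Per-feature mean over observed entries only
        # (assumes every feature is observed at least once)
        fill_values = np.nanmean(x, axis=0)
    filled = np.where(observed, x, fill_values)  # fill values broadcast across rows
    features = np.concatenate([filled, observed.astype(np.float32)], axis=1)
    return features, fill_values

# Toy batch: 3 time steps, 2 features, NaN marks an unobserved feature
x = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
features, means = impute_with_mask(x)
print(features.shape)  # twice as many columns: filled values, then the mask
```

The fill values should be computed on the training set only and reused for validation and test data, which is why the function returns them.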

Thanks for all your attention and contribution!

Zinniemeg · May 20, 2024, 1:27pm

Have you tried filling the missing values with interpolation and forward filling? I used those two methods in a related project on cryptocurrency price prediction. You can edit the code to suit your data.

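    # `df` is assumed to be your pandas DataFrame of features, ordered by time
    # Fill missing values using interpolation
    df.interpolate(method='linear', inplace=True)

    # Forward fill missing values (fillna(method='ffill') is deprecated in pandas 2.1+)
    df.ffill(inplace=True)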
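
To see what these two calls do, here is a small self-contained sketch on a toy DataFrame. The column names and values are made up for illustration. Note that neither call fills a gap at the very start of a column, because there is no earlier value to interpolate from or carry forward.

```python
import numpy as np
import pandas as pd

# Toy feature table: one row per time step, NaN marks a missing observation
df = pd.DataFrame({
    "price":  [10.0, np.nan, 14.0, np.nan],
    "volume": [np.nan, 5.0, np.nan, 9.0],
})

# Linear interpolation fills gaps between observed values
interpolated = df.interpolate(method="linear")

# Forward fill carries the last observed value into any remaining trailing gaps
filled = interpolated.ffill()

print(filled)
print(filled.isna().sum())  # leading gaps (here volume at step 0) are still NaN
```

Any NaN left at the start of a column still has to be handled separately, for example by dropping those first rows.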

Yuhan_Chen · May 20, 2024, 1:28pm

Thanks, I will try this method

Zinniemeg · May 20, 2024, 1:29pm

you are welcome

