Question regarding handling missing data in features

Hello guys,

I am working on a time-series prediction project where we use about 20 features at each time step to predict future values with a neural network of a few layers.

The problem I am facing is that at certain time steps some features are not observable due to data availability (for instance, only 15 of the 20 are available), and simply dropping every time step with missing data would greatly reduce the size of the training set.

I have done some research and found two seemingly promising methods:

  1. Filling the missing data with the mean value or simply with 0s. But if I fill with 0s, I do not know how to tell TensorFlow to treat them as missing data.
  2. Using padding and masking. However, based on DLS, padding is mainly used in convolutional networks, not in plain neural networks. Is there a viable way to apply padding and masking in my case?
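
For option 1, one common way to let a network tell a filled-in 0 apart from a real 0 is to append a 0/1 "was missing" indicator for every feature, so the missingness itself becomes an input. Below is a minimal NumPy sketch of that idea, assuming missing values are stored as NaN; `add_missing_indicators` is a hypothetical helper name, not a TensorFlow API.

```python
import numpy as np

def add_missing_indicators(x: np.ndarray, fill_value: float = 0.0) -> np.ndarray:
    """Replace NaNs with fill_value and append one 0/1 'was missing' column per feature."""
    missing = np.isnan(x)                      # True where a feature is unobserved
    filled = np.where(missing, fill_value, x)  # NaN -> fill_value, other values unchanged
    # Last axis grows from n_features to 2 * n_features
    return np.concatenate([filled, missing.astype(x.dtype)], axis=-1)

# Toy batch: 2 time steps, 3 features, NaN marks an unobserved feature
x = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 0.0]])
x_in = add_missing_indicators(x)
print(x_in)
```

With 20 features the first layer would then take 40 inputs. In the toy batch, the genuine 0.0 in the second row keeps an indicator of 0, while the filled-in zeros get an indicator of 1.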

Thanks for all your attention and contribution!

Have you tried filling the missing values using interpolation and forward filling? I used those two methods in a related project on cryptocurrency price prediction. You can edit the code to suit your data.

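    import pandas as pd

    # df is assumed to be your DataFrame of features, one row per time step
    # Fill missing values using linear interpolation
    df = df.interpolate(method='linear')

    # Forward fill any values that are still missing
    # (fillna(method='ffill') is deprecated in recent pandas versions)
    df = df.ffill()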
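
To see how the two calls differ, here is a small sketch on a made-up column (the name `price` and its values are invented for illustration). Linear interpolation uses the observations on both sides of a gap, so in a forecasting setup it can pull information from a later time step into an earlier one; forward fill only uses the last observed value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for one feature
df = pd.DataFrame({"price": [np.nan, 10.0, np.nan, 14.0]})

interp = df.interpolate(method="linear")  # gap filled from both neighbours
ffilled = df.ffill()                      # gap filled from the previous value only

print(interp["price"].tolist())
print(ffilled["price"].tolist())
```

Note that a missing value at the very start of the column has nothing before it, so neither call fills it; it may need a separate fallback such as a mean computed on the training set.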

Thanks, I will try this method

You are welcome.
