Preprocessing multivariate time series with tf.data.Dataset in TensorFlow

I have a few multivariate time series that I want to process using tf.data.Dataset. However, I end up with a 2D dataset, which is not the shape my recurrent neural network model expects: its input shape should be (batch_size, time_steps, n_dims). Any advice, or resources I should look into?
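A minimal sketch of where the 2D shape most likely comes from, assuming each series is a NumPy array of shape (time_steps, n_dims). The 100-by-3 random array and the window size of 10 are hypothetical. from_tensor_slices splits along the first axis, so every element is a single time step, and batching those elements only stacks time steps. Grouping consecutive steps into windows before batching is what adds the time axis.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: one series with 100 time steps and 3 dimensions.
series = np.random.rand(100, 3).astype("float32")

# from_tensor_slices slices along axis 0: each element is one time step.
ds = tf.data.Dataset.from_tensor_slices(series)
print(ds.element_spec.shape.as_list())  # [3]

# Batching single steps gives (batch, dims): there is no time axis.
flat_batches = ds.batch(32)
print(flat_batches.element_spec.shape.as_list())  # [None, 3]

# Windowing first: each window of 10 consecutive steps becomes one tensor.
# drop_remainder=True on both calls keeps every window exactly 10 steps long.
window_size = 10
windows = ds.window(window_size, shift=1, drop_remainder=True).flat_map(
    lambda w: w.batch(window_size, drop_remainder=True)
)
batched_windows = windows.batch(32)
print(batched_windows.element_spec.shape.as_list())  # [None, 10, 3]
```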


Actually, I have 5 time series representing the state of the system under 5 different operating conditions. The problem is that each time series has a different length. We cannot drop values, and we do not want to pad any of these series with zeros. The reason is that each time series depends on the speed of the system: regardless of its length, each series represents one complete revolution of the system. So I want to preprocess the data with tf.data.Dataset such that the end result has shape (batch_size, time_steps, n_dims), or at least (None, time_steps, n_dims) if batching is not possible with such data.
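One way to keep every series whole, with no padding and no dropped values, is a sketch like the following, assuming each series is a NumPy array of shape (length, n_dims). The lengths, n_dims = 4 and the small LSTM model are hypothetical. Each series is one dataset element with an unknown time dimension, and a batch size of 1 means each batch is one complete revolution. A batch size above 1 would, as far as I know, need padding or ragged batching, because Dataset.batch cannot stack tensors of different lengths.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: 5 series (one per operating condition) with different
# lengths but the same number of dimensions.
n_dims = 4
lengths = [120, 95, 140, 80, 110]
series_list = [np.random.rand(n, n_dims).astype("float32") for n in lengths]

# from_generator lets each element keep its own length along the time axis.
ds = tf.data.Dataset.from_generator(
    lambda: iter(series_list),
    output_signature=tf.TensorSpec(shape=(None, n_dims), dtype=tf.float32),
)

# Batch size 1: every batch is one complete series, nothing padded or dropped.
ds = ds.batch(1)
print(ds.element_spec.shape.as_list())  # [None, None, 4]

# An LSTM whose time dimension is None accepts any sequence length.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, n_dims)),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1),
])

for batch in ds:
    out = model(batch)
    print(batch.shape, "->", out.shape)
```

Each batch then has shape (1, length_i, 4), which matches the (None, time_steps, n_dims) fallback described above, with time_steps varying per series.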

You can use the code below if you have a univariate output (one target). It works with multiple features, but the target should be a single column. Both features and target should be 2D NumPy arrays, of shape (time_steps, n_features) and (time_steps, 1). Each target window is shifted one step ahead of its feature window, so the model learns to predict the next value. The result has shapes (batch_size, window_size, n_features) and (batch_size, window_size, 1) and can be fed to an LSTM.

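import tensorflow as tf

def windowed_dataset(features, target, window_size, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((features, target))
    # Slide a window of window_size + 1 steps, moving one step at a time
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    # Turn each window (a small dataset of steps) into a single tensor
    dataset = dataset.flat_map(lambda x, y: tf.data.Dataset.zip((x.batch(window_size + 1), y.batch(window_size + 1))))
    # Inputs: first window_size steps; targets: the same window shifted one step ahead
    dataset = dataset.map(lambda x, y: (x[:-1], y[1:]))
    dataset = dataset.batch(batch_size).prefetch(1)
    return dataset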
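For several series of different lengths, a possible extension (a sketch, not necessarily how the function above is meant to be used) is to window each series on its own and concatenate the results, so no window spans two operating conditions and every batch still has a fixed window length. make_windows is a hypothetical helper containing the same steps as windowed_dataset minus the final batching; the data, window size and batch size are made up. Note that window_size + 1 must not exceed the shortest series, otherwise that series yields no windows.

```python
import numpy as np
import tensorflow as tf

def make_windows(features, target, window_size):
    # Hypothetical helper: windowed_dataset's windowing steps, without batching.
    ds = tf.data.Dataset.from_tensor_slices((features, target))
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda x, y: tf.data.Dataset.zip(
        (x.batch(window_size + 1), y.batch(window_size + 1))))
    return ds.map(lambda x, y: (x[:-1], y[1:]))

# Hypothetical data: 5 series of different lengths, 4 features and 1 target each.
n_features, window_size, batch_size = 4, 20, 32
lengths = [120, 95, 140, 80, 110]
series = [
    (np.random.rand(n, n_features).astype("float32"),
     np.random.rand(n, 1).astype("float32"))
    for n in lengths
]

# Window each series separately, then concatenate the per-series datasets.
dataset = make_windows(*series[0], window_size)
for features, target in series[1:]:
    dataset = dataset.concatenate(make_windows(features, target, window_size))

# Shuffle windows across series, then batch.
dataset = dataset.shuffle(1000).batch(batch_size).prefetch(1)

x_batch, y_batch = next(iter(dataset))
print(x_batch.shape, y_batch.shape)  # (32, 20, 4) (32, 20, 1)
```

Each series of length n contributes n - window_size windows here, so shorter series simply contribute fewer windows instead of being padded.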

I also published my GitHub repo with a real dataset where I used that function for 10 features and 1 target. However, I cannot share a link here as self-promotion is against the community code of honor. Also, I will replace that repo soon with a multivariate one, where we have multiple targets.
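For the multi-target case, the windowing steps themselves do not appear to depend on the number of target columns: from_tensor_slices slices the target row by row, so a target of shape (time_steps, k) yields target windows of shape (window_size, k). A sketch under that assumption, with hypothetical data (500 steps, 10 features, 3 targets) and a hypothetical sequence-to-sequence LSTM:

```python
import numpy as np
import tensorflow as tf

def windowed_dataset(features, target, window_size, batch_size):
    # Same steps as the function above.
    dataset = tf.data.Dataset.from_tensor_slices((features, target))
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    dataset = dataset.flat_map(lambda x, y: tf.data.Dataset.zip(
        (x.batch(window_size + 1), y.batch(window_size + 1))))
    dataset = dataset.map(lambda x, y: (x[:-1], y[1:]))
    return dataset.batch(batch_size).prefetch(1)

# Hypothetical data: 500 time steps, 10 features, 3 target columns.
features = np.random.rand(500, 10).astype("float32")
targets = np.random.rand(500, 3).astype("float32")

ds = windowed_dataset(features, targets, window_size=30, batch_size=16)
x, y = next(iter(ds))
print(x.shape, y.shape)  # (16, 30, 10) (16, 30, 3)

# return_sequences=True keeps the time axis, so the Dense layer's output
# has the same (batch, window_size, n_targets) shape as y.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 10)),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(3),
])
print(model(x).shape)  # (16, 30, 3)
```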

Best,
Saif.