TensorFlow preprocessing multivariate time series Using tf.data.Datasets

Abdul_Jabbar1 · October 5, 2023, 11:07am

I have a few multivariate time series that I want to process using tf.data.Datasets. However, I end up with a 2D dataset, which is not the shape my recurrent neural network model expects: the input shape (batch_size, time_step, # of dims) is not achieved. Any advice or resources I should look into?

balaji.ambresh · October 5, 2023, 11:26am

Please keep a few things in mind when posting a question:

If your question is unrelated to the course content, post it under the General category.
Provide sufficient details, such as a few rows of input and the expected output, to give readers enough context.

Here’s the community user guide to get started.

Abdul_Jabbar1 · October 5, 2023, 11:54am

Actually, I have 5 time series representing the state of the system under 5 different operating conditions. The problem is that the length of each time series is different. However, we cannot drop values, nor do we want to pad any of these series with zeros. The reason is that each time series depends on the speed of the system, so no matter its length, each time series represents one complete revolution of the system. So I want to preprocess the data using tf.data.Datasets such that the end result has shape (batch_size, time_step, # of dims), or at least (None, time_step, # of dims) if batching is not possible on such data.

saifkhanengr · October 5, 2023, 2:14pm

You can use the code below if you have a univariate output (one target). It works with multiple features, but there should be only one target. Both features and target should be NumPy arrays with a 2D shape. The result will have shapes (batch_size, window_size, features) and (batch_size, window_size, target) and can be fed to an LSTM.

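import tensorflow as tf

def windowed_dataset(features, target, window_size, batch_size):
    # Pair each time step's feature row with its target row.
    dataset = tf.data.Dataset.from_tensor_slices((features, target))
    # Slide a window of window_size + 1 steps over the series, one step at a time.
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    # Turn each window (a pair of nested datasets) into a pair of tensors.
    dataset = dataset.flat_map(lambda x, y: tf.data.Dataset.zip((x.batch(window_size + 1), y.batch(window_size + 1))))
    # Inputs are the first window_size steps; targets are shifted one step ahead.
    dataset = dataset.map(lambda x, y: (x[:-1], y[1:]))
    dataset = dataset.batch(batch_size).prefetch(1)
    return dataset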
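
To check the shapes claimed above, the same pipeline steps can be traced on a tiny input. This is only a sketch: the toy arrays (6 time steps, 2 features, 1 target), the window size of 3, and the batch size of 2 are assumptions chosen for illustration and do not come from the thread.

```python
import numpy as np
import tensorflow as tf

window_size = 3  # assumed value, for illustration only

# Toy data (hypothetical): 6 time steps, 2 features, 1 target, both 2D arrays.
features = np.arange(12, dtype="float32").reshape(6, 2)
target = np.arange(6, dtype="float32").reshape(6, 1)

ds = tf.data.Dataset.from_tensor_slices((features, target))
# 6 steps with windows of window_size + 1 = 4 steps give 3 full windows.
ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
ds = ds.flat_map(
    lambda x, y: tf.data.Dataset.zip(
        (x.batch(window_size + 1), y.batch(window_size + 1))
    )
)
# Drop the last step from the inputs and the first step from the targets.
ds = ds.map(lambda x, y: (x[:-1], y[1:]))
ds = ds.batch(2)

x_batch, y_batch = next(iter(ds))
print(x_batch.shape)  # (2, 3, 2) -> (batch_size, window_size, features)
print(y_batch.shape)  # (2, 3, 1) -> (batch_size, window_size, target)
```

Because the target keeps one value per time step, a model trained on these pairs would typically need its last recurrent layer to return the full sequence rather than only the final step.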

I also published my GitHub repo with a real dataset where I used that function for 10 features and 1 target. However, I cannot share a link here as self-promotion is against the community code of honor. Also, I will replace that repo soon with a multivariate one, where we have multiple targets.

Best,
Saif.
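
The function above windows a single series. For several series of different lengths, such as the five described earlier in the thread, one option is to window each series separately and then concatenate the resulting datasets, so that no window spans two series and nothing is padded or dropped. The sketch below is a minimal illustration under stated assumptions: the helper names `windows_from_series` and `windows_from_many`, the synthetic series lengths, the 3 dimensions, and the window length of 20 are all hypothetical, and targets are left out because the thread does not specify them. The window length has to be no longer than the shortest series.

```python
import numpy as np
import tensorflow as tf

def windows_from_series(series, time_steps):
    """Slice one (length, n_dims) array into overlapping (time_steps, n_dims) windows."""
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(time_steps, shift=1, drop_remainder=True)
    # Each window is a nested dataset; batching it yields one tensor per window.
    return ds.flat_map(lambda w: w.batch(time_steps, drop_remainder=True))

def windows_from_many(series_list, time_steps, batch_size):
    """Window every series on its own, then chain the windows into one dataset."""
    per_series = [windows_from_series(s, time_steps) for s in series_list]
    combined = per_series[0]
    for ds in per_series[1:]:
        combined = combined.concatenate(ds)
    return combined.batch(batch_size).prefetch(1)

# Synthetic stand-ins (assumption): 5 series of different lengths, 3 dims each.
lengths = [50, 64, 71, 80, 95]
n_dims = 3
series_list = [np.random.rand(n, n_dims).astype("float32") for n in lengths]

dataset = windows_from_many(series_list, time_steps=20, batch_size=32)
print(dataset.element_spec)  # shape=(None, 20, 3)
```

The leading `None` appears because the final batch may hold fewer than `batch_size` windows. Adding a `shuffle` call before `batch` would mix windows from the different operating conditions within each batch, if that is desired.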
