Shuffle in time series

Generally, in time series we don’t shuffle the data before we divide it into train, validation, and test sets, because shuffling disrupts the seasonality and trend present over the time period.

import tensorflow as tf

def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    """Generates dataset windows

    Args:
      series (array of float) - contains the values of the time series
      window_size (int) - the number of time steps to include in the feature
      batch_size (int) - the batch size
      shuffle_buffer (int) - buffer size to use for the shuffle method

    Returns:
      dataset (TF Dataset) - TF Dataset containing time windows
    """

    # Generate a TF Dataset from the series values
    dataset = tf.data.Dataset.from_tensor_slices(series)

    # Window the data but only take those with the specified size
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)

    # Flatten the windows by putting its elements in a single batch
    dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))

    # Create tuples with features and labels
    dataset = dataset.map(lambda window: (window[:-1], window[-1]))

    # Shuffle the windows (whole feature/label pairs, not the values inside them)
    dataset = dataset.shuffle(shuffle_buffer)

    # Create batches of windows
    dataset = dataset.batch(batch_size).prefetch(1)

    return dataset

Why do we use the shuffle method here? Normally we cannot shuffle time series data, so why is it shuffled in this function, and how is that useful?

Here it’s shuffling the entire window and its label together. You use the window to predict the label at window+1, so you are not mixing anything up, just changing the order of the windows.

You would do that to avoid any pattern forming because of the sequential timing. The model should learn to predict the label no matter whether it’s at the beginning, middle, or end of the series.


Does that mean that after the windowing technique is applied to time series data, we can use k-fold cross validation, for example?

Thanks for the reply and the post,

We cannot use k-fold cross validation on time series data.

The simplest reason is that we can’t use data points from the future to predict the past; another reason is that we would lose the pattern within the data.

Instead, we can use cross validation on a rolling basis.

One way to do that is:

Start with a small subset of the data for training, forecast the later data points, and then check the accuracy of those forecasts. The data points that were just forecasted are then included in the next training set, and the subsequent data points are forecasted.

This is a similar approach to the code above.
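
The rolling procedure described above can be sketched in plain Python. This is only a minimal illustration under assumptions of mine: `naive_forecast` and `rolling_evaluation` are hypothetical helper names, and the "model" simply repeats the last training value instead of being a trained network.

```python
def naive_forecast(train):
    """Hypothetical stand-in model: predict that the next value equals the last one seen."""
    return train[-1]


def rolling_evaluation(series, min_train_size=1):
    """Walk forward through the series, always training on the past only."""
    errors = []
    for split in range(min_train_size, len(series)):
        train = series[:split]   # everything observed so far
        actual = series[split]   # the next point, never seen during training
        prediction = naive_forecast(train)
        errors.append(abs(actual - prediction))
    # Average the per-step absolute errors into a single score
    return sum(errors) / len(errors)


series = [1, 2, 3, 4, 5, 6, 7, 8]
mean_abs_error = rolling_evaluation(series)
print(mean_abs_error)
```

At every step the training slice ends before the point being forecasted, so no future value is ever used to predict the past.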

Please feel free to further discuss this or if you have another question.



Thanks a lot for the suggestion. It’s a very interesting technique and I will certainly try it, but the shuffle function is still unclear to me. Here is the point:

  1. We have a continuous dataset: [1 2 3 4 5 6 7 8]
  2. We apply the windowing technique (the function in the original question) and transform the dataset into features and labels:
    [[1 2 3 4 5] [6]]
    [[2 3 4 5 6] [7]]
    [[3 4 5 6 7] [8]]
  3. At the end of that function we shuffle the windows
    dataset = dataset.shuffle(shuffle_buffer)
    And get our dataset looking something like this:
    [[3 4 5 6 7] [8]]
    [[1 2 3 4 5] [6]]
    [[2 3 4 5 6] [7]]
    So now we have intentionally broken the original order of the windows in our dataset, but preserved the time series sequencing within each window.
    The question:
  • Why wouldn’t it be legitimate to use k-fold cross validation? The order of samples in the dataset is already randomized.
  • If we really want to preserve the order of samples, why shuffle then?
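
The three steps above can be reproduced in plain Python as a rough sketch. It is a stand-in for the tf.data pipeline, not the original function: `make_windows` is a hypothetical helper, and the fixed seed is only there so the run is repeatable.

```python
import random


def make_windows(series, window_size):
    """Slide a window over the series and split each one into (features, label)."""
    windows = []
    for start in range(len(series) - window_size):
        chunk = series[start:start + window_size + 1]
        windows.append((chunk[:-1], chunk[-1]))
    return windows


series = [1, 2, 3, 4, 5, 6, 7, 8]
windows = make_windows(series, window_size=5)

# Shuffle a copy: whole (features, label) pairs move, their contents do not
shuffled = windows[:]
random.Random(0).shuffle(shuffled)

for features, label in shuffled:
    print(features, label)
```

Whatever order comes out, each printed row is still one of the three original windows with its own label attached.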

@Nikita_Razguliaev Thanks for the reply and the great question,

I would like us to agree on the goals here:

  1. Time series dataset: a dataset with a pattern and sequence whose characteristics (e.g. seasonality) we would like to preserve.
  2. Window: as you mentioned, a technique to transform the data into features and a prediction (i.e. the next value in the sequence).
  3. Shuffling: here it is applied to the windows rather than to the raw dataset, i.e. the pattern and sequence inside each window (features and label) are preserved.
  4. K-fold validation: implies shuffling the whole dataset into random order and choosing a fold (K), i.e. a subgroup, randomly. (Please correct me if I am wrong.)

So to answer the question:

Q1: K-fold cross validation would not preserve the sequence, pattern, or seasonality of the time series dataset. Windowing, though, would achieve a similar goal for this type of dataset.

Q2: When shuffling, we preserve the order inside each window. We are subsetting the data into windows with features and labels, and the buffer size controls how those windows are shuffled.

Please feel free to correct me or add to the discussion

Thanks again for the great question,

Thanks for the continued discussion.

I agree with how you define those points, but it seems there is some misunderstanding about the dataset transformations here.

We basically consider two major pipeline scenarios here:

  1. time series dataset → split into train/val subsets → windowing subsets separately to get features and labels → using it to train the model, using val subset for evaluating performance of the model

  2. time series dataset → windowing the whole dataset to get features and labels → splitting into train/val subsets → using subsets to train and evaluate the model

You are absolutely right about the first (1) case: it is wrong to use traditional k-fold CV, or any other train/val split technique that picks data points for the subsets randomly, because that violates the sequencing of the original time series dataset. I’m really not sure that this holds for the second scenario, though. In the second (2) scenario we would be applying a split technique (for example k-fold CV) to a dataset that looks like this:
[[3 4 5 6 7] [8]]
[[1 2 3 4 5] [6]]
[[2 3 4 5 6] [7]]
i.e. a shuffled (randomized) dataset of windows with original time series sequencing preserved within each window.

I believe it doesn’t matter if we split our dataset into train and val subsets before transforming into features and labels (windowing) or after (correct me, if there are some problems with my judgement on this). Thus, it seems absolutely legitimate to me to make train/val subsets comprised of randomly picked windows.

I hope all this makes sense to you.
Unless there is some fundamental misunderstanding on my side, I believe, we can transform our dataset into windows first, and then it is absolutely fine to use any traditional technique for train/val split.

I hope you agree with me, or can point out where this logic stops making sense. I would be very happy to get a further explanation of this from a more experienced person.


Hi @Nikita_Razguliaev,

I think we are not agreeing on the concept of k-fold cross validation and whether it can be used on a time series dataset.

Our discussion boils down to the following points:

  1. K-fold, as a cross validation technique, picks samples randomly from the dataset and hence would disturb its order. (k = number of folds; the randomness can be made reproducible by fixing a seed. Check the Keras or scikit-learn documentation for k-fold cross validation.)

  2. Cross validation on a time series dataset can instead be done on a rolling basis.

Example of a rolling basis cross validation:
dataset = [1,2,3,4,5,6,7,8]

  • Training: [1] Test: [2]
  • Training: [1, 2] Test: [3]
  • Training: [1, 2, 3] Test: [4]
  • Training: [1, 2, 3, 4] Test: [5]
  • Training: [1, 2, 3, 4, 5] Test: [6]
  • Training: [1, 2, 3, 4, 5, 6] Test: [7]
  • Training: [1, 2, 3, 4, 5, 6, 7] Test: [8]

Then we compute the accuracy for each split and average the results.
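
As a small sketch, expanding-window splits like the ones listed above can be generated with a few lines of Python. `rolling_splits` is a hypothetical helper name of mine, not a library function; each test point is the value that immediately follows the training data.

```python
def rolling_splits(series):
    """Yield (training, test) pairs where the test point always follows the training data."""
    for split in range(1, len(series)):
        yield series[:split], [series[split]]


dataset = [1, 2, 3, 4, 5, 6, 7, 8]
splits = list(rolling_splits(dataset))

for training, test in splits:
    print("Training:", training, "Test:", test)
```

If you prefer a library implementation, scikit-learn has a related splitter called `TimeSeriesSplit`, which also keeps every test fold after its training fold.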

Another technique is to split the dataset into windows and then shuffle the windows, using the buffer size, while keeping the sequence inside each window.

To summarize:

  1. K-fold cross validation is not compatible with a time series dataset (unless the samples are not picked randomly).
  2. Rolling-basis cross validation can be safely applied to a time series dataset.
  3. Splitting the dataset into testing and validation sets should be done carefully to avoid look-ahead bias in your model.
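
To make the third point a bit more concrete, here is a rough sketch of what can happen when overlapping windows are built first and then split at random. The setup is entirely my assumption (a toy series of 20 values, window size 5, three validation windows, a hypothetical `make_windows` helper); it only checks how many validation windows share time steps with the training windows.

```python
import random


def make_windows(series, window_size):
    """Slide a window over the series and split each one into (features, label)."""
    return [
        (series[i:i + window_size], series[i + window_size])
        for i in range(len(series) - window_size)
    ]


series = list(range(1, 21))  # toy series with unique values 1..20
windows = make_windows(series, window_size=5)

# Window first, then pick validation windows at random
shuffled = windows[:]
random.Random(0).shuffle(shuffled)
val_windows = shuffled[:3]
train_windows = shuffled[3:]

# Every value the training windows contain, as a feature or as a label
train_values = set()
for features, label in train_windows:
    train_values.update(features)
    train_values.add(label)

# Validation windows that share at least one time step with the training data
overlapping = [
    (features, label)
    for features, label in val_windows
    if train_values & set(features + [label])
]
print(len(overlapping), "of", len(val_windows), "validation windows overlap the training data")
```

With windows shifted by one step, neighbouring windows share most of their values, so in this toy setup every validation window overlaps the training data. Splitting the raw series by time before windowing avoids that overlap.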



Sorry, I still do not understand one thing: we change the order of the series inside a window by shuffling it. We do it within the window only. OK.
Still, that changes the order and the time that corresponds to each value. So we change the “time pattern”, in other words, the moment at which each measurement was recorded. The sequential model will lose the time-related information within a window. Why is this good? Is it not bad?

No, you never change the order of the data. That would destroy the sequence you’re trying to learn from.

So what do you shuffle? You shuffle the windows within a batch, right? In that case, doesn’t this change the temporal order in which you show the data to the model?
If not, I do not understand; in that case, please explain it to me.

See Item 3 in the mentor’s post from Jan 6.

You mean this?

Let us focus on a given time window again.

So if this was before shuffling in the time window:
x = [0 1 2 3]
y = 4

x = [1 2 3 4]
y = 5

x = [2 3 4 5]
y = 6

x = [3 4 5 6]
y = 7

x = [4 5 6 7]
y = 8

x = [5 6 7 8]
y = 9

It might look like this after shuffling in the time window:

x = [1 2 3 4]
y = 5

x = [0 1 2 3]
y = 4

x = [3 4 5 6]
y = 7

x = [4 5 6 7]
y = 8

x = [5 6 7 8]
y = 9

x = [2 3 4 5]
y = 6

So my question remains the same: the sequential model will lose the time-related information within a time window. Why is this good? Is it not bad? It looks like time-related information is lost for the sequential model.

The time information within each window has not changed. They’re each a segment of the full sequence.
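
One way to convince yourself of this is to shuffle the (x, y) pairs and then check that each pair is still an unbroken run of the original series. This is a minimal sketch in plain Python, assuming the series 0..9 and window size 4 from the example above; `is_contiguous_segment` is a hypothetical helper.

```python
import random

series = list(range(10))  # 0..9, as in the example above
window_size = 4

# Build (x, y) pairs: x is window_size consecutive values, y is the value that follows
pairs = [
    (series[i:i + window_size], series[i + window_size])
    for i in range(len(series) - window_size)
]

random.Random(42).shuffle(pairs)  # changes the order of the pairs only


def is_contiguous_segment(x, y):
    """True if x followed by y appears in the series as one unbroken run."""
    segment = x + [y]
    start = segment[0]  # valid here only because series[i] == i
    return segment == series[start:start + len(segment)]


still_ordered = all(is_contiguous_segment(x, y) for x, y in pairs)
print(still_ordered)  # True
```

Only the position of each pair in the list changes; the values inside x and the y that follows them stay in their original time order.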