Why Shuffle Sequential Data

Hi, everyone.

I have recently started working with time series data, and this course has been very helpful in my learning journey.

I have a question related to this week's class:

This week introduced the ‘shuffle buffer’ parameter, which helps us shuffle our time series data. I understand that this is necessary to avoid sequence bias, but I’ve also read that in time series cross-validation, for instance, it is not recommended to use shuffle=True (otherwise, we would fail to “teach” the model that the information has a temporal order).

Could someone explain why the “shuffle buffer” doesn’t harm training, whereas shuffle=True in cross-validation does?

Thanks in advance!

With time series data, rows are shuffled only after the input-to-output mappings have been created. As a result, the data within each row stays in increasing time order, so shuffling does not damage the validity of the data.
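To make the order of operations concrete, here is a minimal sketch in plain Python. It is not the course's TensorFlow pipeline; the helper `make_windows`, the window size of 4, and the toy series are all assumptions chosen for illustration. It builds the input-to-output rows first and shuffles the rows only afterwards.

```python
import random

def make_windows(series, window_size):
    """Hypothetical helper: slide a window over the series.

    Each row pairs `window_size` consecutive values (inputs)
    with the value that immediately follows them (label).
    """
    rows = []
    for start in range(len(series) - window_size):
        inputs = series[start:start + window_size]
        label = series[start + window_size]
        rows.append((inputs, label))
    return rows

# Toy series: 0, 1, ..., 9 stands in for time-ordered measurements.
series = list(range(10))
rows = make_windows(series, window_size=4)

# Shuffle whole rows only; the values inside each row are never reordered.
rng = random.Random(0)  # fixed seed so the run is repeatable
shuffled = rows[:]
rng.shuffle(shuffled)

for inputs, label in shuffled:
    print(inputs, "->", label)
```

Every printed row still reads left to right in time order, with the label being the next value after the inputs. Only the order in which rows are presented to the model changes.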

Thank you for your reply, @balaji.ambresh. I understand now!

Balaji, can you elaborate on this? I don’t see why the autocorrelation principle is NOT violated by shuffling the data.

We are not shuffling the order of data points within a single row of input, so I don’t see why autocorrelation should be a concern.
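A small plain-Python comparison may help here. It is only a sketch: the helpers `windows` and `is_time_ordered`, the window size of 5, and the toy series are assumptions for illustration. It contrasts shuffling rows after windowing with shuffling the raw points before windowing, and counts how many rows still have their values in time order.

```python
import random

def windows(values, size):
    """Hypothetical helper: all consecutive slices of length `size`."""
    return [values[i:i + size] for i in range(len(values) - size + 1)]

def is_time_ordered(row):
    """True if every value in the row comes after the one before it."""
    return all(a < b for a, b in zip(row, row[1:]))

series = list(range(20))  # the value doubles as the time index
rng = random.Random(42)   # fixed seed so the run is repeatable

# Case A: window first, then shuffle the rows.
rows_a = windows(series, 5)
rng.shuffle(rows_a)

# Case B: shuffle the raw points first, then window.
scrambled = series[:]
rng.shuffle(scrambled)
rows_b = windows(scrambled, 5)

ordered_a = sum(is_time_ordered(r) for r in rows_a)
ordered_b = sum(is_time_ordered(r) for r in rows_b)

print("window then shuffle:", ordered_a, "of", len(rows_a), "rows time-ordered")
print("shuffle then window:", ordered_b, "of", len(rows_b), "rows time-ordered")
```

In Case A every row keeps its internal order, so the relationship between neighbouring time steps that the model learns from is intact. Case B is the situation that would actually break the temporal structure.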