[C2_W4_Lab_1_WeatherData] Question about train/ val split for time series data

surfii3z · September 25, 2021, 9:26am

Hi team,

Thank you for the wonderful course. I have a question regarding the train/ eval split on the time series dataset.

In the lab, we use ExampleGen to do the train/eval split. Doesn’t this break the continuity of the time steps?

Say the problem is to use HISTORY_SIZE = 2 (x_t-1, x_t) to predict FUTURE_TARGET = 1 (y_t+1).

And we have time series data with 9 data points [(x0, y0), (x1, y1), …, (x8, y8)], where x_i is the feature vector at time step i and y_i is the label at time step i.

Then we get something like

train: [(x0, y0), (x1, y1), (x2, y2), (x5, y5), (x7, y7), (x8, y8)]

val: [(x3, y3), (x4, y4), (x6, y6)]

Then a training batch can contain a window like this:

training datapoint (features list, target): ([x1, x2], y5)

Does this make sense? Or did I misunderstand something?
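To make the concern concrete, here is a minimal sketch in plain Python. It is not taken from the lab notebook: `make_windows` and `is_contiguous` are hypothetical helpers, and integers stand in for the time indices of the rows. It assumes windows are built by sliding over the rows in whatever order the split hands them over.

```python
HISTORY_SIZE = 2   # number of past steps used as features
FUTURE_TARGET = 1  # how many steps ahead the label lies

def make_windows(time_indices):
    """Hypothetical helper: slide a window over rows in the order given."""
    windows = []
    span = HISTORY_SIZE + FUTURE_TARGET
    for start in range(len(time_indices) - span + 1):
        features = time_indices[start:start + HISTORY_SIZE]
        target = time_indices[start + span - 1]
        windows.append((features, target))
    return windows

def is_contiguous(features, target):
    """True only if the window covers consecutive time steps."""
    consecutive = all(b - a == 1 for a, b in zip(features, features[1:]))
    return consecutive and target == features[-1] + FUTURE_TARGET

# Time indices that ended up in the train split in the example above.
train_idx = [0, 1, 2, 5, 7, 8]

windows = make_windows(train_idx)
broken = [w for w in windows if not is_contiguous(*w)]

print(windows)
print(broken)
```

With this train split, only the window ([0, 1], 2) is contiguous. The other three, including ([1, 2], 5), pair features with a label from the wrong time step.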

I am not sure this is the right way to split the train/eval data, especially when we are preparing the data to train an LSTM network, as stated in the notebook.

Shouldn’t we do it like this?

train: [(x0, y0), (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5)]

val: [(x6, y6), (x7, y7), (x8, y8)]
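A chronological split like this could be sketched as follows. This is an illustration only, not the lab's ExampleGen configuration: `chronological_split` and `make_windows` are hypothetical helpers, integers stand in for time indices, and the 2:1 ratio is just the one from the example.

```python
HISTORY_SIZE = 2
FUTURE_TARGET = 1

def chronological_split(time_indices, train_parts=2, total_parts=3):
    """Hypothetical helper: keep the earliest rows for training, the rest for validation."""
    cut = len(time_indices) * train_parts // total_parts
    return time_indices[:cut], time_indices[cut:]

def make_windows(time_indices):
    """Hypothetical helper: slide a window over rows in the order given."""
    span = HISTORY_SIZE + FUTURE_TARGET
    return [
        (time_indices[s:s + HISTORY_SIZE], time_indices[s + span - 1])
        for s in range(len(time_indices) - span + 1)
    ]

series = list(range(9))  # time indices 0..8
train, val = chronological_split(series)

# Window each split separately so no window straddles the cut.
train_windows = make_windows(train)
val_windows = make_windows(val)

print(train_windows)
print(val_windows)
```

Because each split is a contiguous block of time, every window inside it covers consecutive steps. The cost is that the windows straddling the cut are dropped.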

But then again, in this case we might introduce a distribution shift between train and val, because the features and target themselves might have trends that vary over time.
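A toy illustration of that shift, assuming a made-up series with a linear upward trend (the values are invented for the sketch, not taken from the weather data):

```python
from statistics import mean

# Made-up target values with a steady upward trend: y_t = 0.5 * t
series = [0.5 * t for t in range(9)]

# Chronological split: first six points for training, last three for validation.
train_y, val_y = series[:6], series[6:]

train_mean = mean(train_y)
val_mean = mean(val_y)

print(train_mean, val_mean)
```

Here the validation targets lie entirely above the range seen in training, so the two splits do not share the same distribution even though time order is preserved.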

Best Regards,

surfii3z

chris.favila · October 1, 2021, 7:31am

Hi! Welcome to Discourse! Thank you for pointing this out! I think you’re right and the shuffling of the dataset might have affected the periodicity. We’ll investigate this and update the notebook if needed. Thanks again!

surfii3z · October 2, 2021, 4:45am

Hi @chris.favila,

Thank you for your kind response.

I am looking forward to the clarification.

Best

Surfii3z
