How reliable is the validation process for time series analysis?

Hello everyone,

I finished this course about a month or so ago. I was applying time series analysis to a project when something struck me. My understanding of the validation process throughout the course is that, in order to evaluate the model, we generate a windowed dataset from the actual time series data. The windows are used to generate predictions from the model, which are then compared with the validation data to compute MAE values. So my questions are the following:

  1. Is my understanding of the validation process correct?

  2. If my understanding is correct, does this validation process hold much credibility? Since every prediction is propped up by actual series data, our models can very effectively predict the timing of highs and lows, the trend, and most importantly the actual values of the series. An anomalous prediction does not throw off the subsequent predictions. If I were making a real forecast, I would not have this kind of validation data to lean on, so I am not sure how honest my validation MAE is being with me.

If I am understanding this correctly, shouldn’t we instead do the following (rough sketch after the list):

  • take the actual data points that fall inside one window length immediately before the validation set
  • use them to predict the first point of the validation set
  • then slide the window forward, appending our own predictions to it
  • and finally compare those predictions with the validation data?
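
In code, something like this rough sketch is what I have in mind (assuming an already trained Keras `model`, plus `series`, `split_time`, and `window_size` from the notebook; these are placeholders, not the course’s exact names):

```python
import numpy as np

# Placeholders: `series` is the full 1-D array, `split_time` is the index where
# the validation range starts, `window_size` matches what the model was trained on.
window = list(series[split_time - window_size:split_time])  # last real points before validation
recursive_preds = []

for _ in range(len(series) - split_time):
    # Predict one step ahead from the current window.
    x = np.array(window[-window_size:])[np.newaxis]  # shape (1, window_size); add a channel axis if the model expects 3-D input
    next_value = float(model.predict(x, verbose=0).squeeze())
    recursive_preds.append(next_value)
    window.append(next_value)  # feed the prediction back in, not the true value

recursive_mae = np.mean(np.abs(np.array(recursive_preds) - series[split_time:]))
```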

I hope someone can clarify this. I dug around this forum but found nothing, hence this post; apologies if it is a duplicate.


Validation predictions are generated starting from the tail of the training data and are then compared with the actual validation data in terms of MSE / MAE.
Please look at the shape of the dataset: you feed window_size data points as input and predict only 1 timestep as output. So the window slides 1 step at a time until you reach the end of the prediction range.

The goal is to predict the next timestep given the previous timesteps. It’s like splitting the validation set into a regression evaluation: both inputs and outputs are taken from the validation data, but the trained model parameters are used to predict the expected value.
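
To make that concrete, the evaluation is roughly like the following sketch (not the exact course code; `series`, `split_time`, `window_size`, and a trained `model` are assumed to exist already):

```python
import numpy as np
import tensorflow as tf

# Every input window is cut from the *actual* series, so each prediction only
# has to look one step ahead of real data.
ds = tf.data.Dataset.from_tensor_slices(series[split_time - window_size:-1])
ds = ds.window(window_size, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda w: w.batch(window_size))
ds = ds.batch(32).prefetch(1)  # add a map that expands dims if your model expects a channel axis

one_step_forecast = model.predict(ds).squeeze()  # one prediction per window
mae = np.mean(np.abs(one_step_forecast - series[split_time:]))
```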

The error metric should be low as long as the training and validation data come from similar distributions.


Thank you for your reply! My issue was basically that the validation predictions were using in-sample data rather than the predictions the model made in earlier steps, so I changed the code accordingly. The issue I am facing now is that, after training until the validation loss stopped improving, I tried generating some predictions. While my model could predict the ‘phases’ of the actual data (I think this is termed seasonality?), it cannot capture the autocorrelation between seasons. I looked around the internet and found no clue, so I am asking you on the off chance that you have any resources or guides about this. Thank you again for your cooperation.

Please take a look at the week 1 lectures and notebooks. Concepts such as seasonality and differencing (a good strategy for addressing trend & seasonality) are described there.

Autocorrelation will give you a guideline on how far back to look when differencing.
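
For instance, a rough sketch of seasonal differencing (assuming `series` is a 1-D NumPy array; the lag of 365 is just a placeholder, pick whatever lag the autocorrelation suggests for your data):

```python
import numpy as np

def autocorr(series, lag):
    # Correlation between the series and itself shifted by `lag` steps.
    return np.corrcoef(series[:-lag], series[lag:])[0, 1]

def difference(series, lag):
    # Subtract the value one season back to strip trend and seasonality.
    return series[lag:] - series[:-lag]

lag = 365  # placeholder: choose the lag where autocorr(series, lag) peaks
diff_series = difference(series, lag)
# Train and forecast on diff_series, then add series[split_time - lag:-lag]
# back to the forecast to restore the seasonal baseline.
```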

Read this to understand the importance of normalization.

I recommend you go through week 4, since that’s where you start using convolutional and recurrent layers for modeling. See if that helps as well.

Hi! I now find myself questioning the reliability of the validation process too, for reasons that may be a bit similar to what you are describing. I would like to share what I posted, in case you have some knowledge about it: (Validation set impact on model prediction - predict())