Is statistical forecasting also applicable to test data?

miladbohlouli · June 1, 2022, 12:24pm

The thing is, when I look at the code (or at the course descriptions), predicting the validation set with the statistical approaches is possible only because we assume we already know the ground-truth values. That is not true for the test data: we assume we know nothing about it. For example, naive forecasting is then not directly applicable, because it needs the value at the time step just before the current prediction time. Here is an example to illustrate this:

Assume:
split_time = 1000 (the boundary between the training data and the test data)
and we are using naive prediction. Then, predicting the moment t=1001:
x(1001) = x(1000) -----> we have this value, because it is the last moment of the training data. But what about x(1002)?
x(1002) = x(1001) -----> we don't have the ground-truth value for this moment, because x(1001) belongs to the test data, so we would have to use our own previous prediction of x(1001) to predict x(1002).
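The example above can be made concrete with a minimal NumPy sketch of a naive forecast that feeds its own predictions back in. The helper name `recursive_naive_forecast` and the toy series are hypothetical, for illustration only. Every step after the first simply repeats the last observed value.

```python
import numpy as np

def recursive_naive_forecast(history, horizon):
    """Forecast `horizon` steps ahead with the naive rule x(t) = x(t-1),
    reusing each prediction as the 'previous' value for the next step.
    Hypothetical helper, for illustration only."""
    forecasts = []
    last = history[-1]  # last observed value (end of the training data)
    for _ in range(horizon):
        pred = last          # naive rule: next value equals the previous value
        forecasts.append(pred)
        last = pred          # no ground truth available, so reuse our own prediction
    return np.array(forecasts)

# Toy training series whose last value is 5.0
train = np.array([1.0, 3.0, 2.0, 4.0, 5.0])
print(recursive_naive_forecast(train, horizon=4))  # [5. 5. 5. 5.]
```

So a multi-step naive forecast built on its own predictions collapses to a flat line at the last known value.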

There is no description of this in the week 1 content. If we don't want to use our own predictions, then with naive forecasting we could forecast only one time step ahead, and with a moving average only as far as the length of our window. Shouldn't we validate the model on the validation set using our own predictions instead of the ground-truth data? That would be closer to forecasting on the test data and would give a more realistic estimate of our error metric.
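The difference between validating with ground truth and validating with our own predictions can be sketched for a moving average. This is a minimal sketch under stated assumptions: the synthetic trend-plus-noise series, the window of 10 and the split at 150 are made up, and both function names are hypothetical. The first function averages actual past values (ground truth assumed known); the second feeds its own predictions back into the window once it passes split_time.

```python
import numpy as np

def one_step_moving_average(series, split_time, window):
    """Predict each point from split_time onward as the mean of the
    previous `window` *actual* values (ground truth assumed known)."""
    return np.array([series[t - window:t].mean()
                     for t in range(split_time, len(series))])

def recursive_moving_average(series, split_time, window):
    """Predict the same points without using any value after split_time:
    the window is gradually filled with our own earlier predictions."""
    buffer = list(series[split_time - window:split_time])  # last known actual values
    preds = []
    for _ in range(split_time, len(series)):
        pred = float(np.mean(buffer[-window:]))
        preds.append(pred)
        buffer.append(pred)  # feed the prediction back into the window
    return np.array(preds)

# Synthetic series: linear trend plus Gaussian noise (illustrative data only)
rng = np.random.default_rng(0)
time = np.arange(200)
series = 0.5 * time + rng.normal(0, 2, size=time.shape)

split_time, window = 150, 10
actual = series[split_time:]

mae_one_step = np.mean(np.abs(actual - one_step_moving_average(series, split_time, window)))
mae_recursive = np.mean(np.abs(actual - recursive_moving_average(series, split_time, window)))
print(f"one-step MAE:  {mae_one_step:.2f}")
print(f"recursive MAE: {mae_recursive:.2f}")
```

On a trending series like this one, the recursive forecast flattens out and falls further and further behind, so its error is expected to be noticeably larger. That gap is one way to gauge how optimistic a ground-truth-based validation score may be for a multi-step horizon.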

I would be grateful if you could explain this in more detail.

Best Regards,
Milad

balaji.ambresh · June 1, 2022, 1:11pm

We are given the xs from the test dataset and are asked to predict the corresponding ys.
As you mentioned, ys_test[0] = xs_train[-1]
For the rest of them, ys_test[i] = xs_test[i - 1].
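The indexing in this reply can be written out as a short NumPy sketch. The function name and toy values are illustrative assumptions. Only ys_test[0] crosses the train/test boundary; every other prediction reads the actual previous test value, so no prediction is fed back in.

```python
import numpy as np

def naive_forecast_for_test(xs_train, xs_test):
    """One-step naive forecast over the test period.
    ys_test[0] comes from the last training value; every later
    ys_test[i] uses the actual previous test value xs_test[i - 1]."""
    ys_test = np.empty_like(xs_test)
    ys_test[0] = xs_train[-1]      # corner case: the previous step lives in the training data
    ys_test[1:] = xs_test[:-1]     # ys_test[i] = xs_test[i - 1] for i >= 1
    return ys_test

series = np.array([10.0, 11.0, 13.0, 12.0, 14.0, 15.0, 17.0])
split_time = 4
xs_train, xs_test = series[:split_time], series[split_time:]

print(naive_forecast_for_test(xs_train, xs_test))  # [12. 14. 15.]
# The same forecast obtained by slicing the full series:
print(series[split_time - 1:-1])                   # [12. 14. 15.]
```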

miladbohlouli · June 1, 2022, 1:24pm

Thanks for the reply.

Isn’t it better to follow the same approach when predicting the validation set?

balaji.ambresh · June 1, 2022, 3:36pm

When training, you can discard the very first x, because there is no time step -1 from which to predict ys_train[0]. When evaluating, you cannot afford to skip time step 0. This is why ys_test[0] is handled as a corner case. The rest of the algorithm is the same.
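A minimal sketch of the training side (the helper name and toy values are hypothetical): the first training value has no predecessor, so it only ever appears as an input, never as a target. On the evaluation side that gap is filled by ys_test[0] = xs_train[-1].

```python
import numpy as np

def make_one_step_pairs(xs):
    """Build (input, target) pairs for a one-step-ahead model.
    The very first value has no predecessor (there is no time step -1),
    so it appears only as an input, never as a target."""
    inputs = xs[:-1]    # x(t-1)
    targets = xs[1:]    # x(t)
    return inputs, targets

xs_train = np.array([10.0, 11.0, 13.0, 12.0])
inputs, targets = make_one_step_pairs(xs_train)
print(inputs)   # [10. 11. 13.]
print(targets)  # [11. 13. 12.]

# For a naive model the prediction is the input itself, so the training MAE is:
print(np.mean(np.abs(targets - inputs)))
```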

If your suggestion is different, please specify your approach in terms of arrays and indices, the way I did. Do specify the entry for ys_test[0].
