Looking at the code and the course descriptions, forecasting the validation set with the statistical approaches is possible only because we assume we already know the ground truth values. This is not true for the test data: we assume we know nothing about it, so naive forecasting, for example, is not directly applicable, because it needs the value at the timestep just before the one being predicted. Here is an example to illustrate this:
Assuming:
split_time = 1000 (the point where the test data starts)
and we are using naive forecasting, we will have:
Predicting the moment t=1001:
x(1001) = x(1000) -----> we have this value, because it is the last moment of our train data. But what about x(1002)?
x(1002) = x(1001) -----> we don’t have the ground truth value for this moment, because x(1001) belongs to our test data, so we would have to use our own previous prediction of x(1001) to predict x(1002).
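To make the difference concrete, here is a minimal sketch of the two variants on a toy series. This is my own illustration, not the course code: the function names `naive_one_step` and `naive_recursive` are hypothetical, the series is made up, and I am assuming 0-based indexing where `series[split_time:]` is the part to be forecast.

```python
import numpy as np

def naive_one_step(series, split_time):
    # Uses the TRUE value at t-1 for every t >= split_time,
    # so it needs the ground truth of the period being forecast.
    return series[split_time - 1:-1]

def naive_recursive(series, split_time):
    # Uses only values before split_time. Each prediction is fed back
    # as the "previous value", so the forecast stays flat at the last
    # known training value.
    horizon = len(series) - split_time
    last_known = series[split_time - 1]
    return np.full(horizon, last_known)

# Toy data (an assumption for illustration): a simple upward trend.
series = np.arange(10, dtype=float)
split_time = 6
actual = series[split_time:]

one_step = naive_one_step(series, split_time)
recursive = naive_recursive(series, split_time)

mae_one_step = np.mean(np.abs(actual - one_step))
mae_recursive = np.mean(np.abs(actual - recursive))
print("one-step MAE:", mae_one_step)
print("recursive MAE:", mae_recursive)
```

On this toy trend the recursive variant has the larger error, which is the gap I am asking about.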
There is no description of this in the week 1 content. If we don’t want to use our own predictions, then with naive forecasting we would be able to forecast only one timestep, and with a moving average only for the length of our window. I wonder: when validating our model on the validation set, shouldn’t we use our own predictions instead of the ground truth data, since this would be closer to forecasting on the test data and would give a more realistic estimate of our error metric?
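For the moving average case, feeding our own predictions back in could look roughly like the sketch below. Again, this is only my assumption of how it might be done, not something taken from the course: `moving_average_recursive`, `window_size`, and `horizon` are hypothetical names, and the training values are made up.

```python
import numpy as np

def moving_average_recursive(history, window_size, horizon):
    # Start from the last `window_size` known (training) values.
    buffer = list(history[-window_size:])
    forecasts = []
    for _ in range(horizon):
        # Average the most recent window, which may already contain
        # earlier predictions instead of ground truth values.
        next_value = float(np.mean(buffer[-window_size:]))
        forecasts.append(next_value)
        # Feed the prediction back in as if it were an observation.
        buffer.append(next_value)
    return np.array(forecasts)

# Toy training data (an assumption for illustration).
train = np.array([1.0, 2.0, 3.0, 4.0])
forecast = moving_average_recursive(train, window_size=2, horizon=3)
print(forecast)
```

After `window_size` steps the window contains only predictions, so every later forecast depends on no ground truth from the forecast period at all.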
I would be more than happy if you could elaborate on this.
Best Regards,
Milad