Validation set impact on model prediction - predict()

Hello everyone,

I completed this course a couple of weeks ago, so now I am working on my own projects in Time Series Analysis and Forecasting. Now that I have experimented with the code provided, adapting it for my own purposes, I am beginning to notice a couple of things that make me question the reliability of the validation process. I use these words because I just saw a similar post while looking for answers to my doubts (How reliable is the validation process for the time series analysis?).

Like ‘DoubleE’, I am having trouble trusting the validation process. My reason is that I believe the validation set has a direct impact on the predicted values, which is why the forecast generally doesn’t differ much from the real data. Let me explain in more detail:

I am not posting any code here so this doesn’t get messy, but take the single-layer neural network model of week 2 as a reference. I am aware that the model is trained on the train_set and not the valid_set, because our dataset (created with the windowed_dataset function) contains only that set, and this is what determines the weights of the model. However, when we get to the prediction section and build our forecast list with a loop that calls the ‘predict()’ method while shifting the window one step at a time, we cover the entire series and not just the training set. I believe that in this process part of the validation set gets incorporated into some of the windows, and this may affect the predicted values (see the sketch below).
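Here is a minimal sketch of the loop I mean, assuming the course-style names series (a NumPy array), window_size, split_time, and a trained Keras model; these names are my assumption, not the exact assignment notebook:

```python
import numpy as np

forecast = []
for time in range(len(series) - window_size):
    # Each window is sliced from the FULL series, so any window starting
    # at or after (split_time - window_size) already contains validation
    # values as model inputs.
    window = series[time:time + window_size]
    forecast.append(model.predict(window[np.newaxis], verbose=0))

# Keep only the predictions that land in the validation period.
forecast = forecast[split_time - window_size:]
results = np.array(forecast).squeeze()
```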

So I thought, ‘What if I swap my validation set for a different one and see whether it generates a different prediction?’ Well, I actually did that. I created a fake validation set to check whether it has an impact on the forecast, and it does. I didn’t get good predictions, presumably because this fake data is unrelated to the rest of the series, but I noticed the model trying to follow the behavior of the fake set, even though it is not part of the real data (a sketch of the experiment follows).
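A hedged sketch of that experiment, under the same assumed names as above (the random replacement values are purely illustrative):

```python
# Overwrite the validation region with unrelated random values and
# re-run the same forecast loop. The model is NOT retrained here.
fake_series = series.copy()
fake_series[split_time:] = np.random.uniform(
    series.min(), series.max(), size=len(series) - split_time)

fake_forecast = []
for time in range(len(fake_series) - window_size):
    window = fake_series[time:time + window_size]
    fake_forecast.append(model.predict(window[np.newaxis], verbose=0))
fake_forecast = np.array(fake_forecast[split_time - window_size:]).squeeze()

# The forecast changes because the INPUT windows changed, not because
# the model's weights changed.
```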

So in summary, I have the following doubts:

  • How exactly does the predict() method work in this case?
  • Is the validation set really affecting the forecast, even though it shouldn’t?

Please feel free to share your thoughts about it. I have been learning Machine Learning and Deep Learning for just a couple of months, and I am really motivated to learn all the techniques involved in working with Time Series and Forecasting.

model.predict doesn’t care about the data split, since it doesn’t modify the model weights. How you use this method comes down to the model architecture: the weights are learned from the training data, and the model’s predictive power is restricted to one timestep into the future.
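A small check, under the same assumed names as the sketches above, that predict() is a pure forward pass and leaves the weights untouched even when the input window comes from the validation period:

```python
weights_before = [w.copy() for w in model.get_weights()]
_ = model.predict(series[split_time:split_time + window_size][np.newaxis],
                  verbose=0)
# Weights are identical before and after predicting on validation data.
assert all(np.array_equal(b, a)
           for b, a in zip(weights_before, model.get_weights()))
```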

With these details in mind, please go through the C3 W4 assignment and reply with how the time-series way of using validation data differs from the way it is used for sequences of text data in natural language.