Thank you for your reply.
When we train a model, each step consists of a forward propagation followed by a backward propagation, and this is repeated for a number of “epochs”. If you define batches, which is pretty standard, then each epoch also goes through ‘n’ batches.
So I call each forward+backward prop a cycle.
Let’s create yet another example: predicting the temperature in the next hour. To train this model, we have to gather actual data from the past and build a dataset. Let’s say I get the following data:
Each hour, the temperature in Fahrenheit:
1:00AM: 65
2:00AM: 66
3:00AM: 67
4:00AM: 68
5:00AM: 69
6:00AM: 70
7:00AM: 71
8:00AM: 72
9:00AM: 73
This is all the data needed to train my time series model. Now I will organize this data to build a training set. I will define X_train to have size = 5: with 5 temperatures I can predict the 6th temperature. I will use a ‘sliding’ window to create the different samples.
This will be then my X_train and y_train:
X_train_1=[65, 66, 67, 68, 69] y_train_1=[70]
X_train_2=[66, 67, 68, 69, 70] y_train_2=[71]
X_train_3=[67, 68, 69, 70, 71] y_train_3=[72]
X_train_4=[68, 69, 70, 71, 72] y_train_4=[73]
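The sliding-window construction above can be sketched in a few lines of Python (a minimal illustration, not tied to any particular library):

```python
# Build sliding-window samples from the hourly temperature series.
# A window of 5 past temperatures predicts the 6th.
temps = [65, 66, 67, 68, 69, 70, 71, 72, 73]  # 1:00AM through 9:00AM

window = 5
X_train, y_train = [], []
for i in range(len(temps) - window):
    X_train.append(temps[i:i + window])  # five consecutive readings
    y_train.append(temps[i + window])    # the reading that follows

print(X_train[0], y_train[0])  # [65, 66, 67, 68, 69] 70
```

Each pass of the loop slides the window one hour forward, producing exactly the four (X, y) pairs listed above.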
During training, we want to give the model the ‘truth’, have the model predict, and compare the prediction with the ‘truth’.
We start training. Let me run a simulated training with hypothetical predictions. Remember: each cycle means one forward prop and one backward prop, and in each cycle we use one of the samples above:
Cycle 1: Input = X_train_1. y_hat_1 (predicted by the model): 89. y_train_1 = 70. Loss (absolute error): 19
Cycle 2: Input = X_train_2. y_hat_2 (predicted by the model): 82. y_train_2 = 71. Loss (absolute error): 11
Cycle 3: Input = X_train_3. y_hat_3 (predicted by the model): 75. y_train_3 = 72. Loss (absolute error): 3
Cycle 4: Input = X_train_4. y_hat_4 (predicted by the model): 74. y_train_4 = 73. Loss (absolute error): 1
See how the loss gets smaller and smaller cycle after cycle? The model is learning to predict the next temperature when given the past 5 temperatures. This is possible because we are calculating the loss of the predicted value against ‘actual’ data (ground truth).
Now let’s pretend that instead of using the ‘actual’ temperatures we feed in the ‘predicted’ temperatures:
Cycle 1: We use the first sample, X_train_1 and y_train_1.
X_train_1=[65, 66, 67, 68, 69] y_train_1=[70]
Input = X_train_1. y_hat_1 (predicted by the model): 89. y_train_1 = 70. Loss (absolute error): 19
Cycle 2: We take X_train_2 but replace the last entry with the previous prediction:
X_train_2=[66, 67, 68, 69, 89] y_train_2=[71]
Input = X_train_2. y_hat_2 (predicted by the model): 94. y_train_2 = 71. Loss (absolute error): 23
And let’s stop this new simulation here, because the model is already lost. By using the temperature generated in Cycle 1 as input for Cycle 2, we immediately drift away from the actual data, and the next prediction is based on inaccurate data.
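The failure mode is easy to see in code. Here I replace the last entry of the second window with the bad prediction from Cycle 1 (89 instead of the true 70), exactly as described above:

```python
# Feeding a prediction back into the next window (what the text warns against).
x2_truth = [66, 67, 68, 69, 70]  # correct second window (ground truth)
y_hat_1 = 89                     # bad prediction from Cycle 1

# Last entry replaced by the prediction instead of the actual reading:
x2_fed_back = [66, 67, 68, 69, y_hat_1]

# The window is now 19 degrees off at its most recent point, so the next
# prediction is computed from inaccurate data and the error compounds.
print(x2_fed_back)  # [66, 67, 68, 69, 89]
```

This is why, during training, the inputs should come from the ground-truth data rather than from the model's own outputs.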
In my example I am using X_train with size 5, but the same would happen if the size were 1, as in your example above.
Please let me know if this is a bit clearer.