Hi everyone,
I’m working on the coding assignment for week 1 of course 4 in Google Colab.
Part of the code applies a moving average to past values, starting 370 elements before the element at SPLIT_TIME (element 1100). The code is:
“moving_average_forecast(SERIES[SPLIT_TIME - 370:-360], 10)”
But I don’t understand how the author of this code came up with the number 370. Why can’t it be 365 or 368? I tried replacing these numbers, and the predicted graph was less similar to the validation graph and had higher MSE and MAE values. So I guess 370 is optimal for producing the lowest MSE and MAE, but can someone help explain how the author decided to use this number?
Thank you so much!!
Hello @Legends123
Welcome to our Community! Thanks for reaching out. We are here to help you.
The main idea of using a moving average on the past values is to remove some noise when forecasting.
The first thing you must define is a time window. In this case, the time window is ten days. That is why they go back one year (365 days) and add five days to get the start of the slice (370), then go back 365 days again and subtract five days to get the end (360). The slice runs from 370 to 360 days back, which leaves a window of 10 centered on the same date one year earlier.
If we run a test with a smaller time window of six days, the same principle applies: 365 + 3 = 368 and 365 - 3 = 362.
smooth_past_series = moving_average_forecast(SERIES[SPLIT_TIME - 368:-362], 6)
It gives an mse of 9.98 and an mae of 2.09 for the moving average plus smooth past forecast.
That is more accurate, because the window is much smaller.
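To make the slice arithmetic concrete, here is a minimal sketch. It assumes that moving_average_forecast averages the previous window_size values (my reading of the notebook, not copied from it), that the series has 1461 points, and that SPLIT_TIME is 1100. The helper smooth_past and the placeholder series are hypothetical and only there so the indices are easy to read.

```python
import numpy as np

def moving_average_forecast(series, window_size):
    # Assumed definition: each output is the mean of `window_size` consecutive values.
    return np.array([series[t:t + window_size].mean()
                     for t in range(len(series) - window_size)])

# Placeholder data: the value at each position equals its index.
SERIES = np.arange(1461, dtype=float)
SPLIT_TIME = 1100

def smooth_past(series, split_time, window, season=365):
    # Hypothetical helper: go back one season, then widen the slice by half a
    # window on each side so the averaging window straddles "one year ago".
    half = window // 2
    start = split_time - season - half   # e.g. 1100 - 370 for window=10
    stop = -(season - half)              # e.g. -360 for window=10
    return moving_average_forecast(series[start:stop], window)

smooth_10 = smooth_past(SERIES, SPLIT_TIME, 10)  # SERIES[SPLIT_TIME - 370:-360], 10
smooth_6 = smooth_past(SERIES, SPLIT_TIME, 6)    # SERIES[SPLIT_TIME - 368:-362], 6

# Both outputs have one value per validation step.
print(len(smooth_10), len(smooth_6), len(SERIES) - SPLIT_TIME)  # 361 361 361

# The first window averages indices 730..739, i.e. it straddles index 735,
# which is exactly 365 steps before SPLIT_TIME.
print(smooth_10[0], SPLIT_TIME - 365)  # 734.5 735
```

Under these assumptions, both slices produce exactly one smoothed value per validation step, and each window sits around the point one year earlier. Starting at 365 or 368 with a window of 10 would change the output length or shift the window off that point.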
This specific time series uses a 365-day offset because of its yearly seasonality. In industry problems, however, you will find that the seasonality can be monthly, fortnightly, weekly, daily, or even hourly; it is our role to find this seasonality and define a suitable time window for the prediction.
I hope this helps.
Please help me to fix this error: TypeError: ‘numpy.ndarray’ object is not callable
Hello sir,
Welcome to the community!
This seems to be an issue with the syntax you are using, sir.
Could you please click on my name and share your notebook via personal message, in both ipynb and pdf form, so that I can guide you better?
Thanks and Regards,
Mayank Ghogale
This is very helpful! Thank you so much!!
Unfortunately, I still don’t understand why we have to subtract five days instead of 10.
I think this is very inconsistent in the exercise and it took me half an hour to come up with the desired solution.
Why do we remove 50 days in the case of diff_moving_avg instead of 25, but at the same time, we remove 5 days instead of 10 for the smooth_past_series?
I think both should either use the trailing average definition or both should use the centered average.
Hi @Andrej_Saibel
I understand that the main question is how to define the window size.
There are many ways to do it. In the course we don’t go deep into this concept, but you can find the window size in several ways:
- Select the window size that minimizes the variance
- Determine characteristic self-correlation lengths using output autocorrelation functions
- Determine characteristic cross-correlation lengths using input-output cross-correlation functions
- Choose the width according to your attenuation needs
- Iterate over candidate sizes. Generally, one picks a sliding window that captures enough information: pick it too big and you get more irrelevant information (loss of resolution); pick it too small and you lose details.
But in the end, you should iterate on each dataset and problem you work on to find the right window size for that specific problem.
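As a rough illustration of the iteration idea, the sketch below scores a few window sizes on a validation split. Everything in it is an assumption made for the example: the synthetic series, the candidate windows, the split point, and the definition of moving_average_forecast (the mean of the previous window_size values). It is not the course's data or grading code.

```python
import numpy as np

def moving_average_forecast(series, window_size):
    # Assumed definition: forecast each step as the mean of the previous `window_size` values.
    return np.array([series[t:t + window_size].mean()
                     for t in range(len(series) - window_size)])

# Synthetic series for illustration only: trend + yearly seasonality + noise.
rng = np.random.default_rng(0)
time = np.arange(1461)
series = 10 + 0.05 * time + 20 * np.sin(2 * np.pi * time / 365) + rng.normal(0, 3, size=time.size)

split_time = 1100
x_valid = series[split_time:]

results = {}
for window in (5, 10, 30, 90):  # arbitrary candidate sizes
    # Output i of the forecast predicts time step i + window, so dropping the
    # first (split_time - window) outputs aligns the forecast with x_valid.
    forecast = moving_average_forecast(series, window)[split_time - window:]
    mse = np.mean((forecast - x_valid) ** 2)
    mae = np.mean(np.abs(forecast - x_valid))
    results[window] = (mse, mae)
    print(f"window={window:3d}  mse={mse:.2f}  mae={mae:.2f}")

# Pick the candidate with the lowest validation MAE.
best_window = min(results, key=lambda w: results[w][1])
print("best window by MAE:", best_window)
```

Which window wins depends entirely on the data, so the loop needs to be rerun for each dataset. Selecting on the validation set like this can also overfit to it, so a separate test split is worth keeping if you have enough data.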
In the course, we define the window size by taking into account the insight of the data.
Let me know if this helps.