Why shift 370 time steps and not, let's say, 365 time steps?

Legends123 · May 31, 2022, 10:54am

Hi everyone,

I’m working on the coding assignment for week 1 of course 4 in Google Colab.

Part of the code applies a moving average to past values, starting from the 370th element before the element at SPLIT_TIME (the 1100th element). The code is:
moving_average_forecast(SERIES[SPLIT_TIME - 370:-360], 10)

But I don’t understand how the author of this code came up with the number 370. Why can’t it be 365 or 368? I tried replacing these numbers, and the predicted graph became less similar to the validation graph and had higher mse and mae values. So I guess 370 is the value that gives the lowest mse and mae, but can someone help explain how the author decided to use this number?

Thank you so much!!

adonaivera · June 1, 2022, 1:49pm

Hello @Legends123

Welcome to our Community! Thanks for reaching out. We are here to help you.

The main idea of applying a moving average to the past values is to remove some noise when forecasting.
The first thing you must define is a time window; in this case, the window is ten days. That is why the slice starts one year (365 days) plus five days back, at 370, and ends one year minus five days back, at 360. The difference between the two offsets gives the window of 10, centered on the same day one year earlier (370 and 360).
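To make the index arithmetic concrete, here is a minimal sketch. It assumes that moving_average_forecast returns, for each position i, the mean of series[i : i + window_size]; that definition, the stand-in series whose values equal their own indices, and the series length of 1461 are assumptions for illustration, not taken from the assignment.

```python
import numpy as np

def moving_average_forecast(series, window_size):
    # Assumed behaviour: element i is the mean of series[i : i + window_size]
    return np.array([series[i:i + window_size].mean()
                     for i in range(len(series) - window_size)])

# Hypothetical stand-in series: each value equals its own index,
# so a window's mean reveals which time steps were averaged.
SERIES = np.arange(1461, dtype=float)
SPLIT_TIME = 1100

smooth_past = moving_average_forecast(SERIES[SPLIT_TIME - 370:-360], 10)

# Time steps averaged to produce the forecast for t = SPLIT_TIME
first_window = SERIES[SPLIT_TIME - 370:SPLIT_TIME - 360]

print(len(smooth_past), len(SERIES) - SPLIT_TIME)  # one forecast per validation step
print(first_window[0], first_window[-1])           # span of the first window
print(smooth_past[0], SPLIT_TIME - 365)            # window mean vs. the day one year back
```

With these assumptions, the ten averaged steps run from SPLIT_TIME - 370 to SPLIT_TIME - 361, so they straddle SPLIT_TIME - 365 instead of all lying before it. The extra 10 elements in the slice are consumed by the window, which is why the output has exactly one value per validation step.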

If we do a test with a smaller time window of six days, we take the same principle, 365 +3 and 365 -3.

smooth_past_series = moving_average_forecast(SERIES[SPLIT_TIME - 368:-362], 6)

This gives mse: 9.98, mae: 2.09 for the moving average plus smoothed past forecast.

That is more accurate, because the window is much smaller.

This specific time series uses a shift of 365 days because of its yearly seasonality, but in industry problems you will find that the seasonality can be monthly, fortnightly, weekly, daily, or even hourly. It is our role to find this seasonality and define a suitable time window for prediction.

Hope this helps.

Muhammad_Usman5 · June 1, 2022, 1:52pm

Please help me to fix this error: TypeError: ‘numpy.ndarray’ object is not callable

MayankGhogale · June 1, 2022, 4:27pm

Hello sir,
Welcome to the community!
This seems to be an issue with the syntax you are using, sir.
Could you please click on my name and share your notebook via personal message, in both ipynb and pdf form, so that I can guide you better?
Thanks and Regards,
Mayank Ghogale

Legends123 · June 2, 2022, 2:01am

This is very helpful! Thank you so much!!

Andrej_Saibel · July 14, 2022, 5:46pm

Unfortunately, I still don’t understand why we have to subtract five days instead of 10.
I think this is very inconsistent in the exercise and it took me half an hour to come up with the desired solution.

Why do we remove 50 days in the case of diff_moving_avg instead of 25, but at the same time, we remove 5 days instead of 10 for the smooth_past_series?
I think both should either use the trailing average definition or both should use the centered average.

adonaivera · July 18, 2022, 2:01am

Hi @Andrej_Saibel
I understand that the main question is how to define the window size.
There are many ways to do it. In the course we don’t go deep into this concept, but you can find the size of the window in several ways:

Select the window size that minimizes the variance.
Determine characteristic autocorrelation lengths using output autocorrelation functions.
Determine characteristic cross-correlation lengths using input-output cross-correlation functions.
Choose the width according to your attenuation needs.
Try an iterative process. Generally, one picks a sliding window size that captures enough information. Pick it too big and you will get more irrelevant information (loss of resolution); pick it too small and you will lose details.
In the end, you should iterate on each dataset and problem you work on to find the right window size for that specific problem.
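As a rough illustration of the iterative approach, the sketch below tries a few window sizes on a validation split and keeps the one with the lowest MAE. The synthetic series, the candidate sizes, and the moving_average_forecast helper (assumed to return the mean of each window of length window_size) are all made up for this example and are not from the course notebook.

```python
import numpy as np

def moving_average_forecast(series, window_size):
    # Assumed helper: forecast[i] is the mean of series[i : i + window_size]
    return np.array([series[i:i + window_size].mean()
                     for i in range(len(series) - window_size)])

# Synthetic data for illustration only: a slow trend plus Gaussian noise
rng = np.random.default_rng(0)
time = np.arange(1000)
series = 0.05 * time + rng.normal(scale=3.0, size=time.size)

split_time = 800
x_valid = series[split_time:]

mae_by_window = {}
for window_size in (5, 10, 30, 50):
    # Start window_size steps early so the first forecast lines up with split_time
    forecast = moving_average_forecast(series[split_time - window_size:], window_size)
    mae_by_window[window_size] = np.mean(np.abs(forecast - x_valid))

best_window = min(mae_by_window, key=mae_by_window.get)
print(mae_by_window)
print("best window:", best_window)
```

Which size wins depends entirely on the data, so the same loop would need to be rerun for each new series.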

In the course, we define the window size based on insight into the data.

Let me know if this helps.
