Lab 2: Understanding the indexing: possible dimension mismatch problem

Hi, I’m finding it hard to understand a code chunk in lab 2 of week 1 regarding Differencing.

diff_moving_avg_plus_past = series[split_time - 365:-365] + diff_moving_avg
plot_series(time_valid, (x_valid, diff_moving_avg_plus_past))

From the code just above these two lines, I gathered that diff_moving_avg (for validation) covers indices 605 to 1460 of the actual series, because the code was:

diff_moving_avg = diff_moving_avg[split_time - 365 - 30:]

So the length of diff_moving_avg would be 856.
And now, after doing series[split_time - 365:-365], we get indices 635 to 1095 of the actual series, so its length is 461. How does adding these two make sense in terms of aligning dimensions?
It would be really helpful if someone could help me understand this, because without understanding it I'm at a loss about the differencing concept and can't attempt the programming assignment. I'd appreciate any ideas. Thanks.
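The lengths in question can be checked directly. Below is a minimal sketch, assuming (as the thread's indices 0 to 1460 suggest) a series of 4 * 365 + 1 = 1461 daily points split at index 1000, and assuming the lab's moving-average helper returns one mean per window so its output is window_size shorter than its input. The series values are random placeholders; only the array lengths matter. The names diff_ma_all and past are introduced here for illustration.

```python
import numpy as np

# Assumption: 4 years of daily data plus one extra day, split at index 1000.
N = 4 * 365 + 1   # 1461 points, indices 0..1460
split_time = 1000

# Placeholder values: only the array lengths matter here.
rng = np.random.default_rng(0)
series = rng.normal(size=N)


def moving_average_forecast(values, window_size):
    """Assumed to mirror the lab's helper: entry i is the mean of
    values[i : i + window_size], so the result is window_size shorter."""
    return np.array([values[t:t + window_size].mean()
                     for t in range(len(values) - window_size)])


# Seasonal differencing: value at t minus value at t - 365.
diff_series = series[365:] - series[:-365]               # indices 0..1095
diff_ma_all = moving_average_forecast(diff_series, 30)   # indices 0..1065

# Keep only the moving-average values that fall in the validation period.
diff_moving_avg = diff_ma_all[split_time - 365 - 30:]     # positions 605..1065
past = series[split_time - 365:-365]                     # series indices 635..1095
x_valid = series[split_time:]                            # series indices 1000..1460

diff_moving_avg_plus_past = past + diff_moving_avg

print(len(series), len(diff_series), len(diff_ma_all))   # 1461 1096 1066
print(len(diff_moving_avg), len(past), len(x_valid))     # 461 461 461
```

Because both operands have exactly 461 elements, NumPy's + adds them element by element. If the lengths differed, NumPy would raise a ValueError ("operands could not be broadcast together") rather than silently using only an overlapping portion.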

Hi @amp1590

Slicing both arrays is designed to align their time indices for the validation period. The original arrays have different lengths, but after slicing, each one holds exactly the values that correspond to the validation time steps, so when they're added together element by element, each element represents the proper time point in the validation set.

Hope it helps! Feel free to ask if you need further assistance!

Hi @Alireza_Saei, thank you so much for your reply. But isn't

diff_moving_avg = diff_moving_avg[split_time - 365 - 30:]

indicating indices 605 to 1460, given split_time = 1000? Isn't this for the validation set? That would be 1460 - 605 + 1 = 856 elements. How does that align with a length of 461? I'm not understanding this. Thank you for your help.

diff_moving_avg is not as long as it seems: differencing removes 365 elements from the series and the moving average removes another 30, so slicing it from index 605 leaves exactly as many values (461) as series[split_time - 365:-365]. The extra 30 in the start index compensates for the moving-average window, so the average of the 30 most recent seasonal differences is correctly aligned with the value from 365 days ago.
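One way to confirm that alignment numerically: compare the lab-style forecast with a direct per-day definition. Entry i of the moving average averages diff_series[i : i + 30], and diff_series[j] compares series[j + 365] with series[j], so diff_moving_avg[605] averages the seasonal differences for days 970 to 999, just before day 1000, which is the same day whose previous-year value series[635] is added back. This sketch assumes the same sizes as above (1461 points, split at 1000) and uses random placeholder data; the names moving_avg, forecast, and expected are illustrative.

```python
import numpy as np

# Assumptions: 1461 points split at 1000, seasonal lag 365, 30-day window.
N, split_time, season, window = 4 * 365 + 1, 1000, 365, 30
rng = np.random.default_rng(1)
series = rng.normal(size=N)  # placeholder data; alignment, not values, is tested

# Lab-style pipeline: difference, average, slice, then add the past back.
diff_series = series[season:] - series[:-season]
moving_avg = np.array([diff_series[i:i + window].mean()
                       for i in range(len(diff_series) - window)])
forecast = (series[split_time - season:-season]
            + moving_avg[split_time - season - window:])

# Direct definition for each validation day t: the value one season earlier
# plus the mean seasonal difference over the 30 days just before t.
expected = np.array([
    series[t - season]
    + np.mean(series[t - window:t] - series[t - window - season:t - season])
    for t in range(split_time, N)
])

print(np.allclose(forecast, expected))  # True
```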

Hope it helps! Let me know if you have any questions!

Hi @Alireza_Saei, thank you for your explanation. It got me thinking about the math, and I finally understood it. I'm sharing my understanding here for future reference in case someone else struggles with this.
The series runs from index 0 to 1460. diff_series runs from index 0 to 1095 (1460 - 365 = 1095). Then diff_moving_avg runs from index 0 to 1065 (1460 - 365 - 30 = 1065).
For the validation set, diff_moving_avg runs from index 605 to 1065 (split_time - 365 - 30 = 605). That is 1065 - 605 + 1 = 461 elements, which matches the 461 elements of series[split_time - 365:-365] (indices 635 to 1095), the past data we add back.
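The same bookkeeping holds for any series length. As a sketch, write N for the number of points (N = 1461 here, indices 0 to 1460), s = 365 for the seasonal lag, w = 30 for the moving-average window, and T for split_time, and assume the moving-average helper returns w fewer values than it receives:

$$
\begin{aligned}
\operatorname{len}(\texttt{diff\_series}) &= N - s \\
\operatorname{len}(\texttt{diff\_moving\_avg}) &= N - s - w \\
\operatorname{len}(\texttt{diff\_moving\_avg}[T - s - w:]) &= (N - s - w) - (T - s - w) = N - T \\
\operatorname{len}(\texttt{series}[T - s : -s]) &= (N - s) - (T - s) = N - T \\
\operatorname{len}(\texttt{series}[T:]) &= N - T
\end{aligned}
$$

With N = 1461 and T = 1000, every slice has N - T = 461 elements, so the sum lines up with x_valid.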

Please feel free to let me know if anything seems incorrect. Thanks again for your input; otherwise I wouldn't have thought about it this way.

You're welcome! Happy to help :raised_hands:

That's correct! Great job working through the math, and thanks for sharing your understanding.
