Time Series Linear Regression

I’ve noticed that we talk about linear and multiple linear regression, but I hesitate when I think about whether this applies to something like a time series.

In a dataset with x and y that is not a time series, where linear regression applies, do we make the assumption that x^{(1)} is not related to x^{(4)}? But in a time series with t and y, the observation at t^{(1)} may have some impact on the observation at t^{(4)}, no? I guess what I am talking about here is time series forecasting.

Questions:

  1. Do we cover this in the ML course?
  2. Are there good resources out there for reading about time series forecasting with ML?

Hi there,

In my opinion, a really good course on time series modelling can be found here (from a different specialisation): https://de.coursera.org/learn/tensorflow-sequences-time-series-and-prediction

The following assumptions are associated with a linear regression model:

  1. Linearity: The relationship between X and the mean of Y is linear.
  2. Homoscedasticity: The variance of the residuals is the same for any value of X.
  3. Independence: Observations are independent of each other.
  4. Normality: For any fixed value of X, Y is normally distributed.

Source:
https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/R/R5_Correlation-Regression/R5_Correlation-Regression4.html
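
As an illustration, here is a minimal sketch of checking assumptions 2 to 4 with standard diagnostics (assuming statsmodels and scipy are installed; the data are synthetic):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=200)  # linear trend + Gaussian noise

X = sm.add_constant(x)          # design matrix with intercept column
fit = sm.OLS(y, X).fit()
resid = fit.resid

# Homoscedasticity: Breusch-Pagan test (a small p-value suggests heteroscedasticity)
_, bp_pvalue, _, _ = het_breuschpagan(resid, X)
# Independence: Durbin-Watson statistic (values near 2 suggest little autocorrelation)
dw = durbin_watson(resid)
# Normality of residuals: Shapiro-Wilk test
_, sw_pvalue = stats.shapiro(resid)

print(f"Breusch-Pagan p={bp_pvalue:.3f}, Durbin-Watson={dw:.2f}, Shapiro p={sw_pvalue:.3f}")
```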

In reality, my experience is that some of these assumptions get violated in practice, but the violations can often be mitigated with best practices (e.g. feature engineering that incorporates domain knowledge so that features capture some non-linearity).
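
For instance, here is a minimal sketch of that idea with an engineered squared feature (assuming scikit-learn; the data and the polynomial degree are illustrative choices). The model stays linear in its parameters even though the relationship is not:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * x[:, 0] ** 2 + rng.normal(0, 0.1, size=200)  # quadratic ground truth

X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)  # [x, x^2]
model = LinearRegression().fit(X_poly, y)

print(model.score(X_poly, y))  # R^2 close to 1 despite the non-linear relationship
```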

To provide a suggestion regarding the independence characteristic (the “not related” assumption) you mentioned: what I can recommend is to understand your features and their distribution and correlation. See also this thread: Machine learning - #4 by Christian_Simonis

Afterwards you can perform a PCA to figure out the principal components and how much of the variance they explain. (A nonlinear transformation is also possible with kernel PCA, but that’s another topic.)

Often the first couple of principal components (PCs) explain a high share of the variance. The PCs form an orthogonal feature set, obtained by a rotation projection of your initial feature set.

You can use these PCs as features for your linear model, keeping in mind the assumptions mentioned above. You can then utilise this linear model for forecasting.
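
A minimal sketch of this PCA-then-linear-model idea (assuming scikit-learn; the synthetic data and the 95% variance threshold are illustrative choices):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8))
X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=300)   # one strongly correlated feature
y = X @ rng.normal(size=8) + rng.normal(0, 0.1, size=300)

pca = PCA(n_components=0.95)   # keep enough PCs to explain 95% of the variance
model = make_pipeline(StandardScaler(), pca, LinearRegression()).fit(X, y)

print(pca.explained_variance_ratio_)   # variance explained per retained PC
print(model.score(X, y))               # fit quality of the linear model on the PCs
```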

Best regards
Christian


Hello @jesse,

@Christian_Simonis shared a very good course, which I also took myself. Here I want to quickly connect the dots: how do we convert a time series dataset, which carries this temporal correlation, into a table of features and targets so that you may apply the skills you have learnt?

All I am going to say is actually covered in the course that @Christian_Simonis shared.

Just as we have feature engineering (FE) in our MLS C1, we can do the same for a time series by extracting patterns over a certain time window. In other words, we are not looking at one timestamp at a time to produce one x^{(i)}, but instead at a period of timestamps.

The course has a video in W1 called “common patterns in time series” in which we will see names such as seasonality (periodic high-low fluctuation) and autocorrelation (the future depends on the past). The job of FE is, for example, to build a feature for a time window that represents where we are within the range of fluctuation, given that such seasonality exists and our time window is neither too short nor too long to speak about it. Such a feature can simply be the mean of the window’s data, or its last value, or the average of its last five values. But the first question is: is there really a seasonality? So it is also a skill to learn how to find it.
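
Here is a minimal windowing sketch of that idea (numpy only; the window size and the two features, window mean and last value, are illustrative choices):

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (window features -> next value) training pairs."""
    X, y = [], []
    for i in range(len(series) - window):
        w = series[i : i + window]
        X.append([w.mean(), w[-1]])   # e.g. window mean and last value as features
        y.append(series[i + window])  # target: the value right after the window
    return np.array(X), np.array(y)

t = np.arange(500)
series = 0.01 * t + np.sin(2 * np.pi * t / 50)  # trend + a period-50 seasonality

X, y = make_windows(series, window=50)
print(X.shape, y.shape)  # (450, 2) (450,)
```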

Given this understanding of the need for pattern hunting, and besides the course, which again is a very good starter and has exercises, I can share some names that relate to specific pattern-finding goals, so you may google them along your learning journey:

Seasonality – Fourier transform
Autocorrelation – ARIMA
Overall trend – Linear/Polynomial regression over time
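
For example, one quick way to hunt for a dominant seasonal period is the Fourier transform. A sketch on synthetic data (numpy only; the period of 50 is planted so we know the expected answer):

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 50) + 0.3 * rng.normal(size=600)  # period-50 signal + noise

detrended = series - series.mean()
spectrum = np.abs(np.fft.rfft(detrended))
freqs = np.fft.rfftfreq(len(detrended), d=1.0)   # frequencies in cycles per time step

peak = freqs[np.argmax(spectrum[1:]) + 1]        # skip the zero-frequency bin
print(f"dominant period ~ {1 / peak:.1f} time steps")  # ~ 50
```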

There are also many domain-specific skills (seismic, audio, etc.) which can be great additions.

Lastly, sometimes people will just feed a windowed time series into a neural network in the hope that the NN/RNN will find the patterns automatically, so you don’t have to do it yourself. I can’t tell you which is the better way, but I always prefer to look at some data myself.
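
A hedged sketch of that last approach, feeding raw windows into a small dense network (assuming TensorFlow is installed; the architecture and hyperparameters are illustrative only, not a recommendation):

```python
import numpy as np
import tensorflow as tf

t = np.arange(1000, dtype="float32")
series = 0.01 * t + np.sin(2 * np.pi * t / 50)   # trend + seasonality

window = 30
X = np.stack([series[i : i + window] for i in range(len(series) - window)])
y = series[window:]

# Let the network discover the patterns from the raw windowed values
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)

print(model.predict(series[np.newaxis, -window:], verbose=0))  # one-step-ahead forecast
```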

I hope this will give you some ideas! Cheers!


@Christian_Simonis @rmwkwok I don’t want to crowd this thread with “thank yous” but thanks to you both! Great material here for me to pursue.


Happy to hear that! Good luck on your learning journey!


Hello @jesse,

I would like to add a point here based on your above query:

To be able to give a dataset a time series treatment, there has to be a temporal reference provided in the data; we are not at liberty to make an assumption based on a sample’s relative position in the dataset. This means that we cannot associate a reference of time with x^{(1)} and x^{(4)}.

Once the time reference is made available in the data, we can then find the relationships at various lags, e.g. between y^{(1)} and y^{(4)}, or between y^{(1)} and y^{(6)}. From here on we apply the various techniques such as autocorrelation, lags, seasonality, etc. that @rmwkwok has already alluded to.
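
A small sketch of inspecting those lagged relationships (assuming pandas; the series and the lags are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
y = pd.Series(np.sin(2 * np.pi * np.arange(300) / 12) + 0.2 * rng.normal(size=300))

# Correlation of the series with lagged copies of itself
for lag in (1, 3, 6, 12):
    print(f"lag {lag:2d}: autocorr = {y.autocorr(lag=lag):+.2f}")
```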

FYI: autoregressive models such as ARMA, ARIMA, etc. use the linear regression model with the lagged values as the features.
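
To make that concrete, here is a sketch of an AR(p) fit expressed as plain linear regression on lagged values (scikit-learn; the simulated AR(2) process and p=3 are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
y = np.zeros(500)
for t in range(2, 500):                  # simulate a stationary AR(2) process
    y[t] = 0.6 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

p = 3                                    # number of lag features
X = np.column_stack([y[i : len(y) - p + i] for i in range(p)])  # columns: y_{t-p} ... y_{t-1}
target = y[p:]

model = LinearRegression().fit(X, target)
print(model.coef_)   # roughly [0.0, 0.3, 0.6] for lags t-3, t-2, t-1
```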
