Help on real-life project


This might not be the proper course to upload my question, but I can’t seem to find a better place. Hopefully someone can help me out.

So, I’m working on a project for a big Oil & Gas company where the goal is to predict failures in a critical machine used for oil extraction. I have ~15 features, and I managed to label (0/1) the historical failures of this machine (time-series data covering about 2 years, sampled every 10 minutes).

Now, one of the problems I’m facing is a 1-month period of Null values: all ~15 features are missing for one continuous month (I later discovered this was due to social unrest that forced the company to shut everything down). I used Linear Regression to fill in other, randomly distributed Null values, but it doesn’t seem right to do the same for this continuous 1-month gap. So, I was wondering if anyone could shed some light on how to approach these Null values.
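To make the distinction between the two kinds of gaps concrete, here is a minimal sketch in plain Python. It swaps in simple linear interpolation between neighbouring readings (not the Linear Regression imputation I actually used) and only fills runs of missing values up to a chosen length, leaving longer runs untouched. The function name `fill_short_gaps`, the `max_gap` threshold and the sample readings are all made up for illustration.

```python
def fill_short_gaps(values, max_gap):
    """Linearly interpolate runs of None that are at most max_gap samples long.

    Longer runs, and runs touching either end of the series, stay as None.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # first index after the run of missing values
        run_len = end - start
        if run_len <= max_gap and start > 0 and end < n:
            left, right = filled[start - 1], filled[end]
            step = (right - left) / (run_len + 1)
            for k in range(run_len):
                filled[start + k] = left + step * (k + 1)
    return filled


# Hypothetical readings: one isolated gap and one long gap.
readings = [1.0, None, 3.0, None, None, None, None, 8.0]
result = fill_short_gaps(readings, max_gap=2)
print(result)  # [1.0, 2.0, 3.0, None, None, None, None, 8.0]
```

With 10-minute sampling, `max_gap=6` would correspond to filling gaps of up to one hour; the month-long gap would be left as missing and handled separately.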

A statistician friend of mine recommended that I split my original dataset into 2 datasets: one covering the period before the Null values start, and one covering the period after they end. But then I end up with 2 different models, and each model might have learnt different types of failures (failures can happen for many different reasons).

Hope someone finds this interesting enough to give me a hand, but anyway, thank you for reading!

Daniel Cho

P.S. I also heard that Cox regression would be a good approach for predictive maintenance projects. Does anyone know where I can find courses like this one that cover Cox regression?

Hi Daniel,

Just curious: following the advice of your statistician friend, you have 2 different models. I assume you then merged the output of the two, so that if either model flags a failure, you predict a probable failure. Alternatively, you could train another ML classifier/regressor that takes the outputs of the 2 models as input and gives a final prediction (an ensemble).
What kind of errors were you seeing with this?
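The first option (flag a failure if either model does) can be sketched in a few lines. The prediction lists below are invented for illustration, and `combine_or` is just a name I picked; the stacked-classifier alternative is not shown here.

```python
def combine_or(pred_a, pred_b):
    """Flag a failure (1) whenever at least one of the two models flags it."""
    return [int(a == 1 or b == 1) for a, b in zip(pred_a, pred_b)]


# Hypothetical 0/1 predictions from the "before gap" and "after gap" models
# on the same four samples.
model_before = [0, 1, 0, 0]
model_after = [0, 0, 1, 0]
combined = combine_or(model_before, model_after)
print(combined)  # [0, 1, 1, 0]
```

Note that this rule can only add alarms, so it tends to raise recall at the cost of more false positives.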

Also, are you using some kind of sliding window to calculate time-dependent features? To give a simple example: checking that the variance within a month doesn’t go beyond a specific range. If yes, you could use the entire dataset together; you’d just have to discard the windows that overlap the gap (assuming we can make this simplification and treat the windows as independent).
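A rough sketch of what I mean, in plain Python. It assumes the rows with missing values have already been dropped and that the samples sit on a regular 10-minute grid; a window is kept only if its timestamps are contiguous, so any window that would straddle the gap is skipped. The names `window_variances` and `STEP`, the window size and the sample values are all made up for the example.

```python
from datetime import datetime, timedelta
from statistics import pvariance

STEP = timedelta(minutes=10)  # assumed sampling interval


def window_variances(samples, window_size):
    """Rolling variance over (timestamp, value) pairs sorted by time.

    Only windows whose samples are contiguous on the 10-minute grid are
    kept; windows that span a gap in the timestamps are discarded.
    """
    out = []
    for end in range(window_size, len(samples) + 1):
        window = samples[end - window_size:end]
        span = window[-1][0] - window[0][0]
        if span != STEP * (window_size - 1):
            continue  # window crosses a gap, skip it
        out.append((window[-1][0], pvariance([v for _, v in window])))
    return out


# Hypothetical data: 4 samples, a one-day hole, then 4 more samples.
t0 = datetime(2021, 1, 1)
t1 = t0 + timedelta(days=1)
samples = [(t0 + STEP * i, float(v)) for i, v in enumerate([1, 2, 3, 4])]
samples += [(t1 + STEP * i, float(v)) for i, v in enumerate([10, 20, 30, 40])]

features = window_variances(samples, window_size=3)
print(len(features))  # 4 (two windows on each side of the hole)
```

The first valid window after the hole only appears once a full window of fresh samples is available, which is the price of keeping the windows clean.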

Isn’t the simplest solution just to pretend that the “month of no data” doesn’t exist? You don’t have any data, so what is the point of even considering it? And from what you say it sounds like the machinery was not operating during that missing period, so it was not accumulating wear and tear that you are missing in your data.

Thank you Somesh! I haven’t tried an ensemble model, but I’ll definitely try.
About using some kind of sliding window, I’m not familiar with that method but I’ll look into it and try it myself.

Thank you for hearing me out and giving me your great advice!

You are right! Since the models I’ve been using until now are time-independent, it wouldn’t cause a big issue to just delete that month of no data.
However, if I decided to go with a time-dependent model such as Cox Regression, I believe I would have to do something like Somesh suggested. Do you agree?
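To show what I have in mind: survival models such as Cox regression are typically fitted on a duration plus an event flag per observation, so the month of downtime matters for how the durations are measured. Below is a minimal sketch, not a fitted model, that builds (duration, event) rows on an "operating time" clock that skips the shutdown. It assumes the machine accumulates no wear while shut down and that the clock restarts after each failure; all dates and names are hypothetical.

```python
from datetime import datetime, timedelta


def operating_duration(start, end, outage_start, outage_end):
    """Elapsed time from start to end, excluding any overlap with the outage."""
    overlap_start = max(start, outage_start)
    overlap_end = min(end, outage_end)
    overlap = max(overlap_end - overlap_start, timedelta(0))
    return (end - start) - overlap


# Hypothetical shutdown period, failure times and observation bounds.
outage = (datetime(2021, 3, 1), datetime(2021, 4, 1))
failures = [datetime(2021, 1, 10), datetime(2021, 2, 20), datetime(2021, 4, 15)]
observation_start = datetime(2021, 1, 1)
observation_end = datetime(2021, 5, 1)

rows = []  # (operating days until event, event flag: 1 = failure, 0 = censored)
prev = observation_start
for t in failures:
    rows.append((operating_duration(prev, t, *outage).days, 1))
    prev = t
# The last interval ends without a failure, so it is right-censored.
rows.append((operating_duration(prev, observation_end, *outage).days, 0))

print(rows)  # [(9, 1), (41, 1), (23, 1), (16, 0)]
```

The third interval spans the shutdown, so its 54 calendar days shrink to 23 operating days.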

Anyway, thank you very much for your time and advice!

Hi, Daniel.

Sorry, but I have no knowledge of Cox Regression, so I am not able to add anything useful on this question.