Help on real-life project

Daniel_Cho · October 15, 2021, 10:03pm

Hi!

This might not be the right category for my question, but I couldn’t find a better place. Hopefully someone can help me out.

So, I’m working on a project for a big Oil & Gas company where the goal is to predict failures in a critical machine used for oil extraction. I have ~15 features, and I managed to label (0/1) the historical failures of this machine (time-series data covering about 2 years, collected every 10 minutes).

Now, one of the problems I’m facing is a 1-month period of Null values: all ~15 features are missing for one continuous month (I later discovered this was due to social unrest that forced the company to shut everything down). I applied Linear Regression to fill in other, randomly distributed Null values, but it doesn’t seem right to do the same for this continuous 1-month gap. So, I was wondering if anyone could shed some light on how to approach these Null values.
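
To make the difference between the scattered gaps and the one long outage concrete, here is a minimal sketch in plain Python that finds runs of consecutive missing readings and separates the short ones from the long ones. The names `missing_runs` and `max_fill`, the toy series, and the threshold are all assumptions for illustration, not taken from the actual project.

```python
def missing_runs(values):
    """Return (start_index, length) for each run of consecutive None values."""
    runs = []
    start = None
    for i, v in enumerate(values):
        if v is None:
            if start is None:
                start = i  # a new run of missing values begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:  # the series ends inside a missing run
        runs.append((start, len(values) - start))
    return runs


# Toy series for one feature (hypothetical values).
series = [1.0, None, 3.0, None, None, None, None, 8.0]
max_fill = 2  # assumed threshold: longest gap we would still impute

runs = missing_runs(series)
short_gaps = [r for r in runs if r[1] <= max_fill]  # candidates for imputation
long_gaps = [r for r in runs if r[1] > max_fill]    # treat separately
```

At one reading every 10 minutes, a 30-day outage is roughly 4,320 consecutive missing readings, so any reasonable threshold would separate it from the scattered gaps.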

A statistician friend of mine recommended that I split my original dataset into 2 datasets: 1 for the period before the Null values appear, and 1 for the period after they end. But then I end up with 2 different models, each of which might have learnt different types of failures (failures can happen for n different reasons).

Hope someone finds this interesting enough to give me a hand, but anyway, thank you for reading!

Daniel Cho

P.S. I also heard that Cox Regression would be a good approach for predictive maintenance projects. Does anyone know where I can find courses like this one that cover Cox Regression?

SomeshChatterjee · October 16, 2021, 4:01pm

Hi Daniel,

Just curious: following the advice of your statistician friend, you have 2 different models. I am assuming you then merged the output from the two, so that if either model predicts a failure, you report a probable failure. Or, alternatively, you could train another ML classifier/regressor that takes the outputs of the 2 models as input and gives a final prediction (an ensemble).
What kind of errors were you seeing with this?
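
The first option (report a failure if either model does) could be sketched as below. This is only an illustration: `combine_or` and the two prediction lists are made-up names and values, standing in for the 0/1 outputs of the "before" and "after" models on the same inputs.

```python
def combine_or(preds_before, preds_after):
    """Flag a failure (1) whenever either model predicts one."""
    if len(preds_before) != len(preds_after):
        raise ValueError("both models must score the same samples")
    return [int(a == 1 or b == 1) for a, b in zip(preds_before, preds_after)]


# Hypothetical 0/1 predictions from the two models on the same four samples.
preds_before = [0, 1, 0, 0]
preds_after = [0, 0, 1, 0]

combined = combine_or(preds_before, preds_after)
```

The OR rule favours recall over precision, which may or may not suit the cost of a missed failure; the stacked alternative would instead learn how much to trust each model.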

Also, are you using some kind of sliding window to calculate time-dependent features? To give a simple example: checking that the variance within a month doesn’t go beyond a specific range. If yes, you could use the entire dataset together; you’d just have to discard the windows that overlap the gap (assuming we can make this simplification and keep the windows independent).
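
A minimal sketch of that idea in plain Python, assuming missing readings are stored as `None`: compute a per-window variance and skip any window that touches a missing value. The function name `window_variances`, the window size, and the toy readings are assumptions for illustration.

```python
from statistics import pvariance


def window_variances(values, window):
    """Return (start_index, variance) for every window with no missing values."""
    features = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        if any(v is None for v in chunk):
            continue  # the window overlaps a gap, so discard it
        features.append((start, pvariance(chunk)))
    return features


# Toy readings with a gap in the middle (hypothetical values).
readings = [1.0, 2.0, 3.0, None, None, 4.0, 6.0, 8.0]
features = window_variances(readings, window=3)
kept_starts = [start for start, _ in features]
```

Only the windows that lie entirely before or entirely after the gap survive, so the data on both sides can feed a single model.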

paulinpaloalto · October 16, 2021, 5:18pm

Isn’t the simplest solution just to pretend that the “month of no data” doesn’t exist? You don’t have any data, so what is the point of even considering it? And from what you say it sounds like the machinery was not operating during that missing period, so it was not accumulating wear and tear that you are missing in your data.
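
In code, that could be as simple as the sketch below, which drops every row in which all features are missing. The row layout (`t`, `features`, `label`) and the name `drop_empty_rows` are assumptions for illustration, not the actual dataset schema.

```python
def drop_empty_rows(rows):
    """Keep only rows where at least one feature value is present."""
    return [row for row in rows if any(v is not None for v in row["features"])]


# Hypothetical rows: two of them fall inside the outage.
rows = [
    {"t": 0, "features": [0.5, 1.2], "label": 0},
    {"t": 1, "features": [None, None], "label": 0},
    {"t": 2, "features": [None, None], "label": 0},
    {"t": 3, "features": [0.7, None], "label": 1},
]

kept = drop_empty_rows(rows)
kept_times = [row["t"] for row in kept]
```

Rows with only some features missing are kept, so they can still be imputed separately.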

Daniel_Cho · October 22, 2021, 9:41pm

Thank you, Somesh! I haven’t tried an ensemble model, but I’ll definitely try it.
As for the sliding window, I’m not familiar with that method, but I’ll look into it and try it myself.

Thank you for reading and for your great advice!

Daniel_Cho · October 22, 2021, 9:46pm

You are right! Since the models I’ve been using until now are time-independent, it wouldn’t cause a big issue to just delete that month of no data.
However, if I decide to go with a time-dependent model such as Cox Regression, I believe I would have to do something like what Somesh suggested. Do you agree?

Anyway, thank you very much for your time and advice!

paulinpaloalto · October 22, 2021, 11:47pm

Hi, Daniel.

Sorry, but I have no knowledge of Cox Regression, so I am not able to add anything useful on this question.

Regards,
Paul
