Best use of ML to verify airplane positions

Hi, I am a relatively new member to the community and not sure if I am posting this in the correct section.

I have just finished the Machine Learning Specialization and wanted to get some input from the community on what would possibly be the best ML method for the following scenario:

You are given a data feed of airplane trajectories. Each trajectory is a series of individual points, and each point contains the altitude, heading, position, velocity, and timestamp of a recorded position in flight (the time at which each point is recorded is random). Together, the points in a trajectory show where the plane is headed. The data feed can be unreliable and sometimes gives bad data partway through a route, such as reporting a velocity of 0 mid-flight, during takeoff, or during landing. This can happen many times in one trajectory, and there are other kinds of anomalies as well.

What would be the best way to flag and remove these anomalies and then either a) try to reconstruct an appropriate trajectory using the data that makes sense, or b) remove the anomalous points and see if the remaining points still give a sensible trajectory?

My first thought was to use anomaly detection and check what the speed should be at any given stage of the flight: increasing at takeoff, roughly constant at cruising altitude, and decreasing at landing. The same concept could be applied to heading changes by asking, does a heading change of this size in this amount of time make sense? The same could go for altitude and velocity changes.
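To make the "does this change make sense in this amount of time" idea concrete, here is a minimal sketch rather than a definitive method. The `Point` fields, the units, and all three limits are assumptions made up for illustration; real limits would have to come from analysing good trajectories. It also assumes the first point of a trajectory is good, which is a known weakness.

```python
from dataclasses import dataclass

@dataclass
class Point:
    t: float         # timestamp in seconds (assumed unit)
    speed: float     # ground speed in knots (assumed unit)
    heading: float   # degrees, 0-360
    altitude: float  # feet (assumed unit)

# Hypothetical limits, for illustration only; tune them on real data.
MAX_ACCEL = 5.0         # knots per second
MAX_TURN_RATE = 6.0     # degrees per second
MAX_CLIMB_RATE = 100.0  # feet per second

def heading_diff(a, b):
    """Smallest absolute angle between two headings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def flag_implausible(points):
    """Return indices of points that change too fast relative to the last accepted point."""
    flagged = []
    if not points:
        return flagged
    last = points[0]  # assumption: the first point is trustworthy
    for i in range(1, len(points)):
        p = points[i]
        dt = p.t - last.t
        if dt <= 0:
            # Duplicate or out-of-order timestamp: treat as suspicious.
            flagged.append(i)
            continue
        too_fast = (
            abs(p.speed - last.speed) / dt > MAX_ACCEL
            or heading_diff(p.heading, last.heading) / dt > MAX_TURN_RATE
            or abs(p.altitude - last.altitude) / dt > MAX_CLIMB_RATE
        )
        if too_fast:
            flagged.append(i)
        else:
            last = p  # only accepted points become the new reference
    return flagged

track = [
    Point(0, 450, 90, 35000),
    Point(10, 452, 91, 35000),
    Point(20, 0, 91, 35000),    # velocity reported as 0 mid-flight
    Point(30, 451, 92, 35010),
]
print(flag_implausible(track))
```

Comparing each point against the last accepted point, not the previous raw point, keeps one bad reading from also condemning the good reading that follows it.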

There would have to be an additional step after anomaly detection, because once these points are found, do I remove them and see if the trajectory makes sense again, or should I try to reconstruct the trajectory? Perhaps a supervised model trained on previous good trajectories that also cover edge cases would work. I could even edit the numbers myself to create more examples.
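As one simple baseline for the "reconstruct" option, flagged values can be filled in by linear interpolation in time between the nearest good neighbours. This is only a sketch under stated assumptions: the function name and its inputs are hypothetical, timestamps of good points are assumed to be strictly increasing, and it suits quantities like speed or altitude. Heading would need wrap-around handling at 360 degrees, and position would need proper geographic interpolation, neither of which is covered here.

```python
def fill_by_interpolation(times, values, flagged):
    """Replace flagged values by interpolating between the nearest good points.

    times   -- timestamps, one per point (good points strictly increasing)
    values  -- one measured quantity per point, e.g. speed
    flagged -- indices of points judged to be bad
    """
    flagged = set(flagged)
    good = [i for i in range(len(times)) if i not in flagged]
    result = list(values)
    for i in sorted(flagged):
        before = [g for g in good if g < i]
        after = [g for g in good if g > i]
        if before and after:
            a, b = before[-1], after[0]
            # Weight by elapsed time, since the points are irregularly spaced.
            w = (times[i] - times[a]) / (times[b] - times[a])
            result[i] = values[a] + w * (values[b] - values[a])
        elif before:
            result[i] = values[before[-1]]  # no later good point: hold last value
        elif after:
            result[i] = values[after[0]]    # no earlier good point: use next value
    return result

times = [0, 10, 20, 30]
speeds = [450, 452, 0, 456]
print(fill_by_interpolation(times, speeds, [2]))
```

If the gap between good points is long, a straight line may be a poor guess, so it may be worth keeping the flag alongside the filled value so downstream code knows which numbers were reconstructed.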

Any ideas or is this even the correct thought process?

Hello @Jacob_Cuomo, my two cents:

  1. We care about trajectories and positions, so I would first decide on my target precision, then pick a representative batch of samples and analyze it to find out which part makes that target hardest to reach. For example, if a bad position reading always says something that is complete nonsense, then it can be very easy to detect.

  2. Once I know the hardest part, I should also know quantitatively, at least for the analyzed batch of samples, the size of the error of interest in those readings, and that is perhaps the part of the error I would want to model.

  3. While still analyzing that batch of samples, I would also try to figure out how useful each feature is. Perhaps velocity turns out to be less useful most of the time while heading is more important. Perhaps I should engineer some features such as time-to-landing, local time, or the change of measured (and smoothed) quantities.

  4. To imagine what my model could do after I have analyzed and preprocessed the data: one reasonable product could be a (part of the) world map that gives the probability of each heading at different positions (by month/season). Another could be a (part of the) world map that gives the probability of having an airplane at different positions (by month/season). Or perhaps they can be combined into one map. Another possible model would be, as you said, an anomaly detection model like the one in MLS course 3, which gives the joint probability of some of the (engineered) features.

  5. The precision requirement of my model can be set based on the project goal and the analysis findings, but it is constrained by the quality and quantity of data I have. For example, what if a record is only predictive 10 seconds into the future, but the gap between two records is 20 minutes?
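The anomaly detection option in point 4 can be sketched as a density estimate over engineered features. This is a minimal sketch under assumptions, not a definitive implementation: it treats the features as independent Gaussians, so the joint probability is a product of per-feature densities. The two features (rate of speed change and climb rate), the sample numbers, and the threshold `EPSILON` are all made up for illustration. In practice the threshold would be chosen on a labelled evaluation set, and every feature must have non-zero variance in the training data.

```python
import math

def fit_gaussian(rows):
    """Estimate the mean and variance of each feature from known-good samples."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(d)]
    return means, variances

def density(x, means, variances):
    """Joint density of x, assuming independent Gaussian features."""
    p = 1.0
    for xj, mu, var in zip(x, means, variances):
        p *= math.exp(-((xj - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return p

def is_anomaly(x, means, variances, epsilon):
    """Flag x when its density falls below the threshold epsilon."""
    return density(x, means, variances) < epsilon

# Made-up engineered features per point:
# [speed change in knots/s, climb rate in feet/s]
good_rows = [
    [0.1, 1.0],
    [-0.2, -2.0],
    [0.0, 0.5],
    [0.3, 0.0],
    [-0.1, 1.5],
]
EPSILON = 1e-3  # hypothetical threshold; pick it on a labelled evaluation set

means, variances = fit_gaussian(good_rows)
print(is_anomaly([0.0, 0.0], means, variances, EPSILON))    # a typical cruise reading
print(is_anomaly([-45.0, 0.0], means, variances, EPSILON))  # speed dropping to 0 within seconds
```

Features that are clearly correlated or far from bell-shaped would break the independence and Gaussian assumptions, so they may need transforming or a full covariance model instead.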

From your post, I feel that you are approaching the problem top-down, with some physics and flying practice in mind to govern what could or couldn’t be possible, so I am trying to contribute from another direction, and I hope this gives you some different ideas. Another reason for my approach is that altitude, heading, position, velocity, and timestamp alone don’t say much about the surrounding atmosphere. Experts may be able to reason about the atmosphere from those readings plus their experience, but that reasoning may be hard to quantify, or hard to quantify at the scale of the data you have. However, if you are able to quantify that expertise, it can be a useful piece of information, though you will need to think about how to use it.

The analyzed samples can/should be part of your evaluation set.

Cheers,
Raymond


Totally agree. Chances are that a group of airplanes will be found to deviate from their expected trajectories on the same day because they were trying to avoid a weather system.