Mu, var values from X_train or X_val?

When you go through ‘C3_W1_Anomaly_Detection’, you notice that the mean and variance are calculated solely from X_train. You then apply those values to X_val to calculate the best epsilon value. Should the mean and variance not be calculated solely from X_val?
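For reference, the workflow described above can be sketched roughly as follows. This is a minimal sketch, not the lab's actual code: it assumes independent per-feature Gaussians, picks epsilon by F1 score on the labelled validation set, and uses synthetic data. The function names `estimate_gaussian`, `gaussian_density` and `select_threshold` are hypothetical names chosen for illustration.

```python
import numpy as np

def estimate_gaussian(X):
    # Per-feature mean and variance, estimated from the (unlabelled) training set.
    return X.mean(axis=0), X.var(axis=0)

def gaussian_density(X, mu, var):
    # Product of independent univariate Gaussian densities, one per feature.
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    expo = np.exp(-((X - mu) ** 2) / (2.0 * var))
    return np.prod(coef * expo, axis=1)

def select_threshold(y_val, p_val, n_steps=1000):
    # Scan candidate epsilons and keep the one with the best F1 on X_val.
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_val.min(), p_val.max(), n_steps):
        pred = p_val < eps                      # True means "flag as anomaly"
        tp = np.sum(pred & (y_val == 1))
        fp = np.sum(pred & (y_val == 0))
        fn = np.sum(~pred & (y_val == 1))
        if tp == 0:
            continue                            # F1 undefined or zero, skip
        prec, rec = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * prec * rec / (prec + rec)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Synthetic data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(1000, 2))
X_val = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                   rng.normal(6.0, 0.5, size=(10, 2))])   # last 10 rows are anomalies
y_val = np.concatenate([np.zeros(200, dtype=int), np.ones(10, dtype=int)])

mu, var = estimate_gaussian(X_train)            # fitted on X_train only
p_val = gaussian_density(X_val, mu, var)        # X_val is only scored
best_eps, best_f1 = select_threshold(y_val, p_val)

# Apply the chosen epsilon back to the training set.
flagged_train = gaussian_density(X_train, mu, var) < best_eps
```

Note that X_val never contributes to `mu` or `var` here; its labels are used only to choose epsilon.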

You are blind to how many anomalies are in X_train. What if there are many? Wouldn’t many anomalies bias the mean and variance estimates? The covariance and inverse covariance would then describe a distorted (more bulbous) shape. Applying those values to X_val would give an incorrect epsilon value. When you then apply that incorrect epsilon value, chosen on X_val, to X_train, you create possible false negatives without knowing it.

Since you know the true anomaly labels of X_val, would you not use that set to calculate the mean and variance, and therefore epsilon? You could then apply epsilon to the X_train set and be more confident in the outcomes.

There is obviously a reason why you do not take the mean and variance solely from X_val. I just do not know why.

No. The validation set isn’t used for training.

The training set is assumed to be large enough that it contains a suitable number of anomalies.

Thanks for the reply.

Can you see my point? What if the X_train set contains a lot of anomalies?

Then the mean and variance will be skewed towards the anomalies.
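To make the skew concrete, here is a small illustrative sketch on synthetic one-dimensional data. The contamination rate (200 anomalies added to 1000 normal points) and the two distributions are assumptions chosen to exaggerate the effect, not figures from the lab.

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(1000, 1))     # "clean" examples
anomalies = rng.normal(8.0, 1.0, size=(200, 1))   # heavy contamination (assumed)

def fit(X):
    # Per-feature mean and variance.
    return X.mean(axis=0), X.var(axis=0)

def density(x, mu, var):
    # Univariate Gaussian density.
    return np.exp(-((x - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

mu_clean, var_clean = fit(normal)
mu_dirty, var_dirty = fit(np.vstack([normal, anomalies]))

# Density assigned to a typical anomaly (x = 8) under each fit.
p_clean = density(8.0, mu_clean, var_clean)[0]
p_dirty = density(8.0, mu_dirty, var_dirty)[0]

print("mean:", mu_clean[0], "->", mu_dirty[0])
print("variance:", var_clean[0], "->", var_dirty[0])
print("p(x=8):", p_clean, "->", p_dirty)
```

With this much contamination the fitted mean moves toward the anomalies and the variance inflates, so the anomalous point receives a much higher density and becomes harder to separate with any epsilon.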

Hello @Captain_Riggs,

Then this may either be a problem or not a problem.

This may not be a problem if your new data can still be identified correctly with your current model.

Or, this is a data shift problem: the data we collected for training no longer represents the same distribution as the data we need to make predictions for.

In this case, the simplest step would be to train a new model with a representative set of data.

You may consider your current X_val as part of your new X_train. In other words,

X_{train}^{new} = X_{val}^{current} + more

Then you need a new set of validation data X_{val}^{new} to validate your model. After all, we can’t reuse X_{train}^{new} to validate a model built with X_{train}^{new} itself.
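As a rough sketch of that re-split, assuming the extra data and the fresh validation set have already been collected (the arrays `X_more` and `X_val_new` below are hypothetical placeholders filled with synthetic values):

```python
import numpy as np

rng = np.random.default_rng(0)
X_val_current = rng.normal(0.0, 1.0, size=(200, 2))   # the old validation set
X_more = rng.normal(0.0, 1.0, size=(800, 2))          # hypothetical newly collected data
X_val_new = rng.normal(0.0, 1.0, size=(200, 2))       # hypothetical fresh hold-out set

# The new training set is the old validation set plus the additional data.
X_train_new = np.vstack([X_val_current, X_more])

# Refit the per-feature Gaussian parameters on the new training set only.
mu_new = X_train_new.mean(axis=0)
var_new = X_train_new.var(axis=0)

# X_val_new is kept out of the fit; it is used only to choose epsilon afterwards.
```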

Cheers,
Raymond

Hmm. I see what you are saying.

Thanks

Should have posted here.

Hi Raymond.

I see what you are saying and that makes sense.

Cheers