Questions on lab 2 anomaly detection

Hi all, a couple of questions regarding this topic:

  1. How do you compare y_val to a boolean mask from p_val? Right now I simply use a for loop like this:
    preds=np.zeros(p_val.shape[0])
    for i in range(p_val.shape[0]):
        if p_val[i]<epsilon:
            preds[i]=1
    Yeah, it’s very slow, I know. I could have created a mask like this:
    preds=[p_val<epsilon]
    but then I don’t know how to compare y_val to preds, because code like this:
    tp=np.sum((y_val==1) & (preds==True))
    doesn’t work; it gave a ValueError.
    Another thing I tried is to convert y_val to bool as well:
    y_val2=[y_val==1]
    but again, I can’t do tp=np.sum(y_val2==preds), where preds is the bool [p_val<epsilon].

Could someone please enlighten me on how to do this correctly, without using a for loop? I already got the for loop to work…

  2. Are the mean, variance and epsilon all determined per feature? Or are the mean and variance per-feature quantities, while epsilon is determined from the product of the probabilities of all features, each computed from the normal distribution defined by that feature’s mean and variance? The lab uses just 1 feature, so I’m confused…

Hi @john_lonewolf,

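Good try, but I stopped reading your (1) at preds=[p_val<epsilon], because I think the square brackets are not needed. Wrapping the comparison in [...] puts the boolean array inside a Python list, so preds is no longer a NumPy array.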

Each feature has one mean and one variance. The joint probability is the product of the probabilities of each of the features. The epsilon is used to compare with the joint probability.
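
To make this concrete, here is a minimal sketch, assuming independent features with a univariate normal density each. The function names `estimate_gaussian` and `joint_density`, the toy data and the value of epsilon are made up for illustration and may not match the lab's own helpers.

```python
import numpy as np

def estimate_gaussian(X):
    """Return one mean and one variance per feature; X has shape (m, n)."""
    mu = X.mean(axis=0)
    var = X.var(axis=0)  # population variance (divides by m)
    return mu, var

def joint_density(X, mu, var):
    """Multiply the per-feature normal densities along the feature axis."""
    per_feature = np.exp(-((X - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    return per_feature.prod(axis=1)  # one joint value per example

# Toy data (made up): 5 examples, 2 features; the last row is far from the rest
X = np.array([[1.0, 10.0],
              [1.2,  9.5],
              [0.8, 10.5],
              [1.1, 10.2],
              [5.0, 20.0]])

mu, var = estimate_gaussian(X)   # shapes (2,) and (2,): per feature
p = joint_density(X, mu, var)    # shape (5,): per example
epsilon = 0.01                   # a single scalar threshold, chosen arbitrarily here
is_anomaly = p < epsilon         # compared against the joint value, not per feature
```

So mu and var have one entry per feature, while epsilon stays a single number no matter how many features there are.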

Cheers,
Raymond

Let’s go through them one by one.

This is a good question. A boolean mask is quite useful. As you know, it is an array of “True” and “False” values, often used to extract the elements where the mask is “True”.
In your case, you created a boolean mask from “p_val” and “epsilon”. That’s good. The next step is to apply the mask to a target vector/array, like “target[mask]”. Your attempt to get “preds” is a nice try, but it is missing the target name.

Let’s start with the above.

I tried this:
preds=p_val.copy()
preds[preds < epsilon]=1
preds[preds >= epsilon]=0
In principle, after running the lines above, preds should be converted to an array of 1s and 0s depending on the probabilities, which makes calculating tp, fp and fn easy using this:
tp=np.sum((y_val==1) & (preds==1))

But the strange thing is that after preds[preds>=epsilon]=0 I basically get an array of 0s. Every iteration of the loop results in an array of 0s, no matter what epsilon is. I made a copy of p_val so that on the next iteration preds is initialized to the original values of p_val and setting the 1s and 0s works properly. What am I missing here?
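
A likely explanation, sketched with made-up values (assuming epsilon is below 1): the second assignment recomputes its mask on the already modified preds, so the 1s written by the first assignment also satisfy >= epsilon and get overwritten.

```python
import numpy as np

p_val = np.array([0.001, 0.2, 0.0005, 0.3])  # made-up values for illustration
epsilon = 0.01

preds = p_val.copy()
preds[preds < epsilon] = 1     # preds is now [1, 0.2, 1, 0.3]
preds[preds >= epsilon] = 0    # the new 1s are >= epsilon too, so they become 0

# Computing the mask once, from the untouched p_val, avoids the overwrite
mask = p_val < epsilon
preds_ok = mask.astype(int)    # 1 where p_val < epsilon, 0 elsewhere
```

Because the mask is built from the original p_val, the order of the assignments no longer matters.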

finally solved it…
preds=p_val.copy()
preds[preds<epsilon]=1
preds[preds<1]=0

This will convert preds into an array of 1s and 0s similar to y_val, and thus tp, fp and fn can be calculated easily.
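
For comparison, a sketch that skips the copy and works on the boolean mask directly. It assumes y_val and p_val are 1-D NumPy arrays of the same length; `confusion_counts` is a hypothetical helper name and the data is made up. Note that the preds[preds<1]=0 trick relies on every value in p_val being below 1, which a probability density does not guarantee.

```python
import numpy as np

def confusion_counts(y_val, p_val, epsilon):
    """Count true positives, false positives and false negatives."""
    preds = p_val < epsilon    # boolean array, no square brackets
    actual = (y_val == 1)      # boolean array of true anomalies
    tp = int(np.sum(preds & actual))
    fp = int(np.sum(preds & ~actual))
    fn = int(np.sum(~preds & actual))
    return tp, fp, fn

# Made-up validation data; 1.5 shows that a density value can exceed 1
y_val = np.array([1, 0, 0, 1, 0])
p_val = np.array([0.001, 0.2, 0.004, 0.3, 1.5])

tp, fp, fn = confusion_counts(y_val, p_val, 0.01)
```

The & and ~ operators work element-wise on boolean arrays, so no loop and no conversion to 1s and 0s is needed.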

Great! Thank you for letting us know.

In numerical code like this, explicit for loops over array elements should be avoided where possible for performance reasons: vectorized operations are faster and easier to parallelize. Nice try!