I am confused about how we are finding probabilities for all the different features. Correct me if I am wrong, but from my understanding the probability of any particular point in a continuous distribution is 0, and we have to find probabilities over ranges instead. Can someone please explain this, and also what the formula provided in the course (p(x)=1/2pi(e^(xu)sq/2*variance)) is doing?
Thanks in advance
We’re not using the probability, we’re using the distribution.
The probability is the outcome when you integrate mathematically over the probability density function (PDF). As you pointed out, taking only a single point of this PDF does not give you a probability: a single point can at best be read as an infinitesimally small interval, and the probability of landing in it is zero (as you pointed out, too!).
But if you integrate over a defined interval of the PDF you can derive a tangible probability, which you can interpret, e.g. in the example of an ROC analysis or when evaluating false positives etc., see also:

https://upload.wikimedia.org/wikipedia/commons/4/4f/ROC_curves.svg

Anomaly Detection: How to improve?  #2 by Christian_Simonis
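To make the difference between a density and a probability concrete, here is a minimal sketch using only Python's standard library. The distributions and the helper `interval_probability` are made up for illustration and are not taken from the course material. The integral over an interval is computed as a difference of CDF values.

```python
from statistics import NormalDist

# Hypothetical feature distribution: mean 0, standard deviation 1.
dist = NormalDist(mu=0.0, sigma=1.0)

def interval_probability(d, a, b):
    """P(a <= X <= b): the integral of the PDF over [a, b], via the CDF."""
    return d.cdf(b) - d.cdf(a)

# The PDF evaluated at a point is a density, not a probability.
density_at_zero = dist.pdf(0.0)

# Integrating over an interval gives an interpretable probability.
p_within_one_sigma = interval_probability(dist, -1.0, 1.0)

# Shrinking the interval towards a single point drives the probability to 0.
p_tiny_interval = interval_probability(dist, -1e-6, 1e-6)

# A narrow distribution shows that a density can even exceed 1,
# which a probability never can.
narrow_density = NormalDist(mu=0.0, sigma=0.1).pdf(0.0)

print(density_at_zero, p_within_one_sigma, p_tiny_interval, narrow_density)
```

The last value is the useful sanity check: a number larger than 1 cannot be a probability, so the PDF value at a point has to be read as a density.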
Please let me know if this helps, @Numair!
Best regards
Christian
I am not completely sure about the formula you wrote above. But I assume that you mean this one:
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
It specifies the PDF of the popular (bell-shaped) normal distribution, also called the Gaussian distribution, using the standard deviation σ and the mean μ.
(source)
see also this thread.
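As a rough sketch, the normal PDF can be written out by hand and checked against Python's built-in `statistics.NormalDist`. The function name `gaussian_pdf` and the values of the mean and standard deviation below are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)   # normalising constant
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)    # squared distance from the mean, scaled
    return coeff * math.exp(exponent)

# Hypothetical feature statistics, e.g. estimated from training data.
mu, sigma = 5.0, 2.0

points = (1.0, 5.0, 9.0)
manual = [gaussian_pdf(x, mu, sigma) for x in points]
reference = [NormalDist(mu, sigma).pdf(x) for x in points]

for x, m, r in zip(points, manual, reference):
    print(x, m, r)
```

The density is highest at the mean and falls off symmetrically, so the values at 1.0 and 9.0 (both two standard deviations away) coincide.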
Please let me know if anything is open from your end, @Numair.
Best
Christian
Hello @Numair,
Just to add to the existing answers: it is true that at any single point, that formula (see the one shared in @Christian_Simonis's reply for the complete form) does not give you a probability value, but a probability density value.
Consider the case where we actually care about the probability value, which is when we determine whether a feature value is beyond the anomaly threshold. If we are serious about finding the probability, we integrate the probability density from the threshold to $\infty$. However, in the tail of the distribution, both the probability density at a point $x$ and the probability from $x$ to $\infty$ strictly decrease as $x$ moves farther out, so there is a strictly monotonic relation between the two: the smaller the density, the smaller the tail probability. It is therefore sufficient to use the probability density to judge whether the feature is in the anomaly range.
It is of course very good and intuitive to use the probability value for the judgement, but it is okay to use the probability density value too, because of that strictly monotonic relationship between them.
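A small sketch of that equivalence, assuming a single normally distributed feature and looking only at points above the mean (the one-sided case). The sample points, the threshold location and the helper `upper_tail` are hypothetical.

```python
from statistics import NormalDist

# Hypothetical feature distribution: mean 0, standard deviation 1.
dist = NormalDist(mu=0.0, sigma=1.0)

def upper_tail(d, x):
    """P(X > x): the integral of the density from x to infinity."""
    return 1.0 - d.cdf(x)

# Sample points above the mean (assumption: one-sided anomalies).
xs = [0.5, 1.0, 2.0, 3.0, 4.0]
densities = [dist.pdf(x) for x in xs]
tails = [upper_tail(dist, x) for x in xs]

# Ranking the points by density gives the same order as ranking by tail probability.
density_order = sorted(range(len(xs)), key=lambda i: densities[i])
tail_order = sorted(range(len(xs)), key=lambda i: tails[i])
same_ranking = density_order == tail_order

# A threshold placed at x = 2 can be expressed either as a density
# threshold (epsilon) or as a tail-probability threshold.
x_threshold = 2.0
epsilon = dist.pdf(x_threshold)
p_threshold = upper_tail(dist, x_threshold)

flags_by_density = [d < epsilon for d in densities]
flags_by_tail = [t < p_threshold for t in tails]

print(same_ranking, flags_by_density, flags_by_tail)
```

Both rules flag exactly the same points, which is why comparing the density against a threshold is enough in practice, without computing any integral.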
Cheers,
Raymond