As I go through the discussion here again, I think everybody agrees that
- probability density is different from probability.
- the formula p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} gives the probability density, not a probability
- we get a probability by integrating p(x) over a region of x, whether that region is small or large
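A minimal sketch of the points above, using an assumed standard normal (\mu = 0, \sigma = 1) for illustration: the density at a point is not a probability (it can even exceed 1 for small \sigma), while integrating the density over a range gives one.

```python
import math

# Hypothetical parameters chosen for illustration.
mu, sigma = 0.0, 1.0

def pdf(x):
    """Gaussian probability density p(x) -- a density, not a probability."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def cdf(x):
    """P(X < x), expressed via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# A probability comes from integrating the density over a region:
prob = cdf(1.0) - cdf(-1.0)   # P(-1 < X < 1)
print(pdf(0.0))               # ~0.3989 (would exceed 1 if sigma were, say, 0.1)
print(prob)                   # ~0.6827
```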
Some of us may feel that this specialization is not meant to teach probability, but that should not stop us from applying it.
I think that probability density p(x) is a good PROXY for Probability:
P(x') := P\left( x' - \frac{\delta}{2} < X < x' + \frac{\delta}{2} \right) \approx p(x')\delta \propto p(x') given a very small fixed window width \delta,
so that the higher the probability density at x', the higher the probability of falling within a window of that fixed size around x'.
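The proxy claim above can be checked numerically: for a small fixed \delta, the exact probability of the window, cdf(x' + \delta/2) - cdf(x' - \delta/2), matches p(x')\delta very closely (a standard normal is assumed here for illustration).

```python
import math

mu, sigma = 0.0, 1.0   # hypothetical parameters for illustration
delta = 1e-3           # small fixed window width

def pdf(x):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def cdf(x):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

for x in (0.0, 1.0, 2.5):
    exact = cdf(x + delta / 2) - cdf(x - delta / 2)  # true window probability
    approx = pdf(x) * delta                           # density-times-width proxy
    print(x, exact, approx)  # the two agree to roughly O(delta^3)
```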
However, some of us may want to pose the check differently, as a tail probability:
P(\text{event of observing a value as extreme as }x') = P( -\infty < X < x') when x' is below the mean, or
P(\text{event of observing a value as extreme as }x') = P( x' < X < \infty) when x' is above the mean.
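The two tail probabilities above can be sketched as follows, again assuming a standard normal for illustration; by symmetry, points equally far on either side of the mean get the same tail probability.

```python
import math

mu, sigma = 0.0, 1.0  # hypothetical parameters for illustration

def cdf(x):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def tail_prob(x_prime):
    """P(observing a value at least as extreme as x'): the lower tail
    when x' is below the mean, the upper tail when it is above."""
    if x_prime < mu:
        return cdf(x_prime)        # P(-inf < X < x')
    return 1.0 - cdf(x_prime)      # P(x' < X < inf)

print(tail_prob(-2.0))  # ~0.0228
print(tail_prob(2.0))   # ~0.0228, same by symmetry
```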
I believe we will agree that both can work the same way in practice, given an \epsilon value adjusted to whichever approach is adopted. Practically speaking, approach 1 would be preferred because it is computationally cheaper: it requires no integration. This is why I support using the probability density function directly, as done in the lecture and in the assignment, for anomaly detection.
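Approach 1 then reduces to a single density evaluation per point. A minimal sketch, with hypothetical parameters and threshold (in the assignment, \mu and \sigma would be fitted on the training set and \epsilon tuned, e.g. on a validation set):

```python
import math

# Hypothetical fitted parameters and threshold, for illustration only.
mu, sigma = 0.0, 1.0
epsilon = 0.01

def pdf(x):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def is_anomaly(x):
    # Approach 1: compare the density p(x) directly against epsilon;
    # no integration required.
    return pdf(x) < epsilon

print(is_anomaly(0.5))  # False: near the mean, high density
print(is_anomaly(3.5))  # True: deep in the tail, density ~9e-4
```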
Approach 2 above is a fairly frequentist one. If we instead wanted to adopt a Bayesian approach and construct a likelihood for the anomalous class, I guess we would need some labelled anomalous samples, which we assume we do not have in the training set for this course, since we are discussing this in the context of unsupervised learning.