Hello, I have few simple questions on anomaly detection algorithm.

First of all, the p(x) that is describe in the lecture, I believe it is not exactly “probability,” right? It is the probability density which might exceed 1 if the variance is extremely small. Or am I getting something wrong?

Secondly, what should do if I have a categorical variable that looks really good in detecting anomaly? I can’t fit a gaussian distribution on a discrete random variable, right?

I think this moment of the lecture has presented to us the critical idea that each independent feature contributes one probability density. Although they are all later assumed to be Gaussians, we can have a Multinormial if there is a categorical feature or a Binomial for boolean feature. Then, the rest of the algorithm should work the same way.