Hello again, Prof. Serrano!
In exercise 6 of the W1 Programming Assignment, we used pdf_gaussian to calculate P(x_k | C_i).
But the PDF gives a probability density, not an actual probability. In the lectures we learned that, for a continuous distribution, the probability of any specific value is zero. We should’ve calculated the probabilities of intervals instead.
This is confirmed if we sum the “probs” of all x_k for C=0, for example, and find they DON’T sum up to 1.
Could you confirm whether it is actually a mistake, or a simplification that works anyway (but whose justification wasn’t explained)?
Or maybe I didn’t understand the topic…
P.S. If it is a simplification that works, maybe you could include a paragraph in the instructions to clarify it.
@magdalena.bouza can help with this
Hi @Joao_Carlos_Lima_Sel. This is an excellent question!
The confusion comes from an abuse of notation when writing $\mathbf P(x|C_i)$. When x is a continuous random variable, it is true that the probability of any particular value is zero, but for the purpose of the classification problem you still want to measure how likely the sample you saw is under each of the classes $C_i$. Using the probability of an interval isn’t really correct, because you saw exactly that sample. The solution is to replace the probability with the probability density function.
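One way to see why the density can stand in for a probability here, as a rough sketch and not the assignment's actual code: the probability of a small interval of width eps around x is approximately eps times the density at x, and that factor eps is the same for every class, so it cancels when the class posteriors are normalized. The snippet below checks this numerically. Everything in it is an assumption made for illustration: the two Gaussian classes, their parameters, the priors, the sample x, and the helper names (`pdf_gaussian` here is a stand-in re-implementation, not the course's function).

```python
import math

def pdf_gaussian(x, mu, sigma):
    """Gaussian density at x (stand-in for the assignment's helper)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def cdf_gaussian(x, mu, sigma):
    """Gaussian cumulative distribution function, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def interval_prob(x, mu, sigma, eps):
    """Actual probability that X falls in [x - eps/2, x + eps/2]."""
    return cdf_gaussian(x + eps / 2, mu, sigma) - cdf_gaussian(x - eps / 2, mu, sigma)

def posterior(likelihoods, priors):
    """Bayes' rule: normalize likelihood * prior over the classes."""
    joint = [lik * p for lik, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]

# Made-up example: two classes with equal priors, one observed sample.
priors = [0.5, 0.5]
params = [(0.0, 1.0), (2.0, 1.0)]  # (mu, sigma) for C_0 and C_1
x = 0.5

# Posterior computed from densities, as in the assignment.
post_density = posterior([pdf_gaussian(x, mu, s) for mu, s in params], priors)

# Posterior computed from true interval probabilities of shrinking width.
post_interval = {}
for eps in (1.0, 0.1, 0.001):
    liks = [interval_prob(x, mu, s, eps) for mu, s in params]
    post_interval[eps] = posterior(liks, priors)
    print(f"eps={eps}: {post_interval[eps]}")

print(f"density: {post_density}")

# A density is not a probability: it can exceed 1 for a narrow Gaussian.
narrow_density = pdf_gaussian(0.0, 0.0, 0.1)
print(f"density of N(0, 0.1^2) at 0: {narrow_density}")
```

As eps shrinks, the interval-based posteriors approach the density-based one, which is why the density values do not need to sum to 1 for the classification to be valid.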
I think you will gain more clarity after reviewing the W2L2 content. Please let me know if this is clear enough or you need any more clarification, which I will happily provide!
I still feel that something is missing for this to make total sense. It looks like a convenience to make things work, but one lacking mathematical formality, maybe…
But I’ll try to read from other sources to get insights.