I have a lot of doubts regarding week 1’s assignment.
What is the intuition behind X|C_i mentioned in the line, “…we use \mathbf P(x \mid C_i) generically to indicate the distribution of X|C_i…”? Which distribution is being referred to exactly?
I literally did not understand anything about this paragraph:
“…The probabilities \mathbf P(C_i) are called the class prior probabilities, and they denote how likely a random sample from X (without knowing any of its attributes) is to belong to each class. This value is usually not known and can be estimated from the training set by computing the proportion of each class in the training set. If the training set is too small, it is common to assume that each class is equally likely, i.e., \mathbf P(C_1) = \mathbf P(C_2) = \ldots = \mathbf P(C_m), thus only maximizing \mathbf P(x \mid C_i) remains…”
Nor did I understand this:
The probabilities \mathbf P(x_k \mid C_i) can be estimated from the training data. The computation of \mathbf P(x_k \mid C_i) depends on whether x_k is categorical or not.
If x_k is categorical, then \mathbf P(x_k \mid C_i) is the number of samples in C_i that have attribute x_k divided by the number of samples in class C_i. WHY?
If x_k is continuous-valued or discrete-valued, we need to make an assumption about its distribution and estimate its parameters using the training data. For instance, if x_k is continuous-valued, we can assume that \mathbf P(x_k \mid C_i) follows a Gaussian distribution with parameters \mu_{C_i} and \sigma_{C_i}. Therefore, we need to estimate \mu and \sigma from the training data, and then \mathbf P(x_k \mid C_i) = \text{PDF}_{\text{gaussian}}(x_k, \mu_{C_i}, \sigma_{C_i}). DIDN’T GET IT!!
I would really be grateful if someone could kindly help me build an intuition. The lecture videos were a smooth and easy ride; in the assignment, however, I'm completely lost.
All of them. The sentence is just explaining the notation \mathbf P(x \mid C_i): for each class C_i there is a (possibly different) distribution of X restricted to that class, and the same symbol is used generically for whichever of those distributions is being discussed.
It’s all background information that isn’t very important here. It seems the programming assignment was developed by a different person than the video lectures, and they decided to duplicate a lot of boilerplate technical terminology that wasn’t in the lecture and isn’t pertinent to this assignment.
It’s a definition: the class prior \mathbf P(C_i) is simply the proportion of training examples that belong to class C_i.
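For instance, a minimal sketch (the label names here are made up, not taken from the assignment):

```python
import numpy as np

# Hypothetical training labels: one class label per training example.
y_train = np.array(["terrier", "poodle", "terrier", "terrier", "poodle"])

# Class prior P(C_i) = (number of examples in class C_i) / (total number of examples).
classes, counts = np.unique(y_train, return_counts=True)
priors = {str(c): float(n) / len(y_train) for c, n in zip(classes, counts)}

print(priors)  # {'poodle': 0.4, 'terrier': 0.6}
```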
Basically, it says that a Gaussian distribution is completely characterized by its mean and standard deviation, and that these can be different for each class. So you estimate a separate \mu_{C_i} and \sigma_{C_i} from the training examples of class C_i, and evaluating \mathbf P(x_k \mid C_i) then just means plugging x_k into the Gaussian PDF with that class's parameters.
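A quick sketch of that idea (again, the feature and variable names are invented, not the assignment's):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical continuous feature (say, a dog's height in cm) with its class labels.
heights = np.array([30.0, 32.5, 31.0, 55.0, 58.0])
labels = np.array(["small", "small", "small", "large", "large"])

# Estimate a separate mean and standard deviation for each class...
params = {}
for c in np.unique(labels):
    x = heights[labels == c]
    params[c] = (x.mean(), x.std())

# ...then P(x_k | C_i) is the Gaussian PDF evaluated with that class's parameters.
x_k = 33.0
for c, (mu, sigma) in params.items():
    print(c, norm.pdf(x_k, loc=mu, scale=sigma))
```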
Thanks a lot for spending some time going through my post and clearing up my doubts. Would you mind taking a bit more time to explain the summary of section 2 of the assignment? What I mean is that I was able to solve each part of the assignment without much pain, but I still didn't get the big picture, so to speak (what each individual segment of the assignment is doing).
Here’s an overview of a portion of the assignment (section 2) and I just want to know the reason behind each of these steps:
Generating the dataset
PDF for distributions (PMF for Binomial)
Estimated parameters
How does each of these steps help develop a Naive Bayes classifier?
I hope I’ve explained my issue clearly. SORRY TO TAKE UP YOUR TIME!
I agree, this assignment is a bit murky. I’m not entirely sure why the Naive Bayes classifier was thrown into this course. It’s rather heavy going for an introductory course.
Each of the features in this dog classifier has a different type of distribution, so you need different functions to estimate and evaluate their probabilities. The generated dataset plays the role of training data; the "estimated parameters" step fits each feature's distribution per class (e.g. \mu_{C_i} and \sigma_{C_i} for a Gaussian feature); and the PDF/PMF functions are what you call at prediction time to turn a feature value into \mathbf P(x_k \mid C_i). The classifier then multiplies those per-feature likelihoods together with the prior \mathbf P(C_i) and picks the class with the largest product.
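Here's a toy end-to-end sketch of that last step (the feature names, numbers, and parameter values are all invented, not the assignment's; scipy's norm and binom stand in for the PDF/PMF functions):

```python
from scipy.stats import norm, binom

# Invented per-class parameters, as if already estimated from the generated dataset:
# a Gaussian feature (height) and a Binomial feature (spots out of 10 coat patches).
params = {
    "small": {"prior": 0.6, "height_mu_sigma": (31.0, 2.0), "spots_p": 0.2},
    "large": {"prior": 0.4, "height_mu_sigma": (56.0, 3.0), "spots_p": 0.7},
}

def predict(height, spots, n_patches=10):
    """Score each class as P(C_i) * P(height | C_i) * P(spots | C_i) and take the max."""
    scores = {}
    for c, p in params.items():
        mu, sigma = p["height_mu_sigma"]
        likelihood = norm.pdf(height, loc=mu, scale=sigma)       # continuous feature -> Gaussian PDF
        likelihood *= binom.pmf(spots, n_patches, p["spots_p"])  # count feature -> Binomial PMF
        scores[c] = p["prior"] * likelihood
    return max(scores, key=scores.get)

print(predict(height=33.0, spots=2))  # -> small
```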