I’m working on exercise 5 and have gotten myself a little stuck. In the function definition for compute_breed_proportions(df), the code has a comment with a placeholder to calculate the following:
# Compute the probability of each class (breed)
# You can get the number of rows in a dataframe by using len(dataframe)
prob_class = None/None
I believe the division that is supposed to be performed is:
prob_class = the number of samples in Ci that have xk / total number of samples in Ci
In the dataframe we’re working with, what is xk? Is it the height for that particular breed?
Would the total number of samples in Ci be equal to the number of samples where Ci has that feature (like height)?
I keep thinking I’m just going to end up with
prob_class = len(df_breed) / len(df_breed) == 1
for that and my spidey sense is tingling, lol
I think I’ve got a point of clarification here that could help if someone else gets hung up on this. Exercise 5 is only looking for you to figure out P(Ci). Don’t worry about xk yet, as that is handled in Exercise 6.
It’s more about how many samples there are in each class (0, 1, 2) compared to the total number of samples across all classes.
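In case it helps to see the idea in code, here is a minimal sketch on toy data, not the assignment's actual solution. The column names "breed" and "height", the toy values, and the helper name class_priors are all assumptions made up for illustration. The point is only that the denominator is the length of the whole dataframe, not of the per-breed subset.

```python
import pandas as pd

# Toy data for illustration only; the column names are assumptions.
df = pd.DataFrame({
    "breed": [0, 0, 1, 2, 2, 2],
    "height": [30.1, 28.4, 45.0, 60.2, 58.9, 61.5],
})


def class_priors(df, class_col="breed"):
    """Return {class: P(class)}: rows in that class divided by all rows."""
    priors = {}
    for c in df[class_col].unique():
        df_class = df[df[class_col] == c]    # rows belonging to class c
        priors[c] = len(df_class) / len(df)  # denominator is the whole dataframe
    return priors


priors = class_priors(df)
for breed, p in priors.items():
    print(f"P(breed={breed}) = {p:.3f}")
```

Dividing len(df_breed) by len(df_breed) would always give 1, which is why the total has to come from the full dataframe. The resulting values should sum to 1 across all classes.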
Another way to put it is that we’re computing one part of the overall Bayes formula, namely the class probability P(Ci). The other parts of the formula are handled further on in the assignment.
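For reference, here is how the pieces fit together, assuming the assignment follows the usual naive Bayes setup (the thread does not spell this out, so treat it as a sketch). The symbols N_i (number of samples in class C_i), N (total number of samples), and n (number of features) are introduced here only for illustration.

```latex
% Class prior: the quantity discussed in this thread
P(C_i) = \frac{N_i}{N}

% Where it enters Bayes' rule, under the naive independence assumption
P(C_i \mid x_1, \dots, x_n)
  = \frac{P(C_i) \prod_{k=1}^{n} P(x_k \mid C_i)}{P(x_1, \dots, x_n)}
```

The conditional terms P(x_k | C_i) are where "the number of samples in Ci that have xk" would come in, which is the part deferred to Exercise 6.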
Hi @tenaciousjzh!
I believe the division that is supposed to be performed is:
prob_class = the number of samples in Ci that have xk / total number of samples in Ci
This is not correct. What we want here is the value P(C_i), i.e., the probability of a random sample being in the class C_i. We are not yet computing any conditional probability!