Hi guys, I recommended a discussion above, a topic created by DebbieA, that you should read carefully. There I explained most of ex. 5 in detail.
I’ll add a complement here about data frames and the first dict part, which I noticed is the core of your problems.
-
Let’s understand the “df_breed = None” part. You want to pick a slice from the data frame: the rows that match the current breed (remember you’re looping over breeds [0,1,2]), but only the columns with the features. So you can put a condition on the data frame to select a specific breed_i this way:
df_breed = df[df["breed"] == breed_i][features]
P.S. The quoted "breed" is the name of the column in the data frame. I used “breed_i” for the loop variable, but you must match the iterator of the FOR LOOP (the loop over the breeds) you coded in the previous step, which can be x, n, breed, etc.
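To see the slicing on its own, here is a minimal sketch with a made-up data frame. The column names ("breed", "height", "weight"), the values, and the features list are all assumptions for illustration, not the real data from the assignment.

```python
import pandas as pd

# Toy data, invented for illustration only
df = pd.DataFrame({
    "breed":  [0, 0, 1, 2, 2, 2],
    "height": [30.0, 32.5, 45.0, 60.0, 58.0, 61.5],
    "weight": [8.0, 9.5, 20.0, 35.0, 33.0, 36.5],
})
features = ["height", "weight"]  # assumed feature columns

breed_i = 2
mask = df["breed"] == breed_i    # boolean Series, True on the rows of this breed
df_breed = df[mask][features]    # keep the matching rows, then only the feature columns

print(df_breed)
```

Here df_breed ends up with the three rows of breed 2 and only the two feature columns. The same slice can also be written in one step as df.loc[df["breed"] == breed_i, features].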
-
Now you’ll calculate the proportion of each breed_i. The slice you picked above, “df_breed”, has only the rows for one breed, while the data frame “df” has the rows for all breeds. So just divide the number of rows of “df_breed” by the number of rows of “df”. In Python you can do this count for each breed:
probs_dict[breed_i] = len(df_breed) / len(df)
To make it more elegant, you can round the result as suggested by the instructions. Just use “round(len(df_breed) / len(df), 3)”. The 3 is the number of decimal places.
Check the rest of the hints I posted on the recommended discussion:
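Putting the pieces together, the loop could look roughly like the sketch below. The data frame is made up, and the names "breed", "height" and breed_i are assumptions for illustration; adapt them to the variables in your own notebook.

```python
import pandas as pd

# Toy data, invented for illustration only: 2 rows of breed 0, 1 of breed 1, 3 of breed 2
df = pd.DataFrame({
    "breed":  [0, 0, 1, 2, 2, 2],
    "height": [30.0, 32.5, 45.0, 60.0, 58.0, 61.5],
})
features = ["height"]  # assumed feature column

probs_dict = {}
for breed_i in [0, 1, 2]:
    # rows of the current breed, feature columns only
    df_breed = df[df["breed"] == breed_i][features]
    # share of the whole data frame that belongs to this breed, 3 decimal places
    probs_dict[breed_i] = round(len(df_breed) / len(df), 3)

print(probs_dict)  # {0: 0.333, 1: 0.167, 2: 0.5}
```

The proportions should add up to about 1 (rounding can leave a tiny difference), which is a quick sanity check for your own result.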