C3_W1 Exercise 5

Ok I give up. Spent 3 hours trying to get the dictionary set up with estimated params. I know the solution will be very simple and I’m very close, but I can’t get it to work and I need assistance.

Thanks.


How are you stuck? I might be able to help.

Tell me what you are stuck on so I can give you some pointers.

Have a look at this discussion (link below). According to your description, I think you have the same doubts we discussed there, specifically the part of the dictionary.

Hi, thanks for the response.

Here’s what I have for the params_dict (this is just for the first entry, 0, in the dictionary):

    'bark_days': params_binomial(n, p),
    'ear_head_ratio': params_uniform(a, b),
    'height': params_gaussian(m, s),
    'weight': params_gaussian(m, s)},

This outputs the same values as the breed_params object, not the values returned by estimate_binomial_params (or the gaussian version, etc.).

The trouble seems to be that I can’t locate the variable that has the actual data for each of the features - bark_days, etc - to add to the params of the params_binomial entries in the dictionary. I’ve tried breeds, df_train, etc…nothing outputs the correct values

I don’t want to be pedantic, but I also spent hours on Exercise 5 and had to learn many basic but critical things to get my program to work.

I discovered in hindsight that a lot of my problems were rooted in processing the data with the correct data structures.

Exercise 5 requires you to create and operate on a 2 layer dictionary with the inner dictionary’s content being the dataclass from Section 2.1 and the dictionary breeds_params.

So I’d like to ask you some questions that will save you A LOT of time, because I had to learn these things the hard way. There is a good chance your bugs are actually the result of a series of incompatible data structures.

Before I jump into details, I want to do a basic Python knowledge check:

  1. Do you know how to generate and operate on (read, write data, iterate) a dataframe?
  2. Do you know how to generate and operate on (read, write data, iterate) a 2 layer dictionary?
  3. Do you know how to generate and operate on (read, write data, iterate) a 2 layer dictionary whose 2nd layer holds a dataclass as its content?
  4. Are you comfortable with creating 2 layer loops?

If you have challenges in any of the 4 questions, we need to fix those bugs in the code first before moving on to the code’s logic. Otherwise, it will be a total waste of your efforts.

And if you are okay with all 4 questions, tell me how you are looping through the df_train, and Features data? (BTW, Features is of type ‘list’).

The logic you will need to create for Exercise 5 is very flexible, there are many ways to implement it (it’s entirely up to you), but to share with you what I did:

I created 2 loops (outer and inner loop):

    for breed in df_train['breed'].unique():
        # Create a new DataFrame, breed_data, containing only the rows of
        # df_train where the 'breed' column matches the current value of
        # the 'breed' loop variable. Boolean indexing filters the DataFrame
        # and keeps only the rows that satisfy the condition.
        breed_data = df_train[df_train['breed'] == breed]

        params_dict = {}

        # .columns is a DataFrame attribute that returns an Index object
        # containing the column names, so breed_data.columns holds all the
        # column names of the breed_data DataFrame.
        for feature in breed_data.columns:
            if feature in features:
                data_array = breed_data[feature].values

The outer loop allows me to loop through all the data found in df_train for a specific breed (1 breed at a time). The ‘breed’ value here is one of 0, 1, 2.

This is followed by the inner loop which allows me to iterate over the feature-specific data given the current breed that is being iterated upon.
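A minimal, self-contained sketch of the two loops described above, run on a tiny made-up DataFrame. The numbers and the `summary` dictionary are assumptions for illustration only; they are not the assignment's data or its expected solution.

```python
import pandas as pd

# Toy stand-in for df_train; the values are invented for illustration.
df_train = pd.DataFrame({
    "height": [30.0, 32.0, 50.0, 52.0],
    "weight": [10.0, 12.0, 25.0, 27.0],
    "breed": [0, 0, 1, 1],
})
features = ["height", "weight"]

summary = {}  # hypothetical two-layer result: breed -> feature -> mean
for breed in df_train["breed"].unique():
    # Outer loop: boolean indexing keeps only the rows of the current breed.
    breed_data = df_train[df_train["breed"] == breed]
    summary[int(breed)] = {}
    for feature in breed_data.columns:
        # Inner loop: skip columns (such as 'breed') that are not features.
        if feature in features:
            data_array = breed_data[feature].values  # NumPy array for this feature
            summary[int(breed)][feature] = float(data_array.mean())

print(summary)
# {0: {'height': 31.0, 'weight': 11.0}, 1: {'height': 51.0, 'weight': 26.0}}
```

Here the per-feature mean stands in for whatever estimate you need; in the assignment you would presumably pass `data_array` to the matching estimation function instead.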

Excellent, thank you!

  1. No, I’ve never used dataframes before, but I thought that’s what we generated in 2.1.

  2. No, I’ve never generated a 2 layer dictionary before, but now I know how and believe that syntactically, my params dict is correct even though the data is still the static data from original estimate_params from 2.3.

  3. No, I haven’t operated a dictionary whose 2nd layer is a dataclass, but I do understand that layer 1 is the index of breeds (0, 1, 2) and the other layer are the dataclasses we created in 2.1.

  4. I’ve never created a 2 layer loop.

I’ll dig in a bit more with dataframes (I’ve never used them before). If I go piece by piece, I’m sure I’ll get to the end.

My Python skills are quite limited. I’ve done a lot of coding with JS and Node.js, and I successfully managed to get through the Linear Algebra and Calculus courses, learning NumPy as I went, but Exercise 5 in this assignment is a whole other level of complexity. I do understand the basic goals are simple: create 2 dictionaries, loop through the dataframe, etc., and I am sure that when I get to the solution it will be simple and understandable, as coding solutions usually are, but this Exercise has really pushed me to my limit. It seems that many people are struggling with this exercise, so DeepLearning.ai may want to modify it or give a few more hints, or at least links to documentation, to help us along.


I am in the same situation.

We’ll get there! I’ll keep posting as I solve each bit.

Hi guys, I recommended a discussion above, a topic created by DebbieA, that you should read carefully. There I detailed most of Ex. 5.

I’ll try to add a complement about data frames and the first dict part, which I noticed is the core of your problems.

  1. Let’s understand the “df_breed = None” part. You want to pick a slice from the data frame: the rows that match the current breed (remember you’re looping over breeds [0,1,2]), but only the columns with the features. So you can filter the data frame for a specific breed_i this way:

    df_breed = df[df['breed'] == breed_i][features]

    P.S. I used “breed_i”, but you must match the iterator of the FOR LOOP (the loop over the breeds) you coded in the previous step, which can be x, n, breed, etc.

  2. Now you’ll calculate the proportion of each breed_i. The slice you picked above, “df_breed”, has the rows corresponding to only one breed. The data frame “df” has the rows corresponding to all breeds. So, just divide the number of rows of “df_breed” by the number of rows of “df”. In Python you can do this count, for each breed:

    probs_dict[breed_i] = len(df_breed) / len(df)

    To make it more elegant, you can round as suggested by the instructions. Just use “round(len(df_breed) / len(df), 3)”. The number 3 is the number of decimal digits.
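The two steps above can be sketched on a toy data frame. The data is invented, and the label column is assumed to be called `'breed'`; `df`, `features`, and `probs_dict` follow the names used in the hints but are defined here only for illustration.

```python
import pandas as pd

# Toy data frame; the values are invented for illustration.
df = pd.DataFrame({
    "height": [30.0, 32.0, 50.0, 52.0, 70.0],
    "bark_days": [3, 4, 10, 12, 20],
    "breed": [0, 0, 1, 1, 2],
})
features = ["height", "bark_days"]

probs_dict = {}
for breed_i in [0, 1, 2]:
    # Step 1: rows of the current breed, feature columns only.
    df_breed = df[df["breed"] == breed_i][features]
    # Step 2: share of rows belonging to this breed, rounded to 3 decimals.
    probs_dict[breed_i] = round(len(df_breed) / len(df), 3)

print(probs_dict)  # {0: 0.4, 1: 0.4, 2: 0.2}
```

Note that `df['breed']` uses the column name as a string, while `breed_i` is the loop variable holding the current breed value.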

Check the rest of the hints I posted on the recommended discussion:


Hi thank you…I read that thread before I posted and it didn’t make sense to me at that point. It makes a lot more sense now, thank you.

Circling back on our discussion.

In case the group hasn’t figured this out, here is an example of how to create a 2 layer dictionary with the 2nd layer dictionary’s content being a data class (you will need to implement some version of this in your code). I have included a generic example to illustrate how it is implemented, and how you can operate on it (read, write, iterate).

You will need to make sure you have already correctly created a similar Python object (a 2 layer dictionary with a data class as content) before you will be able to complete Exercises 5 and 6.

A note on Step 1 in the code below: the assignment already defines the data class for you, but you will still need to write values to each of its fields.

InnerData is the data class representing the content of the inner dictionary. It has four attributes: height, weight, bark_days, and ear_to_head_ratio. Three instances of the data class, inner_data1, inner_data2, and inner_data3, are created with different attribute values.

The inner dictionary inner_dict is created, where the keys are strings (‘height’, ‘weight’, ‘bark_days’, ‘ear_to_head_ratio’) and the values are the instances of the data class.

Finally, the outer dictionary outer_dict is created, where the keys are integers (0, 1, 2), and the values are the inner dictionaries. Note that in this example, the same inner dictionary inner_dict is used for all keys in the outer dictionary. If you want separate inner dictionaries for each key, you would need to create separate instances for each inner dictionary.

from dataclasses import dataclass

# Step 1: Define your data class
@dataclass
class InnerData:
    height: float
    weight: float
    bark_days: int
    ear_to_head_ratio: float

# Step 3: Create instances of the data class
inner_data1 = InnerData(10.5, 20.2, 5, 0.8)
inner_data2 = InnerData(12.2, 18.9, 7, 0.9)
inner_data3 = InnerData(9.8, 21.7, 3, 0.7)

# Step 4: Create the inner dictionary
inner_dict = {
    'height': inner_data1,
    'weight': inner_data2,
    'bark_days': inner_data3,
    'ear_to_head_ratio': inner_data1
}

# Step 5: Create the outer dictionary
outer_dict = {
    0: inner_dict,
    1: inner_dict,
    2: inner_dict
}
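The remark about reusing the same inner_dict for every outer key matters in practice: all outer keys then point at the same objects, so a write through one key shows up under the others. A small sketch of the difference, using `copy.deepcopy` as one possible way (an assumption, not the assignment's method) to get independent inner dictionaries:

```python
from copy import deepcopy
from dataclasses import dataclass

@dataclass
class InnerData:
    height: float
    weight: float
    bark_days: int
    ear_to_head_ratio: float

inner_dict = {"height": InnerData(10.5, 20.2, 5, 0.8)}

# Shared: both outer keys refer to the very same inner dictionary.
shared = {0: inner_dict, 1: inner_dict}
shared[0]["height"].weight = 99.0
print(shared[1]["height"].weight)  # 99.0, the change leaked to key 1

# Separate: each outer key gets its own deep copy.
inner_dict["height"].weight = 20.2  # reset the example value
separate = {key: deepcopy(inner_dict) for key in (0, 1)}
separate[0]["height"].weight = 99.0
print(separate[1]["height"].weight)  # 20.2, key 1 is unaffected
```

Building a new inner dictionary inside the loop over breeds avoids the problem without any copying.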

How to use loops to update the data class objects in a 2 layer dictionary? (This uses combined_dict, which is written out in full further below.)

for key_outer, inner_dict in combined_dict.items():
    for key_inner, data in inner_dict.items():
        # Update the attributes of the data class
        data.height = 15.0
        data.weight = 25.0
        data.bark_days = 8
        data.ear_to_head_ratio = 0.5

In this code, the outer loop iterates over the key-value pairs in the combined_dict, and the inner loop iterates over the key-value pairs in each inner dictionary.

Within the nested loops, you can access the data class object using the variable data and update its attributes according to your requirements. In this example, the attributes height, weight, bark_days, and ear_to_head_ratio of the data class are assigned new values.
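The loop above updates objects that already exist. To build a two-layer dictionary from scratch, the same nested-loop pattern can create a fresh inner dictionary for each outer key. In this sketch the `FeatureStats` data class, the `raw` input, and the `nested` result are all hypothetical names invented for illustration; they are not part of the assignment.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class FeatureStats:  # hypothetical data class, for illustration only
    average: float
    count: int

# Invented input: outer key -> feature name -> list of observations.
raw = {
    0: {"height": [30.0, 32.0], "weight": [10.0, 12.0]},
    1: {"height": [50.0, 52.0], "weight": [25.0, 27.0]},
}

nested = {}
for breed, feature_values in raw.items():
    nested[breed] = {}  # a new inner dictionary for every outer key
    for feature, values in feature_values.items():
        # Store one data class instance per (breed, feature) pair.
        nested[breed][feature] = FeatureStats(average=mean(values), count=len(values))

print(nested[1]["weight"])  # FeatureStats(average=26.0, count=2)
```

Because the inner dictionary is created inside the outer loop, no two outer keys share the same objects.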

Here is what a 2 layer dictionary with a dataclass as content looks like combined together:

combined_dict = {

0: {
    'height': InnerData(height=10.5, weight=20.2, bark_days=5, ear_to_head_ratio=0.8),
    'weight': InnerData(height=12.2, weight=18.9, bark_days=7, ear_to_head_ratio=0.9),
    'bark_days': InnerData(height=9.8, weight=21.7, bark_days=3, ear_to_head_ratio=0.7),
    'ear_to_head_ratio': InnerData(height=10.5, weight=20.2, bark_days=5, ear_to_head_ratio=0.8)
},

1: {
    'height': InnerData(height=10.5, weight=20.2, bark_days=5, ear_to_head_ratio=0.8),
    'weight': InnerData(height=12.2, weight=18.9, bark_days=7, ear_to_head_ratio=0.9),
    'bark_days': InnerData(height=9.8, weight=21.7, bark_days=3, ear_to_head_ratio=0.7),
    'ear_to_head_ratio': InnerData(height=10.5, weight=20.2, bark_days=5, ear_to_head_ratio=0.8)
},

2: {
    'height': InnerData(height=10.5, weight=20.2, bark_days=5, ear_to_head_ratio=0.8),
    'weight': InnerData(height=12.2, weight=18.9, bark_days=7, ear_to_head_ratio=0.9),
    'bark_days': InnerData(height=9.8, weight=21.7, bark_days=3, ear_to_head_ratio=0.7),
    'ear_to_head_ratio': InnerData(height=10.5, weight=20.2, bark_days=5, ear_to_head_ratio=0.8)
} }

How to access values in the combined data structure, combined_dict ?

Example :

weight_value = combined_dict[0]['height'].weight

This code retrieves the inner dictionary associated with key 0 using combined_dict[0], and then accesses the ‘height’ key using ['height']. Finally, it retrieves the weight attribute value using .weight.

In the example provided earlier, the weight value for key 0, height is 20.2.

How to write to combined_dict ?

Example:

combined_dict[2]['bark_days'].bark_days = 10

This code accesses the inner dictionary associated with key 2 using combined_dict[2], then accesses the ‘bark_days’ key using ['bark_days'].

Finally, it updates the bark_days attribute value to 10 by assigning the new value to combined_dict[2]['bark_days'].bark_days.

After executing this code, the bark_days attribute of the data class stored under ‘bark_days’ in the inner dictionary with key 2 will be 10 in combined_dict.

How to iterate over every item within combined_dict?

Example Code:

for key_outer, inner_dict in combined_dict.items():
    print(f"Outer Key: {key_outer}")
    for key_inner, value in inner_dict.items():
        print(f"Inner Key: {key_inner}")
        print(f"Value: {value}")
        print("---")

In this code, the outer loop iterates over the key-value pairs in the combined_dict using the items() method. The loop assigns each outer key to the variable key_outer and the corresponding inner dictionary to inner_dict.

Inside the outer loop, the inner loop iterates over the key-value pairs in the inner dictionary using the items() method. The loop assigns each inner key to the variable key_inner and the corresponding value to value.

You can replace the print statements with any desired operations you want to perform on each element of the combined_dict. This example demonstrates how to access the keys and values, but you can modify the code according to your specific needs.
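As one example of swapping the print statements for another operation, the sketch below collects a single attribute from every data class into a flat list of rows. The shortened `combined_dict` and the `rows` list are assumptions made to keep the example small.

```python
from dataclasses import dataclass

@dataclass
class InnerData:
    height: float
    weight: float
    bark_days: int
    ear_to_head_ratio: float

# Shortened version of combined_dict, with two outer keys and two inner keys.
combined_dict = {
    0: {"height": InnerData(10.5, 20.2, 5, 0.8), "weight": InnerData(12.2, 18.9, 7, 0.9)},
    1: {"height": InnerData(9.8, 21.7, 3, 0.7), "weight": InnerData(10.5, 20.2, 5, 0.8)},
}

rows = []
for key_outer, inner_dict in combined_dict.items():
    for key_inner, value in inner_dict.items():
        # value is an InnerData instance; read one attribute from it.
        rows.append((key_outer, key_inner, value.weight))

print(rows)
# [(0, 'height', 20.2), (0, 'weight', 18.9), (1, 'height', 21.7), (1, 'weight', 20.2)]
```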


Wow, thank you Kevin. I moved on to Week 2 over the weekend to keep the momentum going but will return to this assignment in a couple weeks and will definitely implement some of these ideas.

Happy to help. It was no biggie. Week 2 assignment was much easier.

I tried this, but when I run this code it doesn’t recognize feature. I have programmed with Python before but never used pandas.
