C2W2_Assignment Error

Receiving the following error when executing Exercise 7:

TypeError: num_buckets must be an int, got <class 'dict'>.

I see that the code shows FEATURE_BUCKET_COUNT = {'rain_1h': 3} in 2.5 - Transform.

Even when I change the code to FEATURE_BUCKET_COUNT = 3, the error is still thrown. Please advise how this can be fixed.


Hi Richard! I advise that you revert your changes and keep FEATURE_BUCKET_COUNT as a dictionary. That will make the code more flexible. Also, Python imports a module only once per session, so you will still get that error unless you restart the notebook kernel or use a reloading method like importlib.reload().
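To make the import-caching point concrete, here is a minimal sketch (not the assignment code). The module name demo_constants and the temp-file setup are made up for illustration; in the notebook, the module would be whichever file holds your constants.

```python
import importlib
import pathlib
import sys
import tempfile

# Avoid cached bytecode so the demo only depends on the source file.
sys.dont_write_bytecode = True

# Write a throwaway module (hypothetical name) into a temp directory.
tmp_dir = tempfile.mkdtemp()
module_path = pathlib.Path(tmp_dir) / "demo_constants.py"
module_path.write_text("FEATURE_BUCKET_COUNT = 3\n")
sys.path.insert(0, tmp_dir)

import demo_constants
first_value = demo_constants.FEATURE_BUCKET_COUNT

# Edit the file on disk, as you would by re-running a cell that rewrites it.
module_path.write_text("FEATURE_BUCKET_COUNT = {'rain_1h': 3}\n")

# A second import is a no-op: Python reuses the module cached in sys.modules.
import demo_constants
stale_value = demo_constants.FEATURE_BUCKET_COUNT

# reload() re-executes the module source, so the edit becomes visible.
importlib.reload(demo_constants)
fresh_value = demo_constants.FEATURE_BUCKET_COUNT

print(first_value, stale_value, fresh_value)
```

The middle value is still the old one, which is why editing the constant alone did not seem to change anything until the module was reloaded.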

What you need to revise is the code in preprocessing_fn(). Remember that you are looping over the keys of _BUCKET_FEATURE_KEYS. The current key is assigned to a local variable named key. You can use this same key to grab the corresponding value in the FEATURE_BUCKET_COUNT dictionary. This pattern will be more reusable in your own projects in case you have several features which you want to have different bucket counts.

Hope this helps!

# Bucketize the feature
for key in _BUCKET_FEATURE_KEYS:
    outputs[_transformed_name(key)] = tft.bucketize(_fill_in_missing(inputs[key]), _FEATURE_BUCKET_COUNT)

Here is the code that I have for Bucketize. Is there something I’m doing wrong?

Hi @chris.favila ,

About _BUCKET_FEATURE_KEYS and FEATURE_BUCKET_COUNT: technically I can understand how they work by reading the official documentation, but I can’t quite understand the ideas behind them when applying them in this case.

I can’t understand what kind of features should go in _BUCKET_FEATURE_KEYS, or how to decide the exact FEATURE_BUCKET_COUNT.

Could you please explain the ideas behind them?
Thanks a lot~

Hi Richard! As you mentioned earlier, this line will throw an error because _FEATURE_BUCKET_COUNT is a dictionary. What you want is to extract the int value associated with the current key. To give you a hint, the inputs variable you pass in as the first argument is also a dictionary, but you are not getting any error from it. You are able to extract the Tensor associated with the current key successfully. Also note that you can drop the _fill_in_missing() here if it throws an error. I think inputs[key] is already a dense tensor, so there is no need to call that function.
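The same idea in plain Python, as a rough sketch rather than the notebook solution. fake_bucketize and the "_xf" suffix are hypothetical stand-ins; the stand-in only mimics the int check from the error message and does no real bucketizing.

```python
# Hypothetical stand-ins for the notebook constants.
_BUCKET_FEATURE_KEYS = ["rain_1h"]
_FEATURE_BUCKET_COUNT = {"rain_1h": 3}


def fake_bucketize(values, num_buckets):
    """Stand-in that only reproduces the type check behind the error."""
    if not isinstance(num_buckets, int):
        raise TypeError(f"num_buckets must be an int, got {type(num_buckets)}.")
    return f"{len(values)} values -> {num_buckets} buckets"


inputs = {"rain_1h": [0.0, 0.2, 0.5, 12.0]}
outputs = {}

for key in _BUCKET_FEATURE_KEYS:
    # Index both dictionaries with the same key: one lookup gives the
    # values, the other gives the int bucket count for that feature.
    outputs[key + "_xf"] = fake_bucketize(inputs[key], _FEATURE_BUCKET_COUNT[key])

# Passing the whole dictionary reproduces the reported error.
try:
    fake_bucketize(inputs["rain_1h"], _FEATURE_BUCKET_COUNT)
    error_message = None
except TypeError as err:
    error_message = str(err)

print(outputs)
print(error_message)
```

Because the count is looked up per key, adding a second bucketized feature only needs a new entry in each constant, not a change to the loop.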

Hi! These are ultimately up to the ML engineer. In this case, the engineer decided that he/she doesn’t want the model to accept the rain rate values as is, and wants to divide them into buckets. The model will learn how the different ranges of values affect the final output. The number of buckets is also up to you, and you can base it on your domain expertise. In this simple case, there are only 3 buckets, which is like saying: light, medium, and heavy rain should affect the traffic volume differently. In other words, it’s hypothesizing that slight changes in the rain rate (e.g. 0.2mm and 0.5mm rain) shouldn’t affect the traffic volume a lot, so it’s safe to put nearby values in the same bucket. You can add more buckets if you (or a domain expert) think it will lead to better predictions. Just don’t add too many, since that defeats the purpose of bucketizing. For more info, you can read on here. Hope this helps!
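As a small illustration of the light/medium/heavy idea, here is a sketch with hand-picked boundaries. The boundary values and sample readings are invented for the example; as far as I know, tft.bucketize chooses its boundaries from the data (quantiles) instead of taking them from you, so treat this only as a picture of what a bucket index means.

```python
import bisect

# Hypothetical boundaries in mm: two cut points give three buckets.
boundaries = [2.5, 7.6]
labels = ["light", "medium", "heavy"]


def bucket_index(value, cut_points):
    """Return the index of the bucket that value falls into."""
    return bisect.bisect_right(cut_points, value)


rain_readings = [0.2, 0.5, 5.0, 20.0]
indices = [bucket_index(r, boundaries) for r in rain_readings]
named = [labels[i] for i in indices]

print(indices)
print(named)
```

Here 0.2 and 0.5 land in the same bucket, so the model sees them as the same input, which is the hypothesis described above.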

*Note: For next time, you may want to post queries like this as a separate topic. That will get more mentors and learners to weigh in. Also please avoid tagging mentors and staff. That will discourage other mentors and learners from answering your query. You might also wait longer in case the person you tagged is offline or busy. Hope you understand. Thank you!


Thanks for the explicit explanation.
And really appreciate the tips on posting.