C2W2, Exercises 6 and 7 questions

Hi,
I have 2 questions

Question 1
When I make a transformation for _VOCAB_FEATURE_KEYS, I get an error saying that vocabulary() got an unexpected keyword argument 'num_oov'. That makes sense, because the function tft.vocabulary does not have this parameter.

code:

    for key in _VOCAB_FEATURE_KEYS:
        outputs[_transformed_name(key)] = tft.vocabulary(inputs[key], 
                                                         top_k=_VOCAB_SIZE, 
                                                         num_oov=_OOV_SIZE
                                                        )

Result:
TypeError: vocabulary() got an unexpected keyword argument 'num_oov'

Question 2
Can you help me understand my errors?

def preprocessing_fn(inputs):
    """tf.transform's callback function for preprocessing inputs.
    Args:
        inputs: map from feature keys to raw not-yet-transformed features.
    Returns:
        Map from string feature key to transformed feature operations.
    """
    outputs = {}

    ### START CODE HERE
    
    # Scale these features to the z-score.
    for key in _DENSE_FLOAT_FEATURE_KEYS:
        # Scale these features to the z-score.
        outputs[_transformed_name(key)] = tft.scale_to_z_score(inputs[key])
            

    # Scale these feature/s from 0 to 1
    for key in _RANGE_FEATURE_KEYS:
        outputs[_transformed_name(key)] = tft.scale_to_0_1(inputs[key])
            

    # Transform the strings into indices 
    # hint: use the VOCAB_SIZE and OOV_SIZE to define the top_k and num_oov parameters
    for key in _VOCAB_FEATURE_KEYS:
        outputs[_transformed_name(key)] = tft.vocabulary(inputs[key], 
                                                         top_k=_VOCAB_SIZE, 
#                                                          num_oov=_OOV_SIZE
                                                        )
            
            
            

    # Bucketize the feature
    for key in _BUCKET_FEATURE_KEYS:
        outputs[_transformed_name(key)] = tft.bucketize(inputs[key], 
                                                        num_buckets=_FEATURE_BUCKET_COUNT[key], 
                                                        always_return_num_quantiles=False)
            
            

    # Keep inputs as is. No tft function needed.
    for key in _CATEGORICAL_FEATURE_KEYS:
        outputs[_transformed_name(key)] = tft.compute_and_apply_vocabulary(inputs[key])

        
    # Use `tf.cast` to cast the label key to float32 and fill in the missing values.
    traffic_volume = _fill_in_missing(tf.cast(inputs[_VOLUME_KEY], dtype=tf.float32))
  
    
    # Create a feature that shows if the traffic volume is greater than the mean and cast to an int
    outputs[_transformed_name(_VOLUME_KEY)] = tf.cast(  
        
        # Use `tf.greater` to check if the traffic volume in a row is greater than the mean of the entire traffic volume column
        tf.greater(traffic_volume, tft.mean(tf.cast(inputs[_VOLUME_KEY], tf.float32))),
        
        tf.int64)                                        

    ### END CODE HERE
    return outputs
# ignore tf warning messages
tf.get_logger().setLevel('ERROR')


### START CODE HERE
# Instantiate the Transform component
transform = Transform(examples=example_gen.outputs['examples'],
                     schema=schema_gen.outputs['schema'],
                     module_file=os.path.abspath(_traffic_transform_module_file))
    
    
    

# Run the component.
# The `enable_cache` flag is disabled in case you need to update your transform module file.
context.run(transform, enable_cache=False)
### END CODE HERE

Result:
ValueError: Feature holiday_xf (Tensor("vocabulary/Placeholder:0", shape=(), dtype=string)) had invalid shape () for FixedLenFeature: must have rank at least 1

Probably, I have some errors in the function preprocessing_fn

Hi @ivan_100096 ,
Welcome to the course!

Let me try to help answer your questions.
Question 1:
For this exercise, you need to use a different vocabulary-related function; see the list of functions below (or check the feature engineering pipeline lab):

Question 2:

I believe this error is caused by using the wrong vocabulary function. After updating the function, please run the code again and check whether the error has disappeared.
Please note that you should also modify the code for _CATEGORICAL_FEATURE_KEYS. As the comment above that loop says (# Keep inputs as is. No tft function needed.), those inputs do not need any transformation.
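
As a rough illustration of why the choice of function matters in both questions: as far as I know, tft.vocabulary only computes the vocabulary and returns the path of the vocabulary file (a single scalar string, which would match the shape () in the error above), while tft.compute_and_apply_vocabulary also maps every string to an integer index and accepts top_k and num_oov_buckets. The pure-Python sketch below only mimics that idea. The helper names, the sample values, and the CRC32 hashing are assumptions made for illustration, not TFT's implementation.

```python
from collections import Counter
import zlib


def build_vocabulary(values, top_k):
    """Return the top_k most frequent strings, most frequent first."""
    counts = Counter(values)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def apply_vocabulary(values, vocab, num_oov_buckets):
    """Map each string to its vocabulary index, or to an OOV bucket index."""
    index = {term: i for i, term in enumerate(vocab)}
    result = []
    for value in values:
        if value in index:
            result.append(index[value])
        else:
            # Assumption: CRC32 stands in for whatever hash a real library uses.
            bucket = zlib.crc32(value.encode("utf-8")) % num_oov_buckets
            result.append(len(vocab) + bucket)
    return result


# Made-up sample column; not taken from the assignment data.
holidays = ["None", "None", "Labor Day", "None", "Christmas Day", "Labor Day"]

vocab = build_vocabulary(holidays, top_k=2)
indices = apply_vocabulary(holidays, vocab, num_oov_buckets=1)

print(vocab)    # the vocabulary alone: one object for the whole column
print(indices)  # one integer per input row
```

The first result describes the whole column, while the second has one value per row, which is the shape a transformed feature needs.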

Hopefully that helps.
Could you please remove the code snippet from your question above, because we should avoid posting code solutions in the forums to respect the Honour Code.

Thanks and best regards,
Maarten

Hi,
Thank you, Maarten
Fixed and it works :)

@ivan_100096
When I use this block of code:

# Bucketize the feature

for key in _BUCKET_FEATURE_KEYS:
    #outputs[_transformed_name(key)] = None 
    #outputs[_transformed_name(key)] = tft.bucketize(inputs[key],
    #                                                num_buckets = _FEATURE_BUCKET_COUNT(key),
    #                                               always_return_num_quantiles = False)
    outputs[_transformed_name(key)] = tft.bucketize(inputs[key],
                                                    num_buckets = _FEATURE_BUCKET_COUNT)

I get an error: TypeError: num_buckets must be an int, got <class 'dict'>

But when I replace _FEATURE_BUCKET_COUNT with a hard-coded integer, it runs. Obviously, that is not a good fix. Could you please give me a hint? Thank you

Hi Mark,

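Welcome to Discourse.
I guess the error is related to using parentheses () in this parameter: _FEATURE_BUCKET_COUNT(key). Since _FEATURE_BUCKET_COUNT is a dict, neither calling it with () nor passing it whole gives num_buckets the int it expects.

Hope that helps.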

Regards
Maarten
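
To make the hint above concrete, here is a minimal sketch of how a dict behaves when it is passed whole, called with (), or indexed with []. The dictionary contents, the key name, and the check_num_buckets helper are hypothetical stand-ins, not the assignment's values or TFT's code.

```python
# Hypothetical stand-in for the assignment's bucket-count dictionary.
feature_bucket_count = {"feature_a": 17}


def check_num_buckets(num_buckets):
    """Mimic the type check behind the error message (not TFT's actual code)."""
    if not isinstance(num_buckets, int):
        raise TypeError(f"num_buckets must be an int, got {type(num_buckets)}")
    return num_buckets


key = "feature_a"

# Passing the whole dict fails the int check.
try:
    check_num_buckets(feature_bucket_count)
except TypeError as err:
    whole_dict_error = str(err)

# Calling the dict with () fails even earlier: dicts are not callable.
try:
    feature_bucket_count(key)
except TypeError as err:
    call_error = str(err)

# Indexing with [] returns the int stored for that key.
num_buckets = check_num_buckets(feature_bucket_count[key])

print(whole_dict_error)
print(call_error)
print(num_buckets)
```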

Hi All,
I keep getting this error when using tft.bucketize:

TypeError: bucketize() got an unexpected keyword argument 'always_return_num_quantiles'

Can someone please help me get the function right?

@mjsmid I am getting the same error as above: TypeError: bucketize() got an unexpected keyword argument 'always_return_num_quantiles'
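
One generic way to investigate an "unexpected keyword argument" error is to list the parameters that the installed version of the function actually accepts. The sketch below uses Python's inspect module on a hypothetical stand-in function. Its signature is an assumption for illustration only; with tensorflow_transform installed, the same helper could be pointed at tft.bucketize instead.

```python
import inspect


def accepted_keywords(func):
    """Return the parameter names that func accepts, in order."""
    return list(inspect.signature(func).parameters)


# Hypothetical stand-in with an assumed signature; not the real tft.bucketize.
def bucketize(x, num_buckets, epsilon=None, name=None):
    return x


params = accepted_keywords(bucketize)
has_argument = "always_return_num_quantiles" in params

print(params)
print(has_argument)
```

If the argument is missing from the list, the installed version does not accept it, and passing it will raise the TypeError shown above.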