C2W2 Ex 6 - Any Help

k_b · December 8, 2021, 7:57pm

Hi,

When I run the '# Test your preprocessing_fn' cell, I get: AttributeError: 'Tensor' object has no attribute 'indices'

My long code:

def preprocessing_fn(inputs):
    """tf.transform's callback function for preprocessing inputs.
    Args:
    inputs: map from feature keys to raw not-yet-transformed features.
    Returns:
    Map from string feature key to transformed feature operations.
    """
    outputs = {}

    ### START CODE HERE
    
    # Scale these features to the z-score.
    for key in _DENSE_FLOAT_FEATURE_KEYS:
        outputs[_transformed_name(key)] = tft.scale_to_z_score(inputs[key])
            

    # Scale these feature/s from 0 to 1
    for key in _RANGE_FEATURE_KEYS:
        outputs[_transformed_name(key)] = tft.scale_to_0_1(inputs[key])
            

    # Transform the strings into indices 
    # hint: use the VOCAB_SIZE and OOV_SIZE to define the top_k and num_oov parameters
    for key in _VOCAB_FEATURE_KEYS:
        outputs[_transformed_name(key)] = tft.compute_and_apply_vocabulary(
            inputs[key], 
            top_k=_VOCAB_SIZE, 
            num_oov_buckets=_OOV_SIZE)
            
            
            

    # Bucketize the feature
    for key in _BUCKET_FEATURE_KEYS:
        outputs[_transformed_name(key)] = tft.bucketize(
            inputs[key], _FEATURE_BUCKET_COUNT[key])
            

    # Keep as is. No tft function needed.
    for key in _CATEGORICAL_FEATURE_KEYS:
        outputs[_transformed_name(key)] = inputs[key]

        
    # Use `tf.cast` to cast the label key to float32 and fill in the missing values.
    traffic_volume = tf.cast(_fill_in_missing(inputs[_VOLUME_KEY]), tf.float32)
  
    
    # Create a feature that shows if the traffic volume is greater than the mean and cast to an int
    outputs[_transformed_name(_VOLUME_KEY)] = tf.cast(  
        
        # Use `tf.greater` to check if the traffic volume in a row is greater than the mean of the entire traffic volume column
        tf.greater(traffic_volume, tft.mean(tf.cast(inputs[_VOLUME_KEY], tf.float32))),
        
        tf.int64)                                        

    ### END CODE HERE
    return outputs

def _fill_in_missing(x):
    default_value = '' if x.dtype == tf.string else 0
    
    return tf.squeeze(
      tf.sparse.to_dense(
          tf.SparseTensor(x.indices, x.values, [x.dense_shape[0], 1]),
          default_value),
      axis=1)

Any help is appreciated.

c.godawat · December 10, 2021, 7:08am

Hi @k_b ,

Appreciate that you have reached out for help.

The only place where I see indices accessed as an attribute is here:

return tf.squeeze(
      tf.sparse.to_dense(
          tf.SparseTensor(x.indices, x.values, [x.dense_shape[0], 1]),
          default_value),
      axis=1)

So make sure that you are passing x correctly here, i.e. x should be a tf.SparseTensor and not a dense tf.Tensor.

A dense Tensor does not have an 'indices' attribute, and that's exactly what the error states. The attributes .indices, .values and .dense_shape belong to tf.SparseTensor. NumPy arrays do not have an .indices attribute either; numpy.indices is an unrelated function that builds index grids. Here is its documentation for comparison:

https://numpy.org/doc/stable/reference/generated/numpy.indices.html

Hope this helps you to figure out the error which you are getting and you can take it further.

Let me know if you are still not able to resolve it. Always happy to help!

k_b · December 11, 2021, 11:59am

Hi @c.godawat thanks for your time and comment.

I removed the helper function as suggested in this thread and it worked.
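
For anyone who lands here with the same error, here is a minimal sketch of why dropping the helper helps. It assumes the label feature arrives as a dense tf.Tensor rather than a tf.SparseTensor; the name fill_in_missing_safe is hypothetical and not part of the assignment. The sketch only densifies the input when it really is sparse, and passes dense tensors through unchanged.

```python
import tensorflow as tf


def fill_in_missing_safe(x):
    """Hypothetical variant of _fill_in_missing that tolerates dense input."""
    if isinstance(x, tf.sparse.SparseTensor):
        # Only a SparseTensor has .indices, .values and .dense_shape.
        default_value = '' if x.dtype == tf.string else 0
        dense = tf.sparse.to_dense(
            tf.SparseTensor(x.indices, x.values, [x.dense_shape[0], 1]),
            default_value)
        return tf.squeeze(dense, axis=1)
    # A dense tf.Tensor has nothing to fill in, so return it unchanged.
    return x


# Sparse input: the row with no entry is filled with the default value.
sparse = tf.SparseTensor(indices=[[0, 0], [2, 0]],
                         values=[5.0, 7.0],
                         dense_shape=[3, 1])
filled_sparse = fill_in_missing_safe(sparse)

# Dense input: passed through instead of raising AttributeError.
dense = tf.constant([5.0, 0.0, 7.0])
filled_dense = fill_in_missing_safe(dense)
```

Whether the label is sparse or dense depends on how the feature is declared in the feature spec, so it is worth checking that before deciding between removing the helper and guarding it like this.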

Chanthaleex · February 2, 2022, 9:15am

Hi c.godawat,
I am getting the error shown in the screenshot below.
Could you please help me solve it?

Thanks,
Chan

c.godawat · February 2, 2022, 9:37am

Hi @Chanthaleex

The error states that the bucketize() function has no argument named ‘always_return_num_quantiles’.

I checked the official documentation of the bucketize() function in TensorFlow Transform (tft.bucketize | TFX | TensorFlow), and indeed there is no argument with that name.

So you would have to change the file traffic_transform.py, where you have used the tft.bucketize() function, and remove that argument from the call.

Let me know if this works for you. If not, let me know some more details about the code, and we will solve it. Happy to help !
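
If it helps, one way to confirm which keyword arguments the installed version of a function accepts is to inspect its signature. The sketch below is only an illustration: bucketize_stub is a hypothetical stand-in (not the real tft.bucketize), and supported_kwargs is a helper name I made up. With TensorFlow Transform installed, you could pass tft.bucketize to the same helper.

```python
import inspect


def supported_kwargs(func, **kwargs):
    """Keep only the keyword arguments that `func` actually declares."""
    params = inspect.signature(func).parameters
    # A function that declares **kwargs accepts any keyword argument.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


# Hypothetical stand-in for a bucketize-style function that does not
# declare always_return_num_quantiles (an assumption for this example).
def bucketize_stub(x, num_buckets, epsilon=None, name=None):
    return (x, num_buckets)


accepted = supported_kwargs(
    bucketize_stub,
    always_return_num_quantiles=False,
    epsilon=0.01)
print(accepted)  # always_return_num_quantiles is filtered out
```

This is meant for diagnosing the mismatch. In the assignment itself, simply deleting the unsupported argument from the call is the more direct fix.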

Chanthaleex · February 4, 2022, 1:43am

Hi c.godawat,

This is the code that was run. Any recommendation would be appreciated, please help me.

# Bucketize the feature # Noted
for key in _BUCKET_FEATURE_KEYS:
    outputs[_transformed_name(key)] = tft.bucketize(
        inputs[key], _FEATURE_BUCKET_COUNT[key],
        always_return_num_quantiles=False)

Best regards, and thanks in advance,
Chan

Chanthaleex · February 5, 2022, 3:30am

Hi c.godawat,

Could you please help me find the line where I went wrong? The code below raises an error.

# Test your preprocessing_fn

import traffic_transform
from testing_values import feature_description, raw_data

# NOTE: These next two lines are for reloading your traffic_transform module in case you need to
# update your initial solution and re-run this cell. Please do not remove them especially if you
# have revised your solution. Else, your changes will not be detected.
import importlib
importlib.reload(traffic_transform)

raw_data_metadata = dataset_metadata.DatasetMetadata(schema_utils.schema_from_feature_spec(feature_description))

with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
    transformed_dataset, _ = (
        (raw_data, raw_data_metadata) | tft_beam.AnalyzeAndTransformDataset(traffic_transform.preprocessing_fn))

transformed_data, transformed_metadata = transformed_dataset

The error shown is: AttributeError: 'Tensor' object has no attribute 'indices'

Best regards, and thanks in advance,
Chan

c.godawat · February 5, 2022, 4:11am

Hi @Chanthaleex

Please open a new thread for your questions. The errors you have posted are different from the one in this thread, so post the issue in a separate thread and we can take it up from there. (There are many other people as well who can help you out.)

Chanthaleex · February 6, 2022, 8:21am

Dear godawat,

Thank you so much and well noted
