The lab notebook mentions this under the data slicing section:
- If you want to be more specific, then you can map the specific value to the feature name. For example, if you want just Male, then you can declare it as features={'sex': [b'Male']}. Notice that the string literal needs to be passed in as bytes with the b' prefix.
- You can also pass in several features if you want. For example, if you want to slice through both the sex and race features, then you can do features={'sex': None, 'race': None}
I tried the following:
from tensorflow_data_validation.utils import slicing_util
slice_fn = slicing_util.get_feature_value_slicer(features={'sex': [b'Male'], 'race': [b'Asian']})
When I view the result, sex is Male only, but race is not Asian; it shows up as White. Can somebody please explain what I am doing wrong here?
Please find the screenshots below -
There are no records where race = “Asian”. Here are the unique values in the training data.
>>> train_df.race.unique()
array(['White', 'Black', 'Asian-Pac-Islander', 'Amer-Indian-Eskimo',
'Other'], dtype=object)
When you specify a dictionary like {'sex': [b'Male'], 'race': [b'Asian-Pac-Islander']}, the records picked are those that satisfy both conditions. The generated datasets are ['All Examples', 'race_Asian-Pac-Islander_sex_Male'].
On the other hand, when you specify the features as {'sex': None, 'race': None}, you’ll have the following slices generated:
Datasets generated: ['All Examples', 'race_White_sex_Male', 'race_Black_sex_Male', 'race_Black_sex_Female', 'race_White_sex_Female', 'race_Asian-Pac-Islander_sex_Male', 'race_Amer-Indian-Eskimo_sex_Male', 'race_Other_sex_Female', 'race_Asian-Pac-Islander_sex_Female', 'race_Amer-Indian-Eskimo_sex_Female', 'race_Other_sex_Male']
Type of sliced_stats elements: <class 'tensorflow_metadata.proto.v0.statistics_pb2.DatasetFeatureStatistics'>
It seems like you are looking for the last option, which generates every combination so that you can pick a few slices and compare their statistics.
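To make the matching behaviour concrete, here is a small plain-Python sketch of the semantics described above. It is not TFDV's code and does not call TFDV; slice_keys and generated_slices are hypothetical helpers, and the three records are made-up examples. It only assumes what the outputs above suggest: all conditions must hold at once, None means "any value", and slice names are built from the feature names in sorted order.

```python
def slice_keys(record, features):
    """Return the slice name(s) one record falls into (illustration only)."""
    parts = []
    for name in sorted(features):          # 'race' comes before 'sex'
        wanted = features[name]            # list of bytes values, or None for "any"
        value = record.get(name)
        if value is None:
            return []                      # feature missing: record joins no slice
        if wanted is not None and value not in wanted:
            return []                      # one unmet condition excludes the record
        parts.append(f"{name}_{value.decode()}")
    return ["_".join(parts)]


def generated_slices(records, features):
    """Collect the distinct slice names produced over all records."""
    names = set()
    for record in records:
        names.update(slice_keys(record, features))
    return sorted(names)


# Made-up records; values are bytes, as the notebook requires.
records = [
    {"sex": b"Male", "race": b"White"},
    {"sex": b"Male", "race": b"Asian-Pac-Islander"},
    {"sex": b"Female", "race": b"Asian-Pac-Islander"},
]

# b'Asian' never occurs in the data, so no record satisfies both conditions.
print(generated_slices(records, {"sex": [b"Male"], "race": [b"Asian"]}))
# The exact value from train_df.race.unique() does match.
print(generated_slices(records, {"sex": [b"Male"], "race": [b"Asian-Pac-Islander"]}))
# None for both features yields one slice per combination present in the data.
print(generated_slices(records, {"sex": None, "race": None}))
```

In this sketch the first call returns an empty list, which mirrors the point above: the value has to match one of the strings in the data exactly, so b'Asian' selects nothing while b'Asian-Pac-Islander' does.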