C3W1 Assignment - Natural Language processing in Tensorflow String Lookup

eciuffo · December 21, 2024, 2:36pm

StringLookup “oov_token=None” option does not remove “None” from vocabulary.

balaji.ambresh · December 21, 2024, 4:05pm

Please update your post with sample code.

Deepti_Prasad · December 21, 2024, 5:27pm

The TensorFlow documentation for StringLookup gives the following signature and parameter descriptions:


tf.keras.layers.StringLookup(
    max_tokens=None,
    num_oov_indices=1,
    mask_token=None,
    oov_token='[UNK]',
    vocabulary=None,
    idf_weights=None,
    invert=False,
    output_mode='int',
    pad_to_max_tokens=False,
    sparse=False,
    encoding='utf-8',
    name=None,
    **kwargs
)

oov_token: Only used when invert is True. The token to return for OOV indices. Defaults to “[UNK]”.

num_oov_indices: The number of out-of-vocabulary tokens to use. If this value is more than 1, OOV inputs are modulated to determine their OOV value. If this value is 0, OOV inputs will cause an error when calling the layer. Defaults to 1.

So the parameter to change is num_oov_indices (set it to 0), not oov_token.
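To illustrate the documentation note that oov_token is used when invert is True, here is a minimal sketch. The two-word vocabulary and the "<unknown>" placeholder are made up for illustration and are not taken from the assignment.

```python
import tensorflow as tf

# Hypothetical two-word vocabulary, for illustration only.
vocab = ["sport", "business"]

# With invert=True the layer maps indices back to strings.
# Index 0 is the single default OOV slot, so it is rendered as oov_token.
inverse_lookup = tf.keras.layers.StringLookup(
    vocabulary=vocab,
    invert=True,
    oov_token="<unknown>",  # assumed placeholder, not from the thread
)

decoded = inverse_lookup(tf.constant([0, 1, 2]))
print(decoded.numpy())
```

Changing oov_token here only changes the string returned for the OOV index. The OOV slot itself is still there, because num_oov_indices is left at its default of 1.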

regards
DP

Kenneth_Brezinsky · March 11, 2025, 12:42am

I get the following and can’t see how the above helps get rid of ‘[UNK]’:
Vocabulary of labels looks like this: [‘[UNK]’, ‘sport’, ‘business’, ‘politics’, ‘tech’, ‘entertainment’]
I need help understanding what to do next.

balaji.ambresh · March 11, 2025, 6:55am

Passing num_oov_indices=0 to StringLookup disallows out-of-vocabulary lookups when mapping words to indices. With that setting, “[UNK]” is no longer part of the vocabulary.
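A minimal sketch of the difference, using the label names from the post above. It assumes the vocabulary is passed directly through the vocabulary argument, which may differ from how the assignment builds it; the variable names are illustrative.

```python
import tensorflow as tf

# Label names taken from the vocabulary shown earlier in this thread.
labels = ["sport", "business", "politics", "tech", "entertainment"]

# Default settings: one OOV slot is reserved at index 0.
default_lookup = tf.keras.layers.StringLookup(vocabulary=labels)
default_vocab = default_lookup.get_vocabulary()
print(default_vocab)
# ['[UNK]', 'sport', 'business', 'politics', 'tech', 'entertainment']

# num_oov_indices=0: no OOV slot, so the vocabulary is just the labels.
no_oov_lookup = tf.keras.layers.StringLookup(
    vocabulary=labels,
    num_oov_indices=0,
)
no_oov_vocab = no_oov_lookup.get_vocabulary()
print(no_oov_vocab)
# ['sport', 'business', 'politics', 'tech', 'entertainment']

# Indices now start at 0 for the first label.
ids = no_oov_lookup(tf.constant(["sport", "tech"]))
print(ids.numpy())  # [0 3]
```

Note that with num_oov_indices=0, any string outside the vocabulary causes an error when the layer is called, so this setting suits a fixed, known set of labels.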

Kenneth_Brezinsky · March 11, 2025, 4:43pm

Thank you, this solution worked

Ken
