C3W1 Assignment - Natural Language Processing in TensorFlow: StringLookup

StringLookup “oov_token=None” option does not remove “None” from vocabulary.

Please update your post with sample code.

Hi @eciuffo,

The TensorFlow documentation for StringLookup gives the following signature and parameter descriptions:


tf.keras.layers.StringLookup(
    max_tokens=None,
    num_oov_indices=1,
    mask_token=None,
    oov_token='[UNK]',
    vocabulary=None,
    idf_weights=None,
    invert=False,
    output_mode='int',
    pad_to_max_tokens=False,
    sparse=False,
    encoding='utf-8',
    name=None,
    **kwargs
)

oov_token: Only used when invert is True. The token to return for OOV indices. Defaults to “[UNK]”.

num_oov_indices
The number of out-of-vocabulary tokens to use. If this value is more than 1, OOV inputs are modulated to determine their OOV value. If this value is 0, OOV inputs will cause an error when calling the layer. Defaults to 1.

So the parameter to change is num_oov_indices (set it to 0), not oov_token. Setting oov_token=None does not remove the OOV entry from the vocabulary.

regards
DP


I get the following and can't see how the above helps get rid of '[UNK]':
Vocabulary of labels looks like this: ['[UNK]', 'sport', 'business', 'politics', 'tech', 'entertainment']
I need help understanding what to do next.

Passing num_oov_indices=0 to StringLookup reserves no index for out-of-vocabulary tokens, so any input that is not in the vocabulary raises an error when words are mapped to indices. With this setting, "[UNK]" is no longer part of the vocabulary.
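For anyone who wants to see the difference, here is a minimal sketch. It is not the assignment's code: the `labels` list and the variable names are made up for illustration, and the vocabulary is passed directly through `vocabulary=` so that the order is predictable. The same `num_oov_indices=0` argument should apply when the layer is built with `adapt()` instead.

```python
import tensorflow as tf

# Hypothetical label list mirroring the categories quoted in this thread.
labels = ["sport", "business", "politics", "tech", "entertainment"]

# Default behaviour (num_oov_indices=1): one OOV slot is reserved,
# so "[UNK]" appears at index 0 of the vocabulary.
default_lookup = tf.keras.layers.StringLookup(vocabulary=labels)
vocab_default = default_lookup.get_vocabulary()

# num_oov_indices=0 reserves no OOV slot, so only the real labels remain.
label_lookup = tf.keras.layers.StringLookup(vocabulary=labels, num_oov_indices=0)
vocab_no_oov = label_lookup.get_vocabulary()

# With no OOV slot, label indices start at 0.
encoded = label_lookup(tf.constant(["sport", "tech"])).numpy().tolist()

print(vocab_default)
print(vocab_no_oov)
print(encoded)
```

This is usually what you want for a label encoder, since every label is known ahead of time and the class indices then run from 0 to the number of classes minus 1.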

Thank you, this solution worked

Ken