StringLookup “oov_token=None” option does not remove “None” from vocabulary.
Please update your post with sample code.
Hi @eciuffo,
The TensorFlow documentation for StringLookup gives the following signature and parameter descriptions:
tf.keras.layers.StringLookup(
    max_tokens=None,
    num_oov_indices=1,
    mask_token=None,
    oov_token='[UNK]',
    vocabulary=None,
    idf_weights=None,
    invert=False,
    output_mode='int',
    pad_to_max_tokens=False,
    sparse=False,
    encoding='utf-8',
    name=None,
    **kwargs
)
oov_token: Only used when invert is True. The token to return for OOV indices. Defaults to “[UNK]”.
num_oov_indices: The number of out-of-vocabulary tokens to use. If this value is more than 1, OOV inputs are modulated to determine their OOV value. If this value is 0, OOV inputs will cause an error when calling the layer. Defaults to 1.
So the OOV slot in the vocabulary is controlled by num_oov_indices, not by oov_token; num_oov_indices is the parameter to change.
regards
DP
I get the following, and I can’t see how the above helps get rid of ‘[UNK]’:
Vocabulary of labels looks like this: [‘[UNK]’, ‘sport’, ‘business’, ‘politics’, ‘tech’, ‘entertainment’]
I need help understanding what to do next.
Passing num_oov_indices=0 to StringLookup disallows out-of-vocabulary lookups when mapping words to indices. With that setting, “[UNK]” is no longer part of the vocabulary.
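For anyone landing here later, a minimal sketch of the difference, assuming a reasonably recent TensorFlow 2.x install. The label list is made up for illustration and is not the course dataset.

```python
import tensorflow as tf

# Hypothetical labels, for illustration only
labels = tf.constant(
    ["sport", "business", "politics", "tech", "entertainment", "sport"]
)

# Default settings: num_oov_indices=1 reserves one slot for the OOV token
default_lookup = tf.keras.layers.StringLookup()
default_lookup.adapt(labels)
default_vocab = default_lookup.get_vocabulary()

# num_oov_indices=0: no OOV slot is reserved, so '[UNK]' is not in the vocabulary
no_oov_lookup = tf.keras.layers.StringLookup(num_oov_indices=0)
no_oov_lookup.adapt(labels)
no_oov_vocab = no_oov_lookup.get_vocabulary()

print(default_vocab)
print(no_oov_vocab)
```

Keep in mind that with num_oov_indices=0, as the documentation quoted above says, any input that is not in the vocabulary causes an error when the layer is called, so this only suits a closed set of labels.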
Thank you, this solution worked
Ken