Recommended way to map token IDs back to words

>>> from keras.layers import StringLookup
>>> string_lookup = StringLookup(vocabulary=vectorize_layer.get_vocabulary(include_special_tokens=False), invert=True)
>>> string_lookup(sentences_to_tokens - 1)
<tf.Tensor: shape=(2, 3), dtype=string, numpy=
array([[b'i', b'love', b'[UNK]'],
       [b'i', b'love', b'[UNK]']], dtype=object)>
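For context, here is a self-contained sketch of the same pattern. The corpus and input sentences are made up for illustration; only the `StringLookup(..., invert=True)` inversion trick is from the snippet above. The `- 1` shift is needed because `get_vocabulary(include_special_tokens=False)` drops `[PAD]` (ID 0) and `[UNK]` (ID 1), so the `TextVectorization` IDs are offset by one relative to the inverse lookup, whose index 0 is its own OOV slot.

```python
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization, StringLookup

# Hypothetical corpus and sentences, just to make the example runnable
corpus = ["i love tensorflow", "i love keras"]
vectorize_layer = TextVectorization()
vectorize_layer.adapt(corpus)

# Tokenize sentences containing out-of-vocabulary words
sentences_to_tokens = vectorize_layer(["i love python", "i love golang"])

# Invert: vocabulary without special tokens, IDs shifted down by one so
# that [UNK] (ID 1) lands on the StringLookup layer's OOV index 0
string_lookup = StringLookup(
    vocabulary=vectorize_layer.get_vocabulary(include_special_tokens=False),
    invert=True,
)
words = string_lookup(sentences_to_tokens - 1)
print(words)
```

The unknown words come back as `b'[UNK]'`, while in-vocabulary words round-trip to their original strings.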
