Meaning of "\10;" in sentences and why detokenize removes it

In the example with “train_input”, the tokens 30650, 4729, and 992 appear at the end of both the English and German tokenized streams. They correspond to the characters “\”, “10”, and “;”.

When the trax.data.detokenize function takes this input, it removes these values from the sentence. I couldn’t find this in the online documentation, and I’m curious why detokenization removes these tokens specifically, as I don’t see it anywhere in the code. Thanks

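ASCII 10 is the line feed (newline) character. So “\10;” is most likely an escaped newline at the end of each sentence, which detokenize turns back into a real line break rather than visible text.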
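To make that concrete, here is a small sketch in plain Python. It assumes the subword vocabulary writes characters outside its alphabet as a backslash, the decimal code point, and a semicolon, which is my understanding of the tensor2tensor-style encoder but is not confirmed in this thread. The `unescape` helper is hypothetical and only illustrates the idea; it is not trax's own code.

```python
import re

# Assumed escape format: backslash, decimal code point, semicolon (e.g. "\10;").
_ESCAPE = re.compile(r"\\([0-9]+);")


def unescape(text: str) -> str:
    """Hypothetical helper: replace each "\\<code point>;" with its character."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1))), text)


# Code point 10 is the line feed character.
assert chr(10) == "\n"

# A sentence ending in the three pieces "\", "10", ";" decodes to a trailing newline.
decoded = unescape("Hello world\\10;")
print(repr(decoded))
```

If that assumption holds, the three tokens are not dropped; they become a newline at the end of the string, which is easy to miss when the result is printed.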
