When we run pretty_decode() on encoded_str = ‘Beginners BBQ…’, ‘Beginners’ is replaced with ‘’. But when I printed the sentinels dictionary, as you can see below, none of its keys match ‘Beginners’. Why does the replacement still occur?
This is what I got using pretty_decode(): no replacement of Beginners, as one would expect given the dictionary of sentinels.
Did you mean something else?
Yeah, this is weird. When I ran it, I got a replacement for ‘Beginners’.
The sentinels dictionary is built by calling tokenizer.detokenize on vocab_size - i, so its keys look pretty random. It exists only to support the tokenize_and_mask function below. When pretty_decode is used in testing, it detokenizes the tokens into actual words, and some of those words are the same as the keys in sentinels. pretty_decode then swaps those words for their ASCII replacements. The words themselves are only an intermediate step for converting the pad token into ASCII letters.
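To make that concrete, here is a minimal sketch of the idea. It is not the course code: TOY_VOCAB and toy_detokenize are made-up stand-ins for the real tokenizer, the `<Z>`, `<Y>`, ... marker format is an assumption, and plain str.replace is used where the real helper may do something different.

```python
import string

# Hypothetical toy vocabulary standing in for the real tokenizer's vocabulary.
TOY_VOCAB = ["Beginners", "BBQ", "class", "taking", "place",
             "Zwischen", "Forschung", "erkennt"]


def toy_detokenize(token_ids):
    """Stand-in for tokenizer.detokenize: join the words for the given ids."""
    return " ".join(TOY_VOCAB[i] for i in token_ids)


def get_sentinels(vocab_size, n_sentinels):
    """Map the word at id vocab_size - i to an ASCII marker.

    The marker format '<Z>', '<Y>', ... is assumed for illustration.
    """
    letters = string.ascii_letters[::-1]  # 'Z', 'Y', 'X', ...
    sentinels = {}
    for i in range(1, n_sentinels + 1):
        word = toy_detokenize([vocab_size - i])
        sentinels[word] = f"<{letters[i - 1]}>"
    return sentinels


def pretty_decode(text, sentinels):
    """Replace every sentinel word in the decoded text with its marker."""
    for word, marker in sentinels.items():
        text = text.replace(word, marker)
    return text


sentinels = get_sentinels(len(TOY_VOCAB), 3)
decoded = toy_detokenize([0, 1, 7, 3, 6])
print(sentinels)
print(pretty_decode(decoded, sentinels))
```

In this toy setup only words that appear as keys in sentinels get swapped, so ‘Beginners’ stays as it is. If a word is replaced without being a key, the first thing to check would be how the replacement itself is matching.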