C4_W3, regarding pretty_decode()

When we run pretty_decode() on encoded_str = ‘Beginners BBQ…’,
‘Beginners’ is replaced with ‘’. But when I printed the sentinels dictionary, as you can see below, none of its keys match ‘Beginners’. Why does the replacement still occur?

This is what I got using pretty_decode(). No replacement of Beginners, as one would expect given the dictionary of sentinels.

Did you mean something else?

Yeah, this is weird. When I ran it, I got a replacement for ‘Beginners’.

The sentinels dictionary is built by calling tokenizer.detokenize([vocab_size - i]) for each i, so its keys are whatever pieces happen to sit at the very end of the vocabulary. They look pretty random, but they exist only for the tokenize_and_mask function below, which uses those end-of-vocabulary token ids as sentinels.

When pretty_decode is used in testing, it detokenizes the tokens, which turns each sentinel id back into the actual word that appears as a key in sentinels. pretty_decode then replaces those words with their ASCII placeholders. The words are only an intermediate step for converting the sentinel tokens into ASCII letters.
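To make the mechanism concrete, here is a minimal sketch of the idea. It is not the assignment's actual code: TOY_VOCAB and toy_detokenize are made-up stand-ins for the real tokenizer, and the n_sentinels argument is added only to keep the toy small. It builds the sentinels dictionary from the last ids in the vocabulary and then does plain string replacement in pretty_decode.

```python
import string

# Hypothetical toy vocabulary standing in for the real tokenizer's vocab.
TOY_VOCAB = ["<pad>", "Beginners", "BBQ", "Class", "Taking", "Place", "zeta", "omega"]


def toy_detokenize(ids):
    """Stand-in for tokenizer.detokenize: map ids to words joined by spaces."""
    return " ".join(TOY_VOCAB[i] for i in ids)


def get_sentinels(vocab_size, n_sentinels):
    """Map the words at the END of the vocab to ASCII placeholders <Z>, <Y>, ..."""
    sentinels = {}
    for i, char in enumerate(reversed(string.ascii_letters), 1):
        if i > n_sentinels:
            break
        word = toy_detokenize([vocab_size - i])  # last id, second to last, ...
        sentinels[word] = f"<{char}>"
    return sentinels


def pretty_decode(encoded, sentinels):
    """Detokenize if needed, then replace each sentinel word by its placeholder."""
    if isinstance(encoded, str):
        # Plain substring replacement: any occurrence of a key is replaced.
        for word, placeholder in sentinels.items():
            encoded = encoded.replace(word, placeholder)
        return encoded
    return pretty_decode(toy_detokenize(encoded), sentinels)


sentinels = get_sentinels(vocab_size=len(TOY_VOCAB), n_sentinels=2)
print(sentinels)
print(pretty_decode([1, 7, 2, 6], sentinels))
```

Since the replacement in this sketch is a plain str.replace, any text that happens to contain one of the dictionary keys gets rewritten, whether it came from a sentinel id or not. If the real function works the same way, it may be worth checking exactly which strings are keys in your sentinels dictionary.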