Understanding Sentinels in C4W3_Assignment

chartechaccountant · November 10, 2025, 12:38pm

/notebooks/C4W3_Assignment.ipynb

From what I understand, sentinels are used in the decoder of the T5 as targets. They are like placeholders.

Let's assume the vocab size is 32,000.

Let's assume I = 2, love = 3, learning = 4, and = 5.

Example sentence: “I love machine learning and deep learning”

After random selection, the words “machine” and “deep” are selected for masking, and the input to the encoder will be as follows:

“2 3 31998 4 5 31997 4”

The input to the decoder will be:

“31998 machine 31997 deep”
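To make the example concrete, here is a minimal sketch of the masking step in Python. It is not the notebook's code: `mask_with_sentinels` is a hypothetical helper, the IDs 100 and 101 for “machine” and “deep” are made up for illustration, and the starting sentinel ID of 31998 is simply taken from the example above.

```python
def mask_with_sentinels(token_ids, masked_positions, first_sentinel_id):
    """Replace masked tokens with sentinel IDs that count down from first_sentinel_id.

    Returns (inputs, targets). A run of adjacent masked tokens shares one sentinel.
    """
    inputs, targets = [], []
    next_sentinel = first_sentinel_id
    prev_masked = False
    for pos, tok in enumerate(token_ids):
        if pos in masked_positions:
            if not prev_masked:
                # Start of a new masked span: the same sentinel goes to both sides.
                inputs.append(next_sentinel)
                targets.append(next_sentinel)
                next_sentinel -= 1
            targets.append(tok)  # the hidden token itself only appears in the targets
            prev_masked = True
        else:
            inputs.append(tok)
            prev_masked = False
    return inputs, targets


# "I love machine learning and deep learning"
# Hypothetical IDs: machine = 100, deep = 101 (not given in the example above).
sentence_ids = [2, 3, 100, 4, 5, 101, 4]
inputs, targets = mask_with_sentinels(sentence_ids, {2, 5}, first_sentinel_id=31998)
print(inputs)   # [2, 3, 31998, 4, 5, 31997, 4]
print(targets)  # [31998, 100, 31997, 101]
```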

But the vocab already has valid entries at positions 31998 and 31999, as we saw in the output of the get_sentinels() method, which prints:

The sentinel is <Z> and the decoded token is: Internațional
The sentinel is <Y> and the decoded token is: erwachsene
The sentinel is <X> and the decoded token is: Cushion

…which means index 31999 in the vocab is associated with the word “Internațional”.

Why are we associating a valid vocab index with a sentinel? How is it working out?

Alireza_Saei · November 10, 2025, 1:06pm

Hi @chartechaccountant

The apparent overlap is just an implementation detail. In the original T5 setup, the sentinel tokens (like <extra_id_0>, <extra_id_1>, etc.) are special tokens appended after the main 32,000-token SentencePiece vocabulary, so they get IDs of their own (32,000–32,099). As far as I can tell, the assignment's tokenizer only has the 32,000 regular entries, so the notebook instead borrows the highest IDs (31,999, 31,998, and so on) as sentinels. That is why those IDs decode to random-looking words such as “Internațional” if you inspect them directly through SentencePiece. The masking code only ever inserts them as placeholders, and the notebook's helper functions display them as <Z>, <Y>, etc. The borrowed entries should be rare tokens that are unlikely to appear in the training text, so in practice the model learns to treat those IDs as placeholders, not real words.
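The display side of this can be sketched roughly as follows, assuming sentinels are assigned from the top of the vocabulary downward, which is what the get_sentinels() output quoted above suggests. `build_sentinel_names` and `pretty_ids` are hypothetical names for illustration, not the notebook's functions, and no real tokenizer is involved.

```python
import string


def build_sentinel_names(vocab_size):
    """Map the highest vocab IDs to display names: vocab_size-1 -> <Z>, vocab_size-2 -> <Y>, ..."""
    # reversed(string.ascii_letters) yields Z, Y, X, ..., A, z, y, ..., a (52 names in total).
    return {
        vocab_size - i: f"<{ch}>"
        for i, ch in enumerate(reversed(string.ascii_letters), 1)
    }


def pretty_ids(ids, sentinel_names):
    """Render a list of IDs, showing sentinel IDs by name and leaving other IDs as numbers."""
    return " ".join(sentinel_names.get(i, str(i)) for i in ids)


names = build_sentinel_names(32000)
print(names[31999], names[31998], names[31997])      # <Z> <Y> <X>
print(pretty_ids([2, 3, 31999, 4, 5, 31998, 4], names))  # 2 3 <Z> 4 5 <Y> 4
```

The point of the sketch is that the sentinel meaning comes from how the IDs are used and displayed, not from what SentencePiece happens to store at those positions.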

Hope it helps! Feel free to ask if you need further assistance.

chartechaccountant · November 11, 2025, 10:17am

Understood! Thank you. This also implies that the real “International” will have some other vocab index.

Thank you!

Alireza_Saei · November 11, 2025, 9:00pm

You’re welcome! Happy to help.
