C4W3_Assignment Sentinel Index Conflicts

parsapico · July 7, 2024, 6:20pm

Hello,
In the tokenize_and_mask function in the question answering assignment, sentinel indices start from the upper end of the vocabulary. My question is: what happens if some text actually contains a word whose index is vocab_size - 1 or something close to it? It would be treated as a sentinel id, right?

Alireza_Saei · July 8, 2024, 9:14am

Hi @parsapico

In the tokenize_and_mask function, sentinel indices starting from the upper end of the vocabulary are designed to avoid conflicts with actual word indices. If your vocabulary is managed correctly, words should not have indices overlapping with the reserved sentinel indices. Ensure that the vocabulary size is set such that there is a clear distinction between actual word indices and sentinel indices to prevent any conflicts.
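As a rough illustration of what "a clear distinction" could look like, here is a minimal sketch that reserves a block of ids above the real vocabulary. The names `BASE_VOCAB_SIZE`, `NUM_SENTINELS`, `sentinel_id`, and `is_sentinel` are hypothetical and the sizes are toy values; this is not the assignment's code.

```python
# Toy sketch: all names and sizes are assumptions, not taken from the assignment.
BASE_VOCAB_SIZE = 10   # ids 0..9 belong to real tokens
NUM_SENTINELS = 3      # ids 10..12 are reserved for sentinels
TOTAL_VOCAB_SIZE = BASE_VOCAB_SIZE + NUM_SENTINELS


def sentinel_id(k: int) -> int:
    """Return the id of the k-th sentinel, counting down from the top id."""
    if not 0 <= k < NUM_SENTINELS:
        raise ValueError("sentinel number out of range")
    return TOTAL_VOCAB_SIZE - 1 - k


def is_sentinel(token_id: int) -> bool:
    """True only for ids in the reserved block above the real vocabulary."""
    return BASE_VOCAB_SIZE <= token_id < TOTAL_VOCAB_SIZE


sentinel_ids = [sentinel_id(k) for k in range(NUM_SENTINELS)]
```

With this layout, no real token can share an id with a sentinel, because the sentinel block sits entirely above `BASE_VOCAB_SIZE`.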

Hope it helps!

parsapico · July 8, 2024, 6:26pm

I don’t know if you ran the code, but for example the first sentinel id actually corresponds to the word ‘Internațional’ in the vocab, so when we insert a placeholder value we are actually giving it the id of this word. And nowhere in the code is the vocab size padded to account for sentinel ids.
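The overlap described above can be reproduced with a toy vocabulary. This is only a sketch: `toy_vocab`, `sentinel_ids_from_top`, and `find_collisions` are hypothetical names, the position of 'Internațional' in the list is made up for the example, and the countdown formula `vocab_size - 1 - k` is an assumption based on the description in this thread.

```python
# Toy vocabulary; the token order is invented for illustration only.
toy_vocab = ["<pad>", "</s>", "<unk>", "the", "model", "Internațional"]


def sentinel_ids_from_top(vocab_size: int, num_sentinels: int) -> list:
    """Sentinel ids counted down from the last id, with no padding of the vocab."""
    return [vocab_size - 1 - k for k in range(num_sentinels)]


def find_collisions(vocab: list, num_sentinels: int) -> dict:
    """Map each sentinel id to the real token that already owns that id."""
    ids = sentinel_ids_from_top(len(vocab), num_sentinels)
    return {i: vocab[i] for i in ids}


collisions = find_collisions(toy_vocab, 2)
print(collisions)
```

Every sentinel id produced this way lands on an existing token, so a check like `find_collisions` returning a non-empty mapping is one way to confirm whether the vocabulary was extended before sentinels were assigned.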
