BertModel Generating Input and Output

YIHUI · August 10, 2022, 4:01pm

Specifically for the ‘tokenize_and_mask’ function: if more than one word is masked, only one mask symbol is generated. When this is fed into the model, how does the model know that more than one word was masked?

arvyzukai · August 12, 2022, 4:56pm

Hi @YIHUI

What the model needs to do is predict which token or tokens belong in that place.
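
A minimal sketch of the idea, assuming T5-style span masking: each run of consecutive masked tokens collapses to one sentinel in the input, while the target lists that sentinel followed by every token it hides. `span_mask` and the `<Z>`, `<Y>`, ... sentinel names are hypothetical, chosen for illustration; this is not the assignment's `tokenize_and_mask`.

```python
from typing import List, Tuple


def span_mask(tokens: List[str], mask: List[bool]) -> Tuple[List[str], List[str]]:
    """Replace each run of consecutive masked tokens with a single sentinel.

    Hypothetical helper: sentinel names <Z>, <Y>, ... are an assumption.
    """
    sentinels = [f"<{chr(c)}>" for c in range(ord("Z"), ord("A") - 1, -1)]
    inputs: List[str] = []
    targets: List[str] = []
    next_sentinel = 0
    prev_masked = False
    for tok, is_masked in zip(tokens, mask):
        if is_masked:
            if not prev_masked:
                # A new masked span starts: one sentinel in both input and target.
                sentinel = sentinels[next_sentinel]
                next_sentinel += 1
                inputs.append(sentinel)
                targets.append(sentinel)
            # Every hidden token goes to the target, however long the span is.
            targets.append(tok)
        else:
            inputs.append(tok)
        prev_masked = is_masked
    return inputs, targets


tokens = ["I", "love", "delicious", "BBQ", "food"]
mask = [False, False, True, True, False]
inputs, targets = span_mask(tokens, mask)
print(inputs)   # ['I', 'love', '<Z>', 'food']
print(targets)  # ['<Z>', 'delicious', 'BBQ']
```

The input never says how many tokens the sentinel hides; the target sequence is what tells the model (through the loss) that two tokens belong there.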

To be more concrete: in your example the labels would be the tokens that together make up “delicious BBQ”. The loss function checks the probabilities the model outputs for each of these tokens (if they are high, the loss is small; if they are low, the loss is large) and returns the mean loss over these tokens.
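
The averaging described above can be sketched as a mean cross-entropy over the label positions. This assumes the model's output at each label position is a probability per candidate token; `mean_label_loss` is a hypothetical name and the probabilities below are made-up toy numbers, not outputs from the assignment's model.

```python
import math
from typing import Dict, List


def mean_label_loss(probs: List[Dict[str, float]], labels: List[str]) -> float:
    """Mean negative log-probability assigned to each label token.

    probs[i] maps candidate tokens to the model's probability at label position i.
    """
    if len(probs) != len(labels):
        raise ValueError("need one probability distribution per label token")
    losses = [-math.log(p[label]) for p, label in zip(probs, labels)]
    return sum(losses) / len(losses)


# Toy, made-up probabilities for the two label positions of "delicious BBQ".
labels = ["delicious", "BBQ"]
confident = [{"delicious": 0.9, "tasty": 0.1}, {"BBQ": 0.8, "food": 0.2}]
unsure = [{"delicious": 0.2, "tasty": 0.8}, {"BBQ": 0.1, "food": 0.9}]
print(mean_label_loss(confident, labels) < mean_label_loss(unsure, labels))  # True
```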

Note that tokens here are not the same as words. In the assignment, tokens are subwords (for example, one of the labels in your example is the subword “a!”). One word (for example ‘going’) could be made up of several tokens (subwords such as ‘go’ and ‘ing’), or a single token could be a whole word or even a sequence of words.
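
To illustrate how one word can become several tokens, here is a simplified greedy longest-match splitter in the spirit of WordPiece. It is an assumption for illustration only: the toy vocabulary is made up, real WordPiece marks continuation pieces (e.g. with `##`), and the assignment's tokenizer and vocabulary will split text differently.

```python
from typing import List, Set


def greedy_subword_split(word: str, vocab: Set[str]) -> List[str]:
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it matches a vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No entry matches: fall back to a single character.
            end = start + 1
        pieces.append(word[start:end])
        start = end
    return pieces


vocab = {"go", "ing", "BBQ", "delicious", "a!"}
print(greedy_subword_split("going", vocab))  # ['go', 'ing']
print(greedy_subword_split("BBQ", vocab))    # ['BBQ']
```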

The model only cares about outputting high probabilities for the label tokens in that place.
