How are training samples created?

Yes @Peixi_Zhu, you can say that the input is two lists of tokens. Just to mention one additional thing - training is usually done with mini-batches, meaning there are, for example, 32 pairs of lists for each update of the model weights.

Well, because of the needed padding, the 27 tokens would become 32. The output would then have shape 32 x 33000. But your understanding is correct.

The mask in this case would be 0 for the 5 padding tokens that were added to bring 27 up to 32. So there would be 27 ones, followed by 5 zeroes ([1, 1, 1, … , 0, 0]).
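A minimal sketch of this padding-and-mask step (the pad token id of 0 and the target length of 32 are assumptions for illustration):

```python
import numpy as np

PAD_ID = 0          # assumed padding token id
MAX_LEN = 32        # padded sequence length from the example above

tokens = list(range(1, 28))  # a hypothetical 27-token sequence

# Pad the sequence up to MAX_LEN with the padding token.
padded = tokens + [PAD_ID] * (MAX_LEN - len(tokens))

# The mask is 1 for real tokens, 0 for padding positions.
mask = np.array([1 if t != PAD_ID else 0 for t in padded])

print(len(padded))    # 32
print(int(mask.sum()))  # 27 (ones), so 5 zeroes at the end
```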
When the model makes predictions (for 32 tokens), the per-token losses are multiplied by the mask, which sets the loss on padding tokens to 0 (the model is not penalized or rewarded for predicting padding tokens). This way the model only "trains" to correctly predict tokens that have a mask of 1.
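The masked-loss idea can be sketched like this (the per-token loss values here are made up for illustration; only the masking logic matters):

```python
import numpy as np

def masked_loss(per_token_loss, mask):
    """Zero out the loss at padding positions and average over real tokens only."""
    masked = per_token_loss * mask
    return masked.sum() / mask.sum()

# Hypothetical per-token losses: 27 real tokens, then 5 padding tokens
# whose (meaningless) losses should be ignored.
per_token_loss = np.array([0.5] * 27 + [9.9] * 5)
mask = np.array([1] * 27 + [0] * 5)

print(masked_loss(per_token_loss, mask))  # 0.5 - padding losses have no effect
```

Note that dividing by `mask.sum()` (the number of real tokens) rather than the full sequence length keeps the average loss comparable across sequences with different amounts of padding.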

Cheers

P.S. you might be interested in this post which explains the next_symbol function in more detail.