I want to ask: is everything we do to generate the data in the C3_W3_Assignment.ipynb assignment the same as what we will do in the last assignment of the attention course, C4_W4_Assignment?
# trax allows us to use combinators to build our data pipeline
data_pipeline = trax.data.Serial(
    # randomize the stream
    trax.data.Shuffle(),
    # tokenize the data
    trax.data.Tokenize(vocab_dir=VOCAB_DIR,
                       vocab_file=VOCAB_FILE),
    # filter out sequences that are too long
    trax.data.FilterByLength(2048),
    # bucket by length
    trax.data.BucketByLength(boundaries=[128, 256, 512, 1024],
                             batch_sizes=[16, 8, 4, 2, 1]),
    # add loss weights, but not to the padding tokens (i.e. 0)
    trax.data.AddLossWeights(id_to_mask=0)
)
# apply the data pipeline to our train and eval sets
train_stream = data_pipeline(stream(train_data))
eval_stream = data_pipeline(stream(eval_data))
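As I understand it, Serial just composes each step over the underlying generator, which is why applying the pipeline to a stream works like a function call. A minimal pure-Python sketch of that idea (toy steps standing in for Shuffle/Tokenize/etc., not the real trax API):

```python
def serial(*steps):
    """Compose generator transforms left to right, like trax.data.Serial."""
    def pipeline(stream):
        for step in steps:
            stream = step(stream)
        return stream
    return pipeline

# toy steps: each one takes a stream and yields a transformed stream
double = lambda stream: (x * 2 for x in stream)
keep_small = lambda stream: (x for x in stream if x < 10)

pipe = serial(double, keep_small)
result = list(pipe(iter([1, 2, 3, 4, 5, 6])))
# -> [2, 4, 6, 8]
```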
So, can we delete that huge function data_generator(batch_size, x, y, pad, shuffle=False, verbose=False) from C3_W3_Assignment.ipynb if we use this code above, for example?
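For comparison, the batching work that data_generator did by hand (padding each batch and masking the loss on padding tokens) is roughly this, in a minimal numpy sketch with hypothetical helper names, not the actual assignment code:

```python
import numpy as np

PAD_ID = 0  # assumption: 0 is the padding token id, matching id_to_mask=0 above

def pad_batch(sequences):
    """Pad a list of token-id lists to the longest length with PAD_ID."""
    max_len = max(len(seq) for seq in sequences)
    return np.array([seq + [PAD_ID] * (max_len - len(seq)) for seq in sequences])

def add_loss_weights(batch, id_to_mask=PAD_ID):
    """Return (batch, weights): weight 1.0 for real tokens, 0.0 for padding."""
    weights = (batch != id_to_mask).astype(np.float32)
    return batch, weights

batch = pad_batch([[5, 6, 7], [8, 9]])
batch, weights = add_loss_weights(batch)
# batch   -> [[5 6 7]
#             [8 9 0]]
# weights -> [[1. 1. 1.]
#             [1. 1. 0.]]
```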
I think we could just delete trax.data.FilterByLength(2048), but maybe it can be useful in some cases too.
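For reference, FilterByLength just drops examples longer than the cap, which mainly matters when memory is tight. A rough pure-Python equivalent (an illustration, not trax's implementation):

```python
def filter_by_length(max_length):
    """Drop token sequences longer than max_length, like trax.data.FilterByLength."""
    def fn(stream):
        return (seq for seq in stream if len(seq) <= max_length)
    return fn

keep = filter_by_length(4)
result = list(keep(iter([[1, 2], [1, 2, 3, 4, 5], [7]])))
# -> [[1, 2], [7]]
```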