Is data generation the same as in the last assignment?

I want to ask: is everything we do to generate data in C3_W3_Assignment.ipynb the same as what we will do in the final assignment of the Attention course, C4_W4_Assignment?

# trax allows us to use combinators to generate our data pipeline
data_pipeline = trax.data.Serial(
    # randomize the stream
    trax.data.Shuffle(),
    
    # tokenize the data
    trax.data.Tokenize(vocab_dir=VOCAB_DIR,
                       vocab_file=VOCAB_FILE),
    
    # filter too long sequences
    trax.data.FilterByLength(2048),
    
    # bucket by length
    trax.data.BucketByLength(boundaries=[128, 256, 512, 1024],
                             batch_sizes=[16, 8, 4, 2, 1]),
    
    # add loss weights, but mask them out for padding tokens (id 0)
    trax.data.AddLossWeights(id_to_mask=0)
)

# apply the data pipeline to our train and eval sets
train_stream = data_pipeline(stream(train_data))
eval_stream = data_pipeline(stream(eval_data))

So, could we delete that huge function data_generator(batch_size, x, y, pad, shuffle=False, verbose=False) from C3_W3_Assignment.ipynb and use the code above instead, for example?

We could probably just drop trax.data.FilterByLength(2048), I think, though it may still be useful in some cases.
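For context on why FilterByLength can still matter: BucketByLength assigns each sequence to a bucket by its length, and note that batch_sizes has one more entry than boundaries, because anything longer than the last boundary falls into an overflow bucket (batch size 1 here). Without a length filter, arbitrarily long sequences still get through, just one per batch. Here is a minimal pure-Python sketch of that bucketing idea (not the actual trax implementation, and the exact tie-breaking at a boundary may differ):

```python
from bisect import bisect_left

def bucket_index(seq_len, boundaries):
    # Index of the first boundary >= seq_len; lengths beyond the
    # last boundary land in one extra overflow bucket at the end.
    return bisect_left(boundaries, seq_len)

boundaries = [128, 256, 512, 1024]
batch_sizes = [16, 8, 4, 2, 1]  # one more entry than boundaries

for length in [100, 300, 2000]:
    b = bucket_index(length, boundaries)
    print(length, "-> bucket", b, "batch size", batch_sizes[b])
```

So filtering at 2048 simply drops examples that would otherwise end up as batch-size-1 outliers in the overflow bucket.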

This topic was mentioned here, so I'm tagging you folks.

@arvyzukai
@Elemento

Happy mentoring.