I don't really understand how padded_batch() works

Hi everyone,

Could someone please explain how padded_batch() works?

I’m currently at this point in the Week 2 workbook and I don’t really understand how padded_batch() works. The notebook says it is used to “Batch and pad the datasets to the maximum length of the sequences”, but how does this work? How does the maximum length of the sequences relate to this “64”, and what is the unit of this “64”?

I understood padding up to the last lesson, where it was about adding zeros as part of tokenization, but I just don’t know what effect padded_batch() has. My guess is that it makes each sequence have 64 values?

Thank you!

Have you seen this link on padded_batch?
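
In case it helps, here is a minimal sketch of the behaviour, using a toy dataset I made up rather than the one from the workbook. By default, padded_batch() groups elements into batches and zero-pads every sequence in a batch to the length of the longest sequence in that same batch. Note that the first argument of tf.data.Dataset.padded_batch() is the batch size, so if the workbook’s call looks something like padded_batch(64), the 64 is the number of examples per batch, not a sequence length.

import tensorflow as tf

# toy variable-length sequences (made-up values, not from the workbook)
sequences = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
dataset = tf.data.Dataset.from_generator(
    lambda: iter(sequences),
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32),
)

# batches of 2; each batch is zero-padded to its own longest sequence
for batch in dataset.padded_batch(2):
    print(batch.numpy())
# [[1 2 3]
#  [4 5 0]]
# [[6 7 8 9]]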

Here’s one way to inspect train_dataset:

it = iter(train_dataset)
# look at 5 batches
for i in range(5):
    x, y = next(it)
    print(x.shape, y.shape)
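
Assuming train_dataset was built with padded_batch() and no fixed padded_shapes, each x.shape should look like (batch_size, padded_length), where padded_length is the longest sequence in that particular batch and can therefore vary from batch to batch.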