C3_W4 Assignment: Padding in exercise 2

avazeh · March 5, 2023, 5:54pm

Hi,

The code outline indicates that we pad the questions so that the inputs in each batch have the same length. But we are not necessarily making all batches have the same size. Does this mean the batches could be of different sizes?

Apart from a drop in performance, if batches have different sizes, won’t this cause errors when calculating the loss function?

Or have I misunderstood the code?
Thanks,
Ava

gent.spah · March 5, 2023, 6:21pm

Hi Ava,

I did this specialization a while ago. Normally when you train a model you specify the batch size, and that's probably the case here too. If you look more closely at the code, you will probably find that the batch size has been specified. The batch size is conventionally a power of 2, because such sizes tend to map well onto how the hardware organizes memory, which can make the computations faster.

Sometimes, when a training set or other set is divided into, let's say, x batches, the last batch might be smaller than the others because there are not enough examples left. That's not a big issue in terms of cost, because the cost is averaged over all the batches. Also, yes, it makes sense for all batches to have the same size so that computing aggregates is a consistent process.
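To illustrate the point about the smaller last batch, here is a minimal sketch in plain Python. The loss values and the helper `batch_indices` are made up for illustration and are not taken from the assignment. It shows that when each batch mean is weighted by its batch size, the result equals the mean over all examples, so a short final batch does not distort the cost.

```python
def batch_indices(n_examples, batch_size):
    """Yield (start, stop) index pairs; the final pair may cover fewer examples."""
    for start in range(0, n_examples, batch_size):
        yield start, min(start + batch_size, n_examples)


# Hypothetical per-example losses, invented for this illustration.
losses = [0.9, 0.4, 0.7, 0.2, 0.5, 0.3, 0.8, 0.6, 0.1, 0.5]

batches = [losses[a:b] for a, b in batch_indices(len(losses), batch_size=4)]
batch_sizes = [len(b) for b in batches]            # the last batch is smaller
batch_means = [sum(b) / len(b) for b in batches]   # one mean loss per batch

# Weighting each batch mean by its size recovers the mean over all examples.
weighted = sum(m * n for m, n in zip(batch_means, batch_sizes)) / len(losses)
overall = sum(losses) / len(losses)
```

An unweighted average of the batch means would give the two examples in the last batch slightly more influence than the others, which is usually negligible on a large dataset.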

avazeh · March 5, 2023, 6:40pm

Hi,

Thanks for your response. The batch size is the same, except for the last one, as you mentioned. I was pointing at the padding length being different across batches. I’m trying to get my head around how different padding sizes could cause issues down the line…

gent.spah · March 6, 2023, 8:43am

If I am understanding correctly, the overall padded size will be the same, but some sentences need more padding and some less because their original lengths differ.

avazeh · March 6, 2023, 9:15am

That was my assumption too, but that’s not how it is implemented in the assignment. At the end of each batch, max_len is recalculated, so there is no mechanism to ensure the same max_len is applied across all batches.
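The behaviour described above can be sketched roughly as follows. This is not the assignment's generator: the function name `padded_batches`, the pad value 0, and the token ids are all assumptions made for illustration. The only point is that max_len is recomputed per batch, so each batch is rectangular on its own while different batches can have different widths.

```python
def padded_batches(sequences, batch_size, pad=0):
    """Yield batches in which every row is padded to that batch's own max_len.

    `pad` is an assumed padding id, not necessarily the one the course uses.
    """
    for start in range(0, len(sequences), batch_size):
        batch = sequences[start:start + batch_size]
        max_len = max(len(seq) for seq in batch)  # recomputed for every batch
        yield [seq + [pad] * (max_len - len(seq)) for seq in batch]


# Hypothetical token ids, invented for this illustration.
questions = [[5, 8], [7, 3, 9, 4], [6], [2, 2, 2]]

batches = list(padded_batches(questions, batch_size=2))
widths = [len(batch[0]) for batch in batches]  # one width per batch
```

Because the loss is computed one batch at a time, and layers such as recurrent ones accept any sequence length, differing widths between batches need not cause an error as long as every row within a single batch has the same length.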

Here is the data generator function:

# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)

gent.spah · March 6, 2023, 9:22am

If this function goes through all the batches, I think it will find the max_len across all of them, and as far as I remember that assignment, it does. I am going to delete the code because it should not be made public.

avazeh · March 6, 2023, 7:36pm

I found the answer! It is explained in the first assignment of the next course in the specialisation. It’s called Bucketing!

Thanks for sharing your experience
Ava
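For readers who have not reached that course yet, the general idea of bucketing can be sketched as below. This is a plain-Python illustration under my own assumptions, not the course's or any library's implementation: the function `bucket_by_boundaries`, the boundary values, and the pad value 0 are all hypothetical. Sequences of similar length are grouped together and padded only up to their bucket's boundary, which keeps the amount of padding small.

```python
def bucket_by_boundaries(sequences, boundaries, pad=0):
    """Group sequences by length and pad each to its bucket's boundary.

    `boundaries` must be sorted in ascending order. Sequences longer than
    the largest boundary are dropped in this simplified sketch.
    """
    buckets = {b: [] for b in boundaries}
    for seq in sequences:
        for b in boundaries:
            if len(seq) <= b:
                # Smallest bucket that fits: pad up to its boundary only.
                buckets[b].append(seq + [pad] * (b - len(seq)))
                break
    return buckets


# Hypothetical token ids, invented for this illustration.
sequences = [[1], [2, 3, 4], [5, 6], [7, 8, 9, 10, 11]]
buckets = bucket_by_boundaries(sequences, boundaries=[2, 4])
```

Each bucket can then be cut into batches, so every batch has a uniform width while the width still varies from bucket to bucket.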
