Hey there,
Why does get_batched_dataset use BATCH_SIZE for the batch size, rather than BATCH_SIZE * strategy.num_replicas_in_sync, in lab C2_W4_Lab_3_using-TPU-strategy? Is this a bug in C2_W4_Lab_3_using-TPU-strategy? Compare with the other labs, C2_W4_Lab_2_multi-GPU-mirrored-strategy and C2_W4_Lab_1_basic-mirrored-strategy.
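To make it concrete, this is roughly the difference I mean. It is my own simplified paraphrase, not a verbatim copy of the lab code, and the BATCH_SIZE value is only illustrative:

```python
import tensorflow as tf

BATCH_SIZE = 64  # illustrative per-replica value, not necessarily the lab's

# C2_W4_Lab_3_using-TPU-strategy (as I read it): batch with BATCH_SIZE only
def get_batched_dataset(dataset):
    return dataset.shuffle(2048).batch(BATCH_SIZE, drop_remainder=True)

# C2_W4_Lab_1 / C2_W4_Lab_2: batch with the global batch size instead
strategy = tf.distribute.MirroredStrategy()
GLOBAL_BATCH_SIZE = BATCH_SIZE * strategy.num_replicas_in_sync
train_dataset = tf.data.Dataset.range(1024).batch(GLOBAL_BATCH_SIZE)
```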
thank you
Hi there,
The compute_loss function in this lab returns the loss with global_batch_size=BATCH_SIZE * strategy.num_replicas_in_sync, which is basically what the previous labs do when they set the GLOBAL_BATCH_SIZE parameter above.
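Roughly, that part looks like this. It is a sketch of the standard TF 2.x pattern rather than a verbatim copy of the lab, and it assumes BATCH_SIZE and strategy are already defined as in the notebook:

```python
import tensorflow as tf

# Reduction.NONE keeps the per-example losses so we can average them ourselves
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

def compute_loss(labels, predictions):
    per_example_loss = loss_object(labels, predictions)
    # Divide by the GLOBAL batch size (per-replica batch * number of replicas),
    # so the gradients summed across replicas come out correctly scaled.
    return tf.nn.compute_average_loss(
        per_example_loss,
        global_batch_size=BATCH_SIZE * strategy.num_replicas_in_sync)
```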
@gent.spah thanks, but…
I see the same approach to computing the loss in those labs.
My point is about creating the dataset. The two approaches are different there, or is the batch size chosen arbitrarily?
As far as I can see, with two different computational strategies the way the data is fed in is a bit different, but the principle is the same. The dataset is the same: it is shuffled and then a batch size is chosen for feeding (per replica or global, depending on the strategy), and the rest of the computations add up to account for all of the resources used. The batch size in itself can be anything, but normally a power of 2 is chosen because of the binary logic of the hardware.
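Schematically, the two feeding patterns look something like this. It is a minimal sketch using the general TF 2.x distribution APIs, assuming the same per-replica BATCH_SIZE in both cases; the labs may wire things up slightly differently:

```python
import tensorflow as tf

BATCH_SIZE = 64  # per-replica batch size (illustrative value)
strategy = tf.distribute.MirroredStrategy()
GLOBAL_BATCH_SIZE = BATCH_SIZE * strategy.num_replicas_in_sync

# Global batching (mirrored-strategy labs): batch with GLOBAL_BATCH_SIZE and
# let the strategy split each global batch across the replicas.
global_ds = tf.data.Dataset.range(1024).shuffle(1024).batch(GLOBAL_BATCH_SIZE)
dist_global = strategy.experimental_distribute_dataset(global_ds)

# Per-replica batching (the style this thread says the TPU lab uses): each
# replica's input pipeline batches with BATCH_SIZE, so one training step still
# consumes BATCH_SIZE * num_replicas_in_sync examples overall.
def dataset_fn(input_context):
    ds = tf.data.Dataset.range(1024).shuffle(1024)
    ds = ds.shard(input_context.num_input_pipelines,
                  input_context.input_pipeline_id)
    return ds.batch(BATCH_SIZE, drop_remainder=True)

dist_per_replica = strategy.distribute_datasets_from_function(dataset_fn)
```

Either way, as long as compute_loss divides by the global batch size, each training step sees the same total number of examples, which is why the two conventions end up equivalent.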