Can someone please explain how to use these in fine-tuning?
per_device_train_batch_size=1,  # Batch size for each device (e.g., GPU) during training.
gradient_accumulation_steps=8,  # Number of batches to accumulate before one optimizer step.
Mr. Zu explained it a bit, but I still don't understand it.
My question
- What about people who use cloud providers for GPUs when training? How do we calculate the number of GPUs used in training? Thank you in advance for a response.
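For what it's worth, here is how these two settings usually combine. The number that matters for training dynamics is the effective (global) batch size: the per-device batch size, times the gradient accumulation steps, times the number of GPUs actually training in parallel. The sketch below is my own illustration, not part of the Transformers library; the function name is hypothetical.

```python
# Hypothetical helper showing how the effective (global) batch size is
# typically computed for Hugging Face TrainingArguments-style settings.
def effective_batch_size(per_device_train_batch_size,
                         gradient_accumulation_steps,
                         num_devices):
    # Each optimizer step processes:
    # per-device batch x accumulation steps x number of devices.
    return (per_device_train_batch_size
            * gradient_accumulation_steps
            * num_devices)

# With the settings from the question on a single GPU: 1 * 8 * 1 = 8.
print(effective_batch_size(1, 8, 1))

# Same settings on a cloud instance with 4 GPUs: 1 * 8 * 4 = 32.
print(effective_batch_size(1, 8, 4))
```

On a cloud instance you generally don't guess the GPU count; you can check it at runtime (e.g. `torch.cuda.device_count()` in PyTorch, or `nvidia-smi` on the machine) and plug that in as `num_devices`.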