I am new to AI and am using the Lamini lab trainer. It's working well, but I'd like to use batching to speed things up.
The training_args have a parameter to change the batch size, but when I set it, e.g.:
per_device_train_batch_size=16
I get this error:
RuntimeError: stack expects each tensor to be equal size, but got [39] at entry 0 and [112] at entry 1
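For context, this is roughly where I'm setting it (I'm assuming the lab's trainer takes a standard transformers TrainingArguments under the hood; the output_dir and the other values here are just placeholders):

from transformers import TrainingArguments

# Placeholder values; the only change from the lab defaults is the batch size
training_args = TrainingArguments(
    output_dir="outputs",            # hypothetical output path
    per_device_train_batch_size=16,  # raising this above 1 triggers the stack error
    num_train_epochs=1,
    learning_rate=1.0e-5,
)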
I tried using a DataLoader with a custom collate function:
import torch
from torch.utils.data import DataLoader

def my_collate(batch):
    # Keep the inputs as a plain list instead of stacking them,
    # and convert the labels to a single LongTensor
    data = [item[0] for item in batch]
    target = [item[1] for item in batch]
    target = torch.LongTensor(target)
    return [data, target]

trainset = DataLoader(dataset=train_dataset,
                      batch_size=16,
                      shuffle=True,
                      collate_fn=my_collate,
                      pin_memory=True)
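I then pass that DataLoader to the trainer where the dataset went before, roughly like this (the Trainer call below is my reconstruction of the lab notebook, so the exact argument names are assumptions):

from transformers import Trainer

# Assumption: the lab's trainer is a transformers-style Trainer
trainer = Trainer(
    model=model,              # the model loaded earlier in the lab
    args=training_args,
    train_dataset=trainset,   # passing the DataLoader in place of the Dataset
)
trainer.train()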
Using the above code gives an error during training:
Dataloader object is not subscriptable.
Is there a different way to do batching?