I am new to AI and am using the Lamini lab trainer. It's working well, but I'd like to use batching to speed things up.
The training_args have a parameter to change the batch size, but when I set it, e.g.:
per_device_train_batch_size=16
I get this error:
RuntimeError: stack expects each tensor to be equal size, but got [39] at entry 0 and [112] at entry 1
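For context, this is roughly where I'm setting it (I'm assuming the lab's trainer takes a standard transformers TrainingArguments under the hood; the output_dir and the other values here are just placeholders):

from transformers import TrainingArguments

# Placeholder values; the only change from the lab defaults is the batch size
training_args = TrainingArguments(
    output_dir="outputs",            # hypothetical output path
    per_device_train_batch_size=16,  # raising this above 1 triggers the stack error
    num_train_epochs=1,
    learning_rate=1.0e-5,
)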
I tried using a DataLoader with a custom collate function:
import torch
from torch.utils.data import DataLoader

def my_collate(batch):
    # Keep the inputs as a plain list instead of stacking them,
    # and convert the labels to a single LongTensor
    data = [item[0] for item in batch]
    target = [item[1] for item in batch]
    target = torch.LongTensor(target)
    return [data, target]

trainset = DataLoader(dataset=train_dataset,
                      batch_size=16,
                      shuffle=True,
                      collate_fn=my_collate,
                      pin_memory=True)
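I then pass that DataLoader to the trainer where the dataset went before, roughly like this (the Trainer call below is my reconstruction of the lab notebook, so the exact argument names are assumptions):

from transformers import Trainer

# Assumption: the lab's trainer is a transformers-style Trainer
trainer = Trainer(
    model=model,              # the model loaded earlier in the lab
    args=training_args,
    train_dataset=trainset,   # passing the DataLoader in place of the Dataset
)
trainer.train()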
Using the above code gives an error during training:
Dataloader object is not subscriptable.
Is there a different way to do batching?