I have been working with TensorFlow for my CNN projects until now. While switching to PyTorch, I'm unable to figure out a few things about the input pipeline. The training dataset consists of around 500 thousand images. In TensorFlow, `tf.data.Dataset` can be used to create an iterable that is fed to the model; we do not need to load every image into the iterable object up front, which keeps RAM usage bounded. In PyTorch, I'm unable to figure out how to do the same. Using `Dataset` from `torch.utils.data`, loading each image and its corresponding label into the iterable takes a heavy toll on RAM. What am I missing?
If loading individual images inside `__getitem__()` of `Dataset` exhausts RAM, odds are good that the batch size of the `DataLoader` is set to a high value.
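For reference, here is a minimal sketch of the lazy-loading pattern (class name, image sizes, and the tensor conversion are illustrative, not from the original post): the `Dataset` stores only file paths, and each image is read from disk inside `__getitem__()`, so resident memory scales with the batch size rather than the dataset size.

```python
import os
import tempfile

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader


class LazyImageDataset(Dataset):
    """Stores only file paths; each image is read from disk in __getitem__."""

    def __init__(self, image_paths, labels):
        self.image_paths = image_paths  # lightweight: just strings
        self.labels = labels

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # The image is loaded here, one sample at a time, so RAM usage is
        # bounded by batch_size (times num_workers), not by the dataset size.
        img = Image.open(self.image_paths[idx]).convert("RGB")
        tensor = torch.from_numpy(np.array(img, dtype=np.float32)).permute(2, 0, 1) / 255.0
        return tensor, self.labels[idx]


# Create a few dummy images on disk just to demonstrate the pipeline.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(8):
    p = os.path.join(tmpdir, f"img_{i}.png")
    Image.fromarray(np.zeros((16, 16, 3), dtype=np.uint8)).save(p)
    paths.append(p)

loader = DataLoader(LazyImageDataset(paths, list(range(8))), batch_size=4)
for images, labels in loader:
    print(images.shape)  # torch.Size([4, 3, 16, 16])
```

With 500k images, only the list of paths lives in memory; a `DataLoader` worker reads each batch's images on demand.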
Thanks @balaji.ambresh. Actually the mistake was on my part. Instead of applying the per-image transformations inside the `__getitem__()` method, I was doing it inside the `__init__()` method. Now that I have corrected this, the RAM issue is no longer there.
Thanks for confirming. Cheers.