GPU requirement for Fine Tuning stable-diffusion-v1-5

Hello,

I am trying to fine-tune stable-diffusion-v1-5 with the L5_Fine_Tuning.ipynb notebook from the short course Prompt Engineering for Vision Models. I am running it on Google Colab with 12 GB of system RAM and 16 GB of GPU RAM (Google Colab’s T4 GPU).

When I run the training cell I get an out-of-memory error (the screenshot is attached below). My understanding is that this model is fine-tuned using LoRA, which drastically reduces the memory footprint, so why can't I fine-tune it on a T4 GPU? Do I need a GPU with more RAM?
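To put a number on the "drastically reduces the memory footprint" part, here is a minimal sketch of how LoRA shrinks the trainable parameter count for a single linear layer. The layer size (768 x 768) and rank (4) are hypothetical values picked for illustration, not taken from the notebook. Keep in mind that LoRA only reduces the trainable weights, gradients, and optimizer state; the frozen base model still has to fit on the GPU alongside the activations.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA trains two small matrices: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer size and rank, chosen only for illustration.
d_in, d_out, rank = 768, 768, 4

full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}): {lora:,} trainable parameters")
print(f"LoRA trains {100 * lora / full:.2f}% of the layer")
```

So the saving is real, but it applies to what is being trained, not to what has to be loaded.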

Yes, from the error it seems that you need more memory!

Is there any recommendation on how much GPU memory I would need for this fine-tuning?

Also, the out-of-memory error happens while loading the model components, which means the model is not even getting loaded onto the GPU completely.
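For a rough sense of what loading the components costs, the sketch below estimates the memory taken by the weights alone in fp32 versus fp16. The parameter counts are approximate figures for Stable Diffusion v1.x that I am assuming here, not numbers from the notebook, and the estimate ignores activations, gradients, optimizer state, and the CUDA context, so actual usage will be higher.

```python
GIB = 1024 ** 3

# Approximate parameter counts for Stable Diffusion v1.x (assumed, rounded).
APPROX_PARAMS = {
    "unet": 860_000_000,
    "text_encoder": 123_000_000,
    "vae": 84_000_000,
}

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weights_gib(param_counts: dict, dtype: str) -> float:
    """Memory needed just to hold the weights, in GiB."""
    total_params = sum(param_counts.values())
    return total_params * BYTES_PER_PARAM[dtype] / GIB


for dtype in ("fp32", "fp16"):
    print(f"{dtype}: about {weights_gib(APPROX_PARAMS, dtype):.2f} GiB for weights only")
```

Under these assumptions the weights by themselves should fit on a T4, so an error at load time suggests that something else is already occupying GPU memory, for example a model left over from an earlier cell, or that components are being loaded more than once.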

I have no idea. Try increasing the overall capacity in small increments that seem reasonable.

Your error says the allocation needs 2.0 GiB, but only 1.91 GiB is free out of the 14 GiB on your GPU.
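The comparison above can be checked from inside the notebook. This is a minimal sketch, assuming PyTorch is installed as it is on Colab; `can_allocate` and `gpu_memory_gib` are hypothetical helper names, while `torch.cuda.mem_get_info()` is the PyTorch call that returns free and total device memory in bytes. The helper returns None when PyTorch or a CUDA GPU is not available, so the snippet also runs on a CPU-only machine.

```python
GIB = 1024 ** 3


def can_allocate(request_gib: float, free_gib: float) -> bool:
    """Naive check: does the requested block fit in the free memory?"""
    return request_gib <= free_gib


def gpu_memory_gib():
    """Return (free, total) GPU memory in GiB, or None without PyTorch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    return free_bytes / GIB, total_bytes / GIB


# The numbers quoted from the error message: 2.0 GiB requested, 1.91 GiB free.
print("fits:", can_allocate(2.0, 1.91))

report = gpu_memory_gib()
if report is None:
    print("No CUDA GPU (or PyTorch) available in this session.")
else:
    free, total = report
    print(f"free: {free:.2f} GiB of {total:.2f} GiB")
```

Running this before the training cell shows how much memory is already taken. Even when the free figure is larger than the request, an allocation can still fail because of fragmentation, so treat the check as a rough guide.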

I usually keep my Google Drive empty, or use a separate Google Drive account for my data science work.

Try moving your items in Google Drive to a different email account, then see if this works.

You can check your system settings to see how your GPU memory is allocated, or use an external GPU, but that would be costly.

Nowadays systems usually come with a built-in Nvidia graphics card, which is one of the best options when you need to use multiple tools or create apps.

Regards
DP

@Abdul_Rahman3 See this Stack Overflow question: "Is it technically possible to fine-tune stable diffusion (any checkpoint) on a GPU with 8GB VRAM"

@Abdul_Rahman3 Also, these are only sort of ‘vague’ rumors I’ve overheard, but it might be easier to fine-tune Flux from Black Forest Labs.

That model is also newer and supposed to be better.

If I am not mistaken, it was also put together by the team that basically established the theory behind Stable Diffusion in the first place.

*better = higher quality output, greater CLIP fidelity.