Hi @PZ2004
I’m no expert on this, so what I usually do is run a few short test runs and check the memory footprint and the time to complete one epoch, to get a rough estimate of how long the model will take to train. It’s a pretty lame “strategy”, but it is what it is… Roughly something like the sketch below.
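Just to illustrate what I mean (a minimal sketch, assuming a PyTorch setup; `model`, `dataloader`, `optimizer` and `loss_fn` are placeholders for whatever you’re training):

```python
import time
import torch

def profile_one_epoch(model, dataloader, optimizer, loss_fn, device="cuda"):
    """Run a single training epoch and report wall-clock time and peak GPU memory."""
    model.to(device)
    model.train()
    if device == "cuda":
        torch.cuda.reset_peak_memory_stats(device)

    start = time.time()
    for inputs, targets in dataloader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
    elapsed = time.time() - start

    print(f"one epoch took {elapsed:.1f} s")
    if device == "cuda":
        peak_gb = torch.cuda.max_memory_allocated(device) / 1024**3
        print(f"peak GPU memory: {peak_gb:.2f} GB")
    return elapsed
```

Multiply the per-epoch time by the number of epochs you plan to run and you get a rough total (keeping in mind the first epoch is often a bit slower because of warm-up).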
From what I understand from your post, you shouldn’t have problems fitting the model into memory (which is usually the biggest problem), but the training time will likely be quite long (slow training).
I’m not sure if you have access to the Generative AI course lecture Computational challenges of training LLMs (if you don’t, it doesn’t matter - the main point is discussed here; btw I also think the calculations there are off, but a follow-up might reveal the truth).
Again, I’m no expert, but in my experience small/intermediate models (fewer than 300M parameters) are not a problem to train on a GPU; on a CPU, though, I think it would take ages.
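For a back-of-envelope memory estimate (just the usual rule of thumb, not an exact figure): plain fp32 training with Adam needs roughly 4 bytes per parameter for weights, 4 for gradients and 8 for the optimizer states, i.e. ~16 bytes per parameter before activations:

```python
def estimate_training_memory_gb(num_params, bytes_per_param=16):
    """Rough rule of thumb for fp32 + Adam:
    4 B weights + 4 B gradients + 8 B optimizer states = ~16 B per parameter
    (activations not included)."""
    return num_params * bytes_per_param / 1024**3

print(f"{estimate_training_memory_gb(300e6):.1f} GB")  # ~4.5 GB for a 300M-parameter model
```

So a 300M-parameter model is around 4-5 GB for the training state itself, which is why fitting it into memory usually isn’t the issue - the compute time is.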
Cheers