Loading Model - Memory Requirements

In Week 1, we learned about the memory requirements during pretraining. The calculation made sense (i.e., a 1B-parameter model would require at least 24 bytes per parameter of RAM, which equates to 24 GB), but the lecture stated you would need 80 GB. It looks like many people had the same question, and I still haven't seen a definitive answer on this.

Question: If I'm loading a model out of the box from Hugging Face on my local machine, does this mean I would need at least that much RAM? How does that compare to the RAM required for fine-tuning? An example would be helpful.

Good question!

When loading a model from Hugging Face on your local machine for inference, the memory requirement is significantly lower: you mainly need enough RAM to hold the model parameters themselves (roughly 4 bytes per parameter in fp32, or 2 bytes in fp16/bf16).
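As a rough sketch, the inference-time footprint can be estimated from the parameter count and the numeric precision alone (the helper function name here is just for illustration):

```python
def inference_memory_gb(num_params: float, bytes_per_param: int = 4) -> float:
    """Approximate RAM needed just to hold the model weights.

    bytes_per_param: 4 for fp32, 2 for fp16/bf16, 1 for int8.
    """
    return num_params * bytes_per_param / 1e9

# A 1B-parameter model loaded in fp32:
print(inference_memory_gb(1e9))      # ~4 GB
# The same model in fp16:
print(inference_memory_gb(1e9, 2))   # ~2 GB
```

In practice there is some extra overhead (framework buffers, the KV cache during generation), so treat these as lower bounds.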

For fine-tuning, the memory requirement increases and starts to resemble the training scenario, because in addition to the parameters you also need to store gradients, optimizer states, and activations.

The exact memory requirement for fine-tuning varies with the dataset size, batch size, number of epochs, and other factors. It will generally be higher than merely loading the model for inference, but it can be lower than the initial pretraining phase, especially if you fine-tune on a smaller dataset or for fewer epochs.
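The 24-bytes-per-parameter figure from the lecture can be recovered with a common rule-of-thumb accounting for full fine-tuning with Adam in fp32. The breakdown below is an assumption (the allowance for activations in particular depends heavily on batch size and sequence length):

```python
def finetune_memory_gb(num_params: float) -> float:
    """Rough per-parameter accounting for full fine-tuning with Adam in fp32."""
    weights   = 4  # fp32 parameters
    gradients = 4  # one fp32 gradient per parameter
    adam      = 8  # Adam's two fp32 moment estimates (m and v)
    extras    = 8  # activations and temporary buffers (rough allowance)
    return num_params * (weights + gradients + adam + extras) / 1e9

print(finetune_memory_gb(1e9))  # ~24 GB for a 1B-parameter model
```

This is why a 1B-parameter model that fits in ~4 GB for inference can need an order of magnitude more memory to fine-tune; the gap between this estimate and the 80 GB quoted in the lecture likely comes from activation memory at realistic batch sizes and sequence lengths, which the flat allowance above understates.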