A Couple of Questions From Week 1

Great high-level information and overview so far; however, I have a few questions coming out of week 1:

  1. We received an overview of memory/compute requirements for pre-training your own LLM. Although this is beneficial, I'm more interested in knowing any rules of thumb or calculations to determine server requirements for loading/fine-tuning/training/using a base (foundation) model, let's say Llama 2 from Hugging Face. Also, I'm interested in knowing what exactly gets stored in memory when working with a base model: is it just the model weights? (I've sketched the kind of calculation I mean right after this list.)

  2. Is In-Context Learning (ICL) specific to pre-training/training? For example, if I use the ChatGPT app, would I see far better results if I provided an example (i.e., one-shot or few-shot) in the prompt? (There's a toy prompt illustrating this at the end of the post.)
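To make question 1 concrete, here is a minimal back-of-envelope sketch of the kind of rule of thumb I'm after. The byte counts are my assumptions (2 bytes per parameter for fp16/bf16 weights, plus roughly 12 extra bytes per parameter for fp32 gradients and Adam optimizer state during full fine-tuning), and it ignores activations and the KV cache:

```python
def estimate_memory_gb(n_params_billions: float,
                       bytes_per_weight: float = 2,  # fp16/bf16 weights
                       training: bool = False) -> float:
    """Back-of-envelope GPU memory estimate, ignoring activations/KV cache."""
    n = n_params_billions * 1e9
    total = n * bytes_per_weight  # the model weights themselves
    if training:
        # Assumption: full fine-tuning with Adam adds fp32 gradients plus
        # two fp32 optimizer moments, roughly 12 extra bytes per parameter.
        total += n * 12
    return total / 1e9

# Llama 2 7B as an example:
print(f"load for inference: ~{estimate_memory_gb(7):.0f} GB")                 # ~14 GB
print(f"full fine-tune:     ~{estimate_memory_gb(7, training=True):.0f} GB")  # ~98 GB
```

Is this roughly the right way to think about it, or are there other major contributors to the memory footprint?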

For question 2, my guess is most probably yes, as the example would give the LLM context to work from! A toy prompt showing what I mean is below.
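Here is the kind of one-shot prompt I have in mind for question 2 (the task and wording are just made up for illustration):

```python
review = "The plot was predictable and the acting felt flat."

# Zero-shot: just the task instruction and the input.
zero_shot = f"Classify the sentiment of this review.\nReview: {review}\nSentiment:"

# One-shot: the same task with one worked example prepended,
# which is my understanding of the in-context learning idea.
one_shot = (
    "Classify the sentiment of this review.\n"
    "Review: I loved every minute of this film.\n"
    "Sentiment: Positive\n\n"
    f"Review: {review}\n"
    "Sentiment:"
)
print(one_shot)
```

Would the second prompt reliably outperform the first in a chat interface, or does ICL only really matter during training?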