The lecture gives an overview of how to fine-tune and pretrain LLMs, but how is this achieved in practice? It would be good to know from a developer's perspective. In other words, how do I supply a smaller dataset of thousands of specific samples (in fine-tuning, for example) to, say, GPT-4o?
Typically:
- You obtain the full set of weights and architecture for the model.
- You freeze all of the weights in the model except for the output layer.
- Then you train just the output layer weights using your specific additional examples.
The Deep Learning Specialization has an exercise with an example of this method.
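To make the method above concrete, here is a minimal sketch in PyTorch using a toy feed-forward model (the layer sizes and the training batch are made up for illustration; a real LLM would be loaded from a checkpoint instead):

```python
import torch
import torch.nn as nn

# Toy "pretrained" model: a small feed-forward classifier.
model = nn.Sequential(
    nn.Linear(16, 32),   # pretrained hidden layer (stays frozen)
    nn.ReLU(),
    nn.Linear(32, 4),    # output layer we will retrain
)

# Freeze every parameter in the model...
for param in model.parameters():
    param.requires_grad = False
# ...then unfreeze only the output layer.
for param in model[-1].parameters():
    param.requires_grad = True

# The optimizer only sees the trainable (output-layer) parameters.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

# One training step on a fake batch of task-specific examples.
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(trainable, total)  # only the output layer's weights and bias train
```

Only the 132 output-layer parameters (out of 676 total) receive gradient updates, which is exactly why this style of fine-tuning needs far less data and compute than full training.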
Thanks. I am not very familiar with ML and am learning generative AI application development. Would adding smaller datasets to the model be exposed through the API associated with the model (like the OpenAI API), or is there some other way to achieve it? And can closed-source models also be fine-tuned this way?
Sorry, I cannot answer questions about how the OpenAI API works.
Fine-tuning means taking a pre-trained large language model and teaching it new things with your own data, i.e. customizing it for your company or task (legal advice, medical help, etc.).
Pretraining is what happens before fine-tuning: training a model from scratch on large-scale general text data (books, articles, web data) so that it learns to understand language.
Steps are as follows:
- You choose a base model (like LLaMA, GPT, etc.).
- For fine-tuning, you give it labeled examples (like question-answer pairs).
- You train it for a few hours or days on GPUs.
- Tools like LoRA, PEFT, or QLoRA help you do this efficiently without needing huge computing power.
- For pretraining, you'd need large-scale data.
Most people skip pretraining and focus on fine-tuning to make existing models smarter for their specific needs.
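To show what LoRA (mentioned in the steps above) actually does, here is a from-scratch sketch of the core idea in PyTorch: instead of updating a frozen weight matrix, you learn a small low-rank correction on top of it. In practice you would use the Hugging Face `peft` library rather than hand-rolling this; the rank and scaling values below are illustrative:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen Linear layer with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights.
        for p in self.base.parameters():
            p.requires_grad = False
        # Low-rank adapters: A projects down to rank r, B projects back up.
        # B starts at zero, so the wrapped layer initially behaves exactly
        # like the original pretrained layer.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Frozen base output plus the learned low-rank correction B @ A @ x.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(64, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(trainable, total)
```

Here only 512 adapter parameters train against 4,672 total, and the ratio improves dramatically at LLM scale, which is why LoRA-style fine-tuning fits on modest GPUs.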