I understand that training LLMs involves pretraining followed by fine-tuning. Pretraining is conducted on a very large corpus of as much as hundreds of billions of tokens, while fine-tuning uses only a tiny fraction of that amount (e.g. maybe 1,000 examples). From what I understand, both are done in much the same way, through backpropagation. Why do the (say) 500,000 tokens from those 1,000 examples have a bigger effect than the last 500,000 tokens used in the pretraining phase?
Fine-tuning typically adds new layers onto an existing model to teach it a specific task.
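(For illustration only, a minimal PyTorch sketch of that style of fine-tuning, where the pretrained base is frozen and only a newly added head is trained; the checkpoint name and head size are placeholders, not anything from the question.)

```python
import torch
import torch.nn as nn
from transformers import AutoModel

# Hypothetical example: freeze a pretrained encoder, train only a new head.
base = AutoModel.from_pretrained("bert-base-uncased")  # placeholder checkpoint
for p in base.parameters():
    p.requires_grad = False  # pretrained weights stay fixed

head = nn.Linear(base.config.hidden_size, 2)  # new task-specific layer

# Only the new head's parameters receive gradient updates.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
```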
I am considering the case (in transformers) where the same architecture is used and all parameters are trained (full fine-tuning).
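For contrast, here is a rough sketch of that full-fine-tuning setup (assuming a Hugging Face causal LM and PyTorch; the checkpoint, learning rate, and example text are placeholders). Every pretrained weight stays trainable, and the update is ordinary backpropagation on the new examples with the same language-modeling loss used in pretraining:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical example: full fine-tuning, all parameters trainable.
model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # every weight updates

batch = tokenizer("Q: What is 2+2?\nA: 4", return_tensors="pt")
outputs = model(**batch, labels=batch["input_ids"])  # same LM loss as pretraining
outputs.loss.backward()
optimizer.step()
```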
I see.
Your private messages were helpful. I see now that the fine-tuning is not having a bigger effect; it is teaching the model to answer questions.