Hi,
I am currently taking the LLM course and have just finished week 2.
To make sure I've absorbed the concepts covered in the second lab, particularly the LoRA method, I've redone the work in a Colab session.
At one point in the lab (section 3.2), we load a model that has already been fine-tuned, so as to avoid a long training run.
Since I hadn't downloaded that model, I decided to fine-tune the model myself (on the full training set), but I get much worse performance than the lab's model.
Here’s my training configuration:
“”"
lora_config = LoraConfig(
r=8, # Rank
lora_alpha=32,
target_modules=[“q”, “v”],
lora_dropout=0.05,
bias=“none”,
task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5
)
peft_model = get_peft_model(model, lora_config).to(‘cuda’)
os.environ[“WANDB_DISABLED”] = “true”
output_dir = ‘/content’
training_args = TrainingArguments(
output_dir=output_dir,
per_device_train_batch_size=4,
per_device_eval_batch_size=4,
learning_rate=1e-3,
num_train_epochs=100,
logging_steps=10,
evaluation_strategy=“steps”,
eval_steps=10,
save_strategy=“steps”,
save_steps=10,
save_total_limit=2,
max_steps=100,
load_best_model_at_end=True
)
peft_trainer = Trainer(
model=peft_model,
args=training_args,
train_dataset=tokenized_dataset[“train”],
eval_dataset=tokenized_dataset[“validation”],
tokenizer=tokenizer,
)
peft_trainer.train()
final_model = peft_model.merge_and_unload()
“”"
Then I run inference with the final model, but its performance is often worse than the base model's, so I'd like to know whether this comes from my training configuration, which may differ from the one the instructors used, or whether I've simply missed something.
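For context, this is roughly how I run those inferences (a minimal sketch; the prompt text and generation settings here are placeholders, not the exact ones from the lab):

```python
# Minimal inference sketch with the merged model (placeholder prompt/settings).
prompt = "Summarize the following conversation.\n\n<dialogue goes here>\n\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output_ids = final_model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```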
Thank you in advance and have a nice day.