I’m trying to build some intuition about how to improve model performance, so I’m running the PEFT steps from the Week 2 lab locally on a single GPU with 8 GB of VRAM. Could I please get the LoRA config and training arguments that were used for the model trained offline? Also, does anyone have suggestions on what tends to make the biggest impact, e.g. max_steps or the LoRA rank?
Hi Scot,
Can you please provide more context or details about the model, dataset, and the PEFT steps you mentioned?
Lab_2_fine_tune_generative_ai_model.ipynb (97.9 KB)
Hi Lawrence, thanks for getting back to me on this.
I’ve attached the Week 2 lab notebook, “Lab_2_fine_tune_generative_ai_model.ipynb”.
Section 3.2 of the lab says: “That training was performed on a subset of data. To load a fully trained PEFT model, read a checkpoint of a PEFT model from S3.”
It then gives the command to download the checkpoint:

```shell
!aws s3 cp --recursive s3://dlai-generative-ai/models/peft-dialogue-summary-checkpoint/ ./peft-dialogue-summary-checkpoint-from-s3/
```
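One thing I noticed while poking at that download: PEFT saves the adapter hyperparameters to an `adapter_config.json` file inside the checkpoint directory, so the S3 checkpoint should itself record the LoRA settings used offline. A small sketch to pull them out (the path is the download directory from the command above; the key names are what I believe PEFT writes):

```python
import json


def lora_settings(adapter_config_path):
    """Read the LoRA hyperparameters that PEFT saved with a checkpoint."""
    with open(adapter_config_path) as f:
        cfg = json.load(f)
    # These keys are what (I believe) PEFT writes to adapter_config.json.
    return {k: cfg.get(k) for k in ("r", "lora_alpha", "lora_dropout", "target_modules")}


# Usage, pointing at the directory downloaded from S3 above:
# lora_settings("./peft-dialogue-summary-checkpoint-from-s3/adapter_config.json")
```

As far as I can tell, though, this only recovers the LoraConfig side; the TrainingArguments (learning rate, number of steps, etc.) aren’t stored there, which is the part I’m really after.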
I’m trying to run the same PEFT steps on my local server with a single 8 GB GPU, so far with only the configuration below:

```python
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    r=32,  # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,  # FLAN-T5
)

peft_model = get_peft_model(original_model, lora_config)
```
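As a sanity check on the 8 GB budget: the extra trainable parameters LoRA adds are tiny, since each adapted d_out×d_in matrix only gets two low-rank factors of shapes (r, d_in) and (d_out, r). Quick back-of-the-envelope (the 768-dim projection size is an assumption about flan-t5-base):

```python
def lora_param_count(r, d_in, d_out):
    # LoRA factors: A is (r, d_in) and B is (d_out, r),
    # so one adapter adds r*d_in + d_out*r trainable parameters.
    return r * (d_in + d_out)


# With r=32 on a 768x768 attention projection (assumed flan-t5-base size):
per_matrix = lora_param_count(32, 768, 768)  # 49,152 params per adapted matrix
```

So rank alone shouldn’t be what blows up memory; the frozen base model and activations dominate.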
```python
import time

from transformers import Trainer, TrainingArguments

output_dir = f'./peft-dialogue-summary-training-{str(int(time.time()))}'

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3,  # Higher learning rate than full fine-tuning.
    num_train_epochs=1,
    logging_steps=1,
    max_steps=1,
)

peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets["train"],
)

peft_trainer.train()

peft_model_path = "./peft-dialogue-summary-checkpoint-local"
peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)
```
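One thing that stands out in my own config: if I understand the HF Trainer semantics correctly, `max_steps` overrides `num_train_epochs`, so `max_steps=1` stops after a single optimizer step regardless of the epoch setting. Rough step accounting (the example count below is my assumption about the size of the training split):

```python
import math


def total_update_steps(n_examples, batch_size, epochs, max_steps=-1):
    # Mirrors my understanding of HF Trainer: max_steps > 0 wins over epochs.
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return max_steps if max_steps > 0 else steps_per_epoch * epochs


# With max_steps=1 as above, training is effectively a single step:
total_update_steps(12_460, 8, 1, max_steps=1)  # -> 1
# Dropping max_steps would instead give a full epoch of updates:
total_update_steps(12_460, 8, 1)               # -> 1558
```

So one update step versus a thousand-plus could easily explain a large quality gap on its own, independent of the LoRA rank.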
However, my ROUGE scores are much lower than the lab’s example that uses the model trained offline. So I’m wondering whether different parameters were used for the offline model (e.g. a different LoRA rank, or more training steps).
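For reference, this is roughly how I’m sanity-checking the metric itself locally: a minimal unigram ROUGE-1 F1 sketch, not a replacement for the `evaluate` library implementation the lab uses:

```python
def rouge1_f(reference, candidate):
    # Unigram overlap with clipped counts, then F1 of precision/recall.
    ref = reference.lower().split()
    cand = candidate.lower().split()
    overlap = sum(min(ref.count(t), cand.count(t)) for t in set(cand))
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


rouge1_f("the cat sat on the mat", "the cat sat on the mat")  # -> 1.0
```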