Week 2 Lab: Training configuration of the PEFT model

Hi,

I am currently taking the LLM course and have just finished week 2.

To make sure that I’ve assimilated the concepts covered in the second lab, particularly the LoRA method, I’ve redone the work in a Colab session.

At some point in the lab work, we load a model that has already been fine-tuned, so as to avoid a long training run (section 3.2).
Since I hadn’t downloaded this model, I decided to train my fine-tuned model myself (on the full training set), but I get much worse performance than with the model provided in the lab.

Here’s my training configuration:
“”"
lora_config = LoraConfig(
r=8, # Rank
lora_alpha=32,
target_modules=[“q”, “v”],
lora_dropout=0.05,
bias=“none”,
task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5
)

peft_model = get_peft_model(model, lora_config).to(‘cuda’)

os.environ[“WANDB_DISABLED”] = “true”

output_dir = ‘/content’
training_args = TrainingArguments(
output_dir=output_dir,
per_device_train_batch_size=4,
per_device_eval_batch_size=4,
learning_rate=1e-3,
num_train_epochs=100,
logging_steps=10,
evaluation_strategy=“steps”,
eval_steps=10,
save_strategy=“steps”,
save_steps=10,
save_total_limit=2,
max_steps=100,
load_best_model_at_end=True
)

peft_trainer = Trainer(
model=peft_model,
args=training_args,
train_dataset=tokenized_dataset[“train”],
eval_dataset=tokenized_dataset[“validation”],
tokenizer=tokenizer,
)

peft_trainer.train()

final_model = peft_model.merge_and_unload()
“”"

Then I run inference with the final model, but performance is often worse than with the base model, so I wanted to know whether this comes from my training configuration, which may be quite different from the one used by the instructors, or whether I simply missed something.
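
For reference, this is roughly how I run those inferences (a minimal sketch; the prompt shown is just a hypothetical placeholder for one of the lab's inputs):

# Minimal inference sketch; `prompt` is a hypothetical placeholder input.
prompt = "Summarize the following conversation.\n\n..."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output_ids = final_model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))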

Thank you in advance and have a nice day.

Hi Clément, welcome to the community!

Here are a few observations and suggestions you can try to improve your results:

  1. Adjust your learning rate, e.g. 5e-5 or 1e-4, and your dropout values.
  2. Use only one of num_train_epochs or max_steps; when both are set, max_steps takes precedence, so your run stopped after 100 steps.
  3. Experiment with extended target_modules (target_modules=["q", "k", "v", "o"] or other combinations) and higher rank values in the LoRA configuration. Similarly, experimenting with lora_alpha values around 16-64 may give better results; see the sketch after this list.
  4. Monitor the training logs to track validation loss and performance metrics such as ROUGE or BLEU to detect early signs of underfitting or overfitting.
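
Putting these suggestions together, a revised configuration might look something like the sketch below. This is only a starting point, not the instructors' actual settings; the rank, learning rate, and step values are placeholders to tune against your validation set.

from peft import LoraConfig, TaskType, get_peft_model
from transformers import TrainingArguments

# Hypothetical starting point -- tune these against your validation set.
lora_config = LoraConfig(
    r=16,                                 # try a higher rank than 8
    lora_alpha=32,
    target_modules=["q", "k", "v", "o"],  # all attention projections
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM
)
peft_model = get_peft_model(model, lora_config).to('cuda')

training_args = TrainingArguments(
    output_dir='/content',
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=1e-4,           # lower than the original 1e-3
    num_train_epochs=3,           # epochs only; max_steps removed
    logging_steps=10,
    evaluation_strategy="steps",
    eval_steps=50,
    save_strategy="steps",
    save_steps=50,
    save_total_limit=2,
    load_best_model_at_end=True
)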

Keep experimenting and good luck!

Hi,

Thanks a lot for your quick and helpful answer. I experimented with the parameters as you advised, but I still need to try other combinations.
While training my PEFT model, I tried to set ROUGE as the evaluation metric to better monitor performance.

I’ve set my training as follows:

import evaluate
import numpy as np

rouge = evaluate.load('rouge')

def compute_metrics(eval_pred):
    predictions, labels = eval_pred

    # Trainer hands compute_metrics numpy arrays, and with a plain Trainer
    # the predictions are logits, so take the argmax over the vocabulary
    if isinstance(predictions, tuple):
        predictions = predictions[0]
    predictions = np.argmax(predictions, axis=-1)
    # Replace -100 with the PAD token to avoid decoding mistakes
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)

    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # evaluate's rouge returns aggregated F1 floats directly
    rouge_scores = rouge.compute(predictions=decoded_preds, references=decoded_labels)
    return {
        "rouge1": rouge_scores["rouge1"],
        "rouge2": rouge_scores["rouge2"],
        "rougeL": rouge_scores["rougeL"],
    }

import os
os.environ['WANDB_DISABLED'] = 'true'
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    learning_rate=5e-5,
    num_train_epochs=3,
    logging_steps=10,
    evaluation_strategy="steps", 
    eval_steps=10,  
    save_strategy="steps",
    save_steps=10,
    save_total_limit=2,
    metric_for_best_model="rouge2",  
    greater_is_better=True  
)

peft_trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics, 
)

But with this configuration I run into an out-of-memory error (probably caused by the compute_metrics function, since the error disappears when I comment that line out), even on an A100 GPU with 40 GiB of memory, and even if I move the data to the CPU.

I apologize for pasting raw code like this, but I wanted to ask whether I can change something to keep tracking the ROUGE metrics during training while avoiding the memory overflow.

Thank you once again, and I apologize if my questions are not in the right section of the forum.

Have an excellent day.

Glad to hear you’re making progress with your experiments! Memory issues during evaluation, especially when computing metrics like ROUGE, are a common challenge when fine-tuning large models. Here are some suggestions:

  • Increase eval_steps to reduce the evaluation frequency. For example, eval_steps=50.
  • Use batch decoding in compute_metrics to avoid processing the whole data set at once.
  • Use gradient accumulation to simulate a larger batch size. For example, gradient_accumulation_steps=4.
  • Evaluate on a subset of the validation data or use mixed-precision training (fp16=True) to reduce memory requirements further, as shown in the sketch after this list.
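
Concretely, a memory-friendlier setup might look like the sketch below. It assumes the rouge, tokenizer, peft_model, and tokenized_dataset objects from your snippets; the subset size of 200 and the decoding chunk size of 32 are arbitrary values to adjust.

import numpy as np
from transformers import Trainer, TrainingArguments

# Evaluate on a subset of the validation set (200 is an arbitrary choice).
small_eval = tokenized_dataset["validation"].select(range(200))

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    if isinstance(predictions, tuple):
        predictions = predictions[0]
    predictions = np.argmax(predictions, axis=-1)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)

    # Decode in chunks instead of the whole evaluation set at once.
    decoded_preds, decoded_labels = [], []
    for i in range(0, len(predictions), 32):
        decoded_preds += tokenizer.batch_decode(predictions[i:i + 32], skip_special_tokens=True)
        decoded_labels += tokenizer.batch_decode(labels[i:i + 32], skip_special_tokens=True)

    scores = rouge.compute(predictions=decoded_preds, references=decoded_labels)
    return {k: scores[k] for k in ("rouge1", "rouge2", "rougeL")}

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,   # effective train batch size of 8
    learning_rate=5e-5,
    num_train_epochs=3,
    logging_steps=10,
    evaluation_strategy="steps",
    eval_steps=50,                   # evaluate less often than every 10 steps
    save_strategy="steps",
    save_steps=50,
    save_total_limit=2,
    fp16=True,                       # mixed precision to cut memory use
    metric_for_best_model="rouge2",
    greater_is_better=True,
)

peft_trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=small_eval,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)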

Hope this helps. Have a good day and happy experimenting!