PEFT model inference

I was able to redo everything locally up to peft_trainer.train().
I saved this trained version locally as well, and I am trying to use it for the last steps (section 3.3 - Evaluate the Model Qualitatively).

In the lab, the original_model, the instruct_model and the peft_model are used for inference, and the results are compared. However, the way the outputs are obtained is slightly different than in previous sections. I am doing the same as before because it works for the original model and the instruct model (the new way doesn't). Unfortunately, for the peft_model I am getting an error and I am not sure how to solve it (or why I am getting it).

Here is the section of the code I am referring to:

from peft import PeftModel, PeftConfig

peft_model_base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to(torch.device("cuda:0"))

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

peft_model = PeftModel.from_pretrained(peft_model_base, './peft-dialogue-summary-checkpoint-local')

index = 200
dialogue = dataset['test'][index]['dialogue']
baseline_human_summary = dataset['test'][index]['summary']

prompt = f"""\nSummarize the following conversation.\n\n{dialogue}\n\nSummary: """

input_ids = tokenizer(prompt, return_tensors="pt")
inputs = input_ids.to(torch.device("cuda:0"))
original_model_outputs = original_model.generate(inputs["input_ids"], max_new_tokens=200,)
original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

instruct_model_outputs = instruct_model.generate(inputs["input_ids"], max_new_tokens=200,)
instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

Neither of the options here works:

#peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
peft_model_outputs = peft_model.generate(inputs["input_ids"], max_new_tokens=200,)
peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)

Here is the error I get:

peft_model_outputs = peft_model.generate(inputs["input_ids"], max_new_tokens=200,)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: PeftModelForSeq2SeqLM.generate() takes 1 positional argument but 2 were given

generation_config = GenerationConfig(max_new_tokens=200, num_beams=1)
peft_model_outputs = peft_model.generate(inputs["input_ids"], generation_config)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: PeftModelForSeq2SeqLM.generate() takes 1 positional argument but 3 were given


Hello @Brusk

Can I know why you are using this everywhere on your left_model? This is actually not required.

Incorrect checkpoint. CORRECT CHECKPOINT: './peft-dialogue-summary-checkpoint-from-s3/'

  1. 3.3 - Evaluate the Model Qualitatively (Human Evaluation)
    Make inferences for the same example as in sections [1.3], with the original model, fully fine-tuned and PEFT model.

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

  1. STOP using .to(torch.device("cuda:0")); your way of tokenizing is also incorrect.

Correct way to tokenize is tokenizer(prompt, return_tensors="pt").input_ids

The way you generate with the original model is also incorrect. The call should contain input_ids as the first parameter and then generation_config=GenerationConfig(max_new_tokens=200, num_beams=1).

  1. Same issue with the instruct_model and left_model: to generate, the code needs (input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1)).


If I don't use .to(torch.device("cuda:0")), I get the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper__index_select)
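For context, that RuntimeError appears whenever the model's weights and the input tensors end up on different devices. A minimal sketch of the general fix, using a toy torch module rather than the lab's model: pick one explicit device and move both the model and the inputs to it.

```python
import torch

# Pick one device and use it for BOTH the model and the inputs.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)   # toy stand-in for the real model
x = torch.randn(1, 4).to(device)           # inputs moved to the same device

y = model(x)  # no device-mismatch error: everything lives on `device`
print(y.shape)
```

With multiple GPUs, pinning everything to an explicit "cuda:0" (rather than relying on defaults) avoids the cuda:0 vs cuda:1 mismatch shown in the traceback.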

'./peft-dialogue-summary-checkpoint-local' is the model path of the model I saved locally after running peft_trainer.train() in the previous section. It should work just like the checkpoint from S3.

They are the same. This is not an issue. It works with all other models.

Regarding points 4, 5 and 6:
This way, it works for the original model:

input_ids = tokenizer(prompt, return_tensors="pt")
inputs = input_ids.to(torch.device("cuda:0"))
original_model.generate(inputs["input_ids"], max_new_tokens=200,)

This way it also works for the original model:

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(torch.device("cuda:0"))
original_model.generate(input_ids, GenerationConfig(max_new_tokens=200, num_beams=1))

However, neither of those ways works for the peft_model (why do you call it 'left_model'?!). Either way, I get a similar error:

TypeError: PeftModelForSeq2SeqLM.generate() takes 1 positional argument but 3 were given

Just out of curiosity, I tried to download another peft dialogue summary checkpoint, and I still get the same issue.

I found a similar post here.

The workaround would be naming the argument, peft_model.generate(input_ids=inputs.input_ids).

The problem is that PeftModelForSeq2SeqLM.generate currently does not pass along positional arguments to the base model.

This looks deliberate as there’s a test case validating it, presumably because the seq2seq variant makes use of the arguments by plucking them from the **kwargs, which would not work if they’re passed as positional arguments.

input_ids is often passed as a positional argument in the Transformers documentation, so the current API contradicts expectations.
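The TypeError is easy to reproduce without PEFT at all: a method declared to accept only keyword arguments rejects any positional ones. A minimal sketch of the mechanism, using a hypothetical stand-in class rather than the real PeftModelForSeq2SeqLM:

```python
class FakePeftModel:
    # Mirrors the shape of PeftModelForSeq2SeqLM.generate, which accepts
    # only **kwargs and forwards them to the base model.
    def generate(self, **kwargs):
        return kwargs


model = FakePeftModel()

# Keyword call works, like peft_model.generate(input_ids=..., generation_config=...)
out = model.generate(input_ids=[1, 2, 3], generation_config={"max_new_tokens": 200})
print(sorted(out))  # ['generation_config', 'input_ids']

# Positional call fails just like in the tracebacks above.
try:
    model.generate([1, 2, 3])
except TypeError as e:
    print(e)  # e.g. "...generate() takes 1 positional argument but 2 were given"
```

This is why peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(...)) succeeds while the positional calls that work for plain transformers models fail here.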

What parameters in lora_config and/or training_arguments can be changed to generate a better model (like the one in the S3 checkpoint)?

I have tried reducing the learning rate, and increasing the epochs, logging_steps and max_steps in training_arguments,
as well as the rank in lora_config, but the models have a much lower ROUGE score than the S3 checkpoint.
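For reference, a hedged sketch of the kind of knobs in question. The values below are illustrative assumptions, not the lab's settings: in peft's LoraConfig, r, lora_alpha, target_modules and lora_dropout shape the adapter's capacity, while in TrainingArguments the learning rate and step budget usually matter most.

```python
from peft import LoraConfig, TaskType
from transformers import TrainingArguments

# Illustrative values only; tune r / lora_alpha / learning rate against
# your own ROUGE evaluation rather than copying these numbers.
lora_config = LoraConfig(
    r=32,                       # adapter rank: higher = more capacity
    lora_alpha=32,              # scaling factor; often set near r
    target_modules=["q", "v"],  # attention projections in flan-t5
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,
)

training_arguments = TrainingArguments(
    output_dir="./peft-dialogue-summary-training",
    learning_rate=1e-3,   # LoRA tolerates larger rates than full fine-tuning
    num_train_epochs=5,   # more passes than a quick max_steps-limited run
    logging_steps=10,
)
```

A checkpoint trained for many more steps than a short demo run can easily outscore it on ROUGE even with identical lora_config settings, so the step budget is worth checking before the adapter hyperparameters.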

