PEFT model inference

I was able to redo everything locally up to peft_trainer.train().
I saved this trained version locally as well, and I am trying to use it for the last steps (section 3.3 - Evaluate the Model Qualitatively).

In the lab, the original_model, the instruct_model and the peft_model are used for inference, and the results are compared. However, the way the outputs are obtained is slightly different than in previous sections. I am doing the same as before because it works for the original model and the instruct model (the new way doesn't). Unfortunately, for the peft_model I am getting an error and I am not sure how to solve it (or why I am getting it).

Here is the section of the code I am referring to:

from peft import PeftModel, PeftConfig

peft_model_base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to(torch.device("cuda:0"))

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

peft_model = PeftModel.from_pretrained(peft_model_base, './peft-dialogue-summary-checkpoint-local')

index = 200
dialogue = dataset['test'][index]['dialogue']
baseline_human_summary = dataset['test'][index]['summary']

prompt = f"""\nSummarize the following conversation.\n\n{dialogue}\n\nSummary: """

input_ids = tokenizer(prompt, return_tensors="pt")
inputs = input_ids.to(torch.device("cuda:0"))
original_model_outputs = original_model.generate(inputs["input_ids"], max_new_tokens=200,)
original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

instruct_model_outputs = instruct_model.generate(inputs["input_ids"], max_new_tokens=200,)
instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

Neither of the options here works:

#peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
peft_model_outputs = peft_model.generate(inputs["input_ids"], max_new_tokens=200,)
peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)

Here is the error I get:

peft_model_outputs = peft_model.generate(inputs["input_ids"], max_new_tokens=200,)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: PeftModelForSeq2SeqLM.generate() takes 1 positional argument but 2 were given

generation_config = GenerationConfig(max_new_tokens=200, num_beams=1)
peft_model_outputs = peft_model.generate(inputs["input_ids"], generation_config)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: PeftModelForSeq2SeqLM.generate() takes 1 positional argument but 3 were given


Hello @Brusk

Can I know why you are using this everywhere on your left_model? This is actually not required.

Incorrect checkpoint. CORRECT CHECKPOINT: './peft-dialogue-summary-checkpoint-from-s3/'

  1. 3.3 - Evaluate the Model Qualitatively (Human Evaluation)
    Make inferences for the same example as in sections [1.3], with the original model, fully fine-tuned and PEFT model.

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

  1. STOP using .to(torch.device("cuda:0")); your way of tokenizing is also incorrect.

Correct way to tokenize is tokenizer(prompt, return_tensors="pt").input_ids

The way you generate with the original model is also incorrect. The call should contain input_ids as the first parameter and then generation_config=GenerationConfig(max_new_tokens=200, num_beams=1).

  1. Same issue with the instruct_model and left_model: to generate, the code needs (input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1)).


If I don't use .to(torch.device("cuda:0")), I get the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper__index_select)
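For context, that RuntimeError appears whenever the model's weights and the input tensors end up on different devices. A minimal sketch of the general fix, using a toy torch module rather than the lab's model: pick one explicit device and move both the model and the inputs to it.

```python
import torch

# Pick one device and use it for BOTH the model and the inputs.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)   # toy stand-in for the real model
x = torch.randn(1, 4).to(device)           # inputs moved to the same device

y = model(x)  # no device-mismatch error: everything lives on `device`
print(y.shape)
```

With multiple GPUs, pinning everything to an explicit "cuda:0" (rather than relying on defaults) avoids the cuda:0 vs cuda:1 mismatch shown in the traceback.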

'./peft-dialogue-summary-checkpoint-local' is the model path of the model I saved locally after running peft_trainer.train() in the previous section. It should work just like the checkpoint from S3.

They are the same. This is not an issue. It works with all other models.

Regarding points 4, 5 and 6:
This way, it works for the original model:

input_ids = tokenizer(prompt, return_tensors="pt")
inputs = input_ids.to(torch.device("cuda:0"))
original_model.generate(inputs["input_ids"], max_new_tokens=200,)

This way it also works for the original model:

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(torch.device("cuda:0"))
original_model.generate(input_ids, GenerationConfig(max_new_tokens=200, num_beams=1))

However, neither of those ways works for the peft_model (why do you call it 'left_model'?!). Either way, I get a similar error:

TypeError: PeftModelForSeq2SeqLM.generate() takes 1 positional argument but 3 were given

Just out of curiosity, I tried to download another peft dialogue summary checkpoint, and I still get the same issue.

I found a similar post here.

The workaround would be naming the argument, peft_model.generate(input_ids=inputs.input_ids).

The problem is that PeftModelForSeq2SeqLM.generate currently does not pass along positional arguments to the base model.

This looks deliberate as there’s a test case validating it, presumably because the seq2seq variant makes use of the arguments by plucking them from the **kwargs, which would not work if they’re passed as positional arguments.

input_ids is often passed as a positional argument in the Transformers documentation, so the current API contradicts expectations.
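The TypeError is easy to reproduce without PEFT at all: a method declared to accept only keyword arguments rejects any positional ones. A minimal sketch of the mechanism, using a hypothetical stand-in class rather than the real PeftModelForSeq2SeqLM:

```python
class FakePeftModel:
    # Mirrors the shape of PeftModelForSeq2SeqLM.generate, which accepts
    # only **kwargs and forwards them to the base model.
    def generate(self, **kwargs):
        return kwargs


model = FakePeftModel()

# Keyword call works, like peft_model.generate(input_ids=..., generation_config=...)
out = model.generate(input_ids=[1, 2, 3], generation_config={"max_new_tokens": 200})
print(sorted(out))  # ['generation_config', 'input_ids']

# Positional call fails just like in the tracebacks above.
try:
    model.generate([1, 2, 3])
except TypeError as e:
    print(e)  # e.g. "...generate() takes 1 positional argument but 2 were given"
```

This is why peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(...)) succeeds while the positional calls that work for plain transformers models fail here.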

What parameters in lora_config and/or training_arguments can be changed to generate a better model (like the one in the S3 checkpoint)?

I have tried reducing the learning rate, and increasing the epochs, logging_steps and max_steps in training_arguments,
as well as the rank in lora_config, but the models have a much lower ROUGE score than the S3 checkpoint.
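For reference, a hedged sketch of the kind of knobs in question. The values below are illustrative assumptions, not the lab's settings: in peft's LoraConfig, r, lora_alpha, target_modules and lora_dropout shape the adapter's capacity, while in TrainingArguments the learning rate and step budget usually matter most.

```python
from peft import LoraConfig, TaskType
from transformers import TrainingArguments

# Illustrative values only; tune r / lora_alpha / learning rate against
# your own ROUGE evaluation rather than copying these numbers.
lora_config = LoraConfig(
    r=32,                       # adapter rank: higher = more capacity
    lora_alpha=32,              # scaling factor; often set near r
    target_modules=["q", "v"],  # attention projections in flan-t5
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,
)

training_arguments = TrainingArguments(
    output_dir="./peft-dialogue-summary-training",
    learning_rate=1e-3,   # LoRA tolerates larger rates than full fine-tuning
    num_train_epochs=5,   # more passes than a quick max_steps-limited run
    logging_steps=10,
)
```

A checkpoint trained for many more steps than a short demo run can easily outscore it on ROUGE even with identical lora_config settings, so the step budget is worth checking before the adapter hyperparameters.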

