Fine-Tune FLAN-T5 with Reinforcement Learning (PPO) and PEFT to Generate Less-Toxic Summaries week #3

When I fine-tune the PEFT model to generate less-toxic summaries, I get an error in the code. I tried changing the code a little, but I still get the error: ‘AutoModelForSeq2SeqLMWithValueHead’ object has no attribute ‘base_model_prefix’. Below I am adding a screenshot of the error.


Hello, have you added comments and extra lines of code to the lab notebook? I don’t see them in the original lab. From what I can see, the parameters passed to PPOConfig and PPOTrainer are not set up properly, and PPOTrainer has no value_model argument.
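One quick way to check which keyword arguments a class accepts in the environment you are running is to inspect its constructor signature. This is only a rough sketch using the standard library: the helper `unsupported_kwargs` and the class `StandInTrainer` are made up for illustration and do not reflect the real PPOTrainer signature. In the notebook you would pass the actual class imported from trl instead of the stand-in.

```python
import inspect


def unsupported_kwargs(cls, kwargs):
    """Return the names in kwargs that cls.__init__ does not accept."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs catch-all means any keyword would be accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [name for name in kwargs if name not in params]


class StandInTrainer:
    """Hypothetical stand-in; NOT the real trl.PPOTrainer signature."""

    def __init__(self, config=None, model=None, ref_model=None, tokenizer=None):
        self.config = config


# Arguments we intend to pass; value_model is the one in question.
candidate = {"config": None, "model": None, "value_model": None}
print(unsupported_kwargs(StandInTrainer, candidate))  # ['value_model']
```

If the list that comes back is not empty, those names are not parameters of the installed version, so either remove them or compare your call against the original lab cell.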