PPO Config model parameter (which model)

When initializing PPOConfig in Lab 3 as shown below, are we passing in the pre-trained model, the PEFT model, or the reference model? It can't be the reference model.

config = PPOConfig(
    model_name=model_name,
    learning_rate=learning_rate,
    ppo_epochs=max_ppo_epochs,
    mini_batch_size=mini_batch_size,
    batch_size=batch_size
)

What do you mean?

I mean, when we pass model_name to PPOConfig, is it using the original pre-trained Google FLAN-T5 model or the one that was fine-tuned in the lab (the PEFT model)?

model_name="google/flan-t5-base" is what I picked up there!
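
To make that concrete: the value handed to model_name is a plain string (the checkpoint identifier), so the config on its own cannot contain the PEFT model or the reference model. Here is a minimal sketch of that idea. Note that SketchPPOConfig is a hypothetical stand-in written for illustration, not trl's actual PPOConfig, and the numeric values are placeholders rather than the lab's settings. As far as I can tell, the model objects themselves are passed to the trainer separately from the config.

```python
from dataclasses import dataclass


@dataclass
class SketchPPOConfig:
    """Hypothetical stand-in for a PPO config object (NOT trl's PPOConfig)."""
    model_name: str        # just an identifier string, not a model object
    learning_rate: float
    ppo_epochs: int
    mini_batch_size: int
    batch_size: int


# The identifier mentioned in this thread.
model_name = "google/flan-t5-base"

# Placeholder hyperparameters, assumed for illustration only.
config = SketchPPOConfig(
    model_name=model_name,
    learning_rate=1e-5,
    ppo_epochs=1,
    mini_batch_size=4,
    batch_size=16,
)

# The config only records the name; no weights are loaded or stored here.
print(type(config.model_name).__name__)
print(config.model_name)
```

So the question of "which model" is decided by whichever model objects you give the trainer, not by this string.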