When initializing the PPOConfig in Lab 3 (code below), which model are we passing in: the pre-trained model, the PEFT model, or the reference model? It can't be the reference model.
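from trl import PPOConfig

# model_name, learning_rate, max_ppo_epochs, mini_batch_size and batch_size
# are all defined earlier in the lab notebook.
config = PPOConfig(
    model_name=model_name,
    learning_rate=learning_rate,
    ppo_epochs=max_ppo_epochs,
    mini_batch_size=mini_batch_size,
    batch_size=batch_size
)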