PPO model parameters

In lab 3, creating the PPO model added 769 parameters to the PEFT model. They seem to come from the input units/layer of the PPO neural net. Are these units configurable, and what does each one represent?

Also, is the output of the PEFT model fed to these input units? If so, is the size of the PEFT model's output equal to 769?
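One possible reading of the 769 figure, offered as a guess and not something the lab confirms: a single fully connected layer that maps a 768-dimensional hidden state to one scalar has 768 weights plus 1 bias, which is 769 parameters. Below is a minimal sketch of that arithmetic. The hidden size of 768 and the helper name `linear_param_count` are assumptions made for illustration only.

```python
def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Count the trainable parameters of one fully connected (linear) layer."""
    weights = in_features * out_features   # one weight per input/output pair
    biases = out_features if bias else 0   # one bias per output unit
    return weights + biases


# Assumption: the underlying model's hidden state has 768 dimensions.
HIDDEN_SIZE = 768

# Hypothetical extra layer: hidden state in, one scalar out.
extra_params = linear_param_count(HIDDEN_SIZE, 1)
print(extra_params)  # prints 769
```

Under that assumption, the 768 would be the width of the hidden state going into the extra layer, and the layer's output would be a single scalar, not 769 values.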

I think these are the parameters of the PPO model, which is a separate model from the PEFT model. The output of the PPO model should be fed to the PEFT model to steer it in the right direction.

Thanks! It would be interesting to understand a little more about PPO.