In Lab 3, creating the PPO model added 769 parameters to the PEFT model. They seem to come from the input units/layer of the PPO neural net. Are these units configurable, and what does each one represent?
Also, is the output of the PEFT model fed to these input units? If so, does that mean the output dimension of the PEFT model is 769?
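
For context, here is a quick sanity check of the count I tried. It is only a sketch under my own assumptions, not something the lab output states: I am guessing that the extra parameters belong to a single fully connected layer that maps a 768-dimensional hidden state to one scalar. The helper name `linear_param_count` and the constant `HIDDEN_SIZE` are made up for illustration.

```python
def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Count the trainable parameters of one fully connected layer."""
    weights = in_features * out_features      # one weight per input/output pair
    biases = out_features if bias else 0      # one bias per output unit
    return weights + biases


# Assumption (not confirmed by the lab): the model's hidden size is 768
# and the added layer produces a single scalar per token.
HIDDEN_SIZE = 768

extra = linear_param_count(HIDDEN_SIZE, 1)
print(extra)  # 769 = 768 weights + 1 bias
```

If that guess is right, the 769 would be 768 weights plus 1 bias rather than 769 input units, but I would like confirmation.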