Architecture of the preference model and reward model in Constitutional AI

In Constitutional AI (W3 lecture slide 68), what is the architecture of the preference model that is used for response, critique and revision? Is it a Transformer model or not? And what is the achitecture of the reward model?