Lab 3, 2.2 Reward Model

How can I extend the reward model beyond checking for toxicity in the generated output to other specific criteria? In Lab 3, section 2.2, the approach used is toxicity classification with Meta AI's RoBERTa-based hate speech model, which assigns a higher reward when there is a higher chance of the output being classified as `nothate`.
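The same pattern generalizes: any function that maps generated text to a scalar score can serve as a reward signal, and several criteria can be combined into one weighted composite reward. Below is a minimal sketch of that idea. The scoring functions are hypothetical stand-ins, not the lab's actual code: in the lab, the `nothate` logit from the hate speech model would play the role of `non_toxicity_score`, and `brevity_score` is an invented second criterion for illustration.

```python
# Hypothetical sketch: combining several per-criterion scores into one reward.
# In the lab, the "nothate" logit from Meta AI's RoBERTa hate speech model
# would supply one such score; any text -> score function fits this pattern.

def non_toxicity_score(text: str) -> float:
    # Stand-in for a toxicity classifier. A real version would return the
    # model's "nothate" logit or probability for the generated text.
    return 0.0 if "hateword" in text else 1.0

def brevity_score(text: str) -> float:
    # An invented second criterion: reward concise outputs (<= 50 words).
    return 1.0 if len(text.split()) <= 50 else 0.0

def composite_reward(text: str, weights=None) -> float:
    # Weighted sum of per-criterion scores; the weights let you trade
    # one criterion off against another.
    weights = weights or {"non_toxicity": 0.7, "brevity": 0.3}
    scores = {
        "non_toxicity": non_toxicity_score(text),
        "brevity": brevity_score(text),
    }
    return sum(weights[name] * scores[name] for name in weights)

print(composite_reward("a short friendly reply"))  # 1.0 (both criteria pass)
```

The composite reward can then be fed to the same PPO loop the lab uses in place of the single toxicity score; the weights control how strongly each criterion shapes the policy.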