Generative AI with Large Language Models course
Week 3
Lab 3 - Fine-tune FLAN-T5 with reinforcement learning to generate more-positive summaries
The training did not make the model more positive. Here is the result from the final part of section 2, where we measure how toxic the untuned model is:
11it [00:23, 2.18s/it]
toxicity [mean, std] before detox: [0.026959281965074213, 0.036199814536090356]
And here are the results from section 3.3, where we measure how toxic the tuned model is and how much of an improvement that represents:
11it [00:19, 1.73s/it]
toxicity [mean, std] after detox: [0.04409130260517651, 0.05958874047019413]
Percentage improvement of toxicity score after detoxification:
mean: -63.55%
std: -64.61%
i.e. the model has become MORE toxic.
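As a sanity check on the arithmetic, here is a minimal sketch that recomputes the percentages from the two printed results. It assumes the notebook defines improvement as (before - after) / before, which is an assumption on my part, although it does reproduce the figures above. The helper name `pct_improvement` is hypothetical and not taken from the lab.

```python
# Values copied verbatim from the notebook output above.
mean_before, std_before = 0.026959281965074213, 0.036199814536090356
mean_after, std_after = 0.04409130260517651, 0.05958874047019413


def pct_improvement(before: float, after: float) -> float:
    """Relative reduction in percent (hypothetical helper).

    Assumes improvement = (before - after) / before, so a negative
    value means the score went UP after fine-tuning.
    """
    return (before - after) / before * 100


mean_improvement = pct_improvement(mean_before, mean_after)
std_improvement = pct_improvement(std_before, std_after)

print(f"mean: {mean_improvement:.2f}%")
print(f"std: {std_improvement:.2f}%")
```

Under that assumption the negative sign is not a display quirk: the mean toxicity score really did rise from about 0.027 to about 0.044.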
Have I done something wrong here?
Thanks,
Chris