Hi,
As you can see in the attached picture, the toxicity increased from 0.0378 before detoxification to 0.0430 after detoxification.
In this case, does it mean the model after detoxification actually performs worse?
Please kindly clarify. Thanks!
Hi, I am getting the following:
Percentage improvement of toxicity score after detoxification:
mean: 0.38%
std: 12.21%
This is with:
toxicity [mean, std] after detox: [0.02905736598503691, 0.029726363734658947]
Hello, I’m having similar results:
mean_before_detoxification = 0.03147564082279463
mean_after_detoxification = 0.0357071926118806
My guess would be that because the training process is quite short and we are using only a subset of the data, the inherent stochasticity of this process can sometimes produce results that are inconsistent with expectations. I assume that if you ran this with more data and epochs, you would get “correct” results every time. But this is just a guess.
Thank you and it makes sense!
Yes, the decrease in mean toxicity is noticeably small in my case, and the standard deviation increased.
toxicity [mean, std] before detox: [0.035475403208031574, 0.03445820341137294]
toxicity [mean, std] after detox: [0.030459516253110698, 0.04322473559713335]
I would also guess that this has to do with the short amount of training time / epochs.
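One rough way to check whether a change this small is more than noise is to compare the difference in means with its standard error. The sketch below uses the numbers from this post. The sample count n is an assumption (the thread does not state how many evaluation samples were scored), and treating the two sets of scores as independent is a simplification.

```python
import math

def standard_error_of_difference(std_before, std_after, n):
    """Standard error of the difference between two sample means,
    assuming n independent samples in each group."""
    return math.sqrt(std_before ** 2 / n + std_after ** 2 / n)

# Values reported in the post above.
mean_before, std_before = 0.035475403208031574, 0.03445820341137294
mean_after, std_after = 0.030459516253110698, 0.04322473559713335

# ASSUMPTION: number of scored evaluation samples; not stated in this thread.
n = 10

diff = mean_before - mean_after
se = standard_error_of_difference(std_before, std_after, n)
ratio = diff / se

print(f"difference in means: {diff:.4f}")
print(f"standard error:      {se:.4f}")
print(f"difference / SE:     {ratio:.2f}")
```

If the ratio is well below 1, as it is under this assumed n, the change in mean toxicity is smaller than the sampling noise, so its sign could plausibly flip from run to run.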
Hi. I got the same type of result and it is very likely due to the low number of training steps and rather non-toxic language as input.
Percentage improvement of toxicity score after detoxification:
mean: -23.40%
std: -33.33%
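For reading these percentages, here is a minimal sketch of how such a figure could be computed. It assumes the reported improvement is the relative change (before - after) / before, which this thread does not confirm. The function name is made up for illustration, and the inputs are the means from the first post.

```python
def percentage_improvement(before, after):
    """Relative reduction of a score, in percent.
    Positive: the score went down. Negative: the score went up."""
    return (before - after) / before * 100.0

# Mean toxicity from the first post in this thread.
mean_before = 0.0378
mean_after = 0.0430

improvement = percentage_improvement(mean_before, mean_after)
print(f"mean improvement: {improvement:.2f}%")
```

Under this definition a negative value means the mean toxicity went up after detoxification, which matches the sign of the results reported above.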
Thank you for the post.
I thought something was wrong with my reasoning.
Thank you for the clarification. I thought I hadn't understood, but in the end I got it right.