Yes, the decrease in mean toxicity is noticeably small in my case, and the standard deviation actually increased.
toxicity [mean, std] before detox: [0.035475403208031574, 0.03445820341137294]
toxicity [mean, std] after detox: [0.030459516253110698, 0.04322473559713335]
I would also guess that this is due to the short training time / small number of epochs.
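To put the numbers above in perspective, here is a quick sketch that computes the relative change in mean and standard deviation. The values are copied from my output above; the helper name `relative_change` is just something I made up for illustration.

```python
# Values copied from the evaluation output above.
mean_before, std_before = 0.035475403208031574, 0.03445820341137294
mean_after, std_after = 0.030459516253110698, 0.04322473559713335


def relative_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after` (hypothetical helper)."""
    return (after - before) / before


mean_change = relative_change(mean_before, mean_after)
std_change = relative_change(std_before, std_after)

print(f"mean toxicity: {mean_change:+.1%}")
print(f"std toxicity:  {std_change:+.1%}")
```

That works out to roughly a 14% drop in the mean and a 25% rise in the standard deviation. The absolute shift in the mean (about 0.005) is also well below one standard deviation, so I would not read too much into it without more training or more evaluation samples.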