State-of-the-art performance from data augmentation after model saturation in NLP text summarization

I’d like to share a real-world example: in my team’s project for Stanford University’s NLP with Deep Learning class, we reached state-of-the-art performance on our main metric by introducing data-centric approaches.
We augmented our large corpus of radiology reports with a sort of “paraphrasing at scale”: shuffling the ordering of each report’s fields, then fine-tuning the transformers for additional epochs on the augmented data.
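The field-shuffling idea can be sketched in a few lines. This is a minimal illustration, not the project’s actual code; the field names below (Findings, Impression, Comparison) are hypothetical placeholders for a structured radiology report:

```python
import random

def shuffle_report_fields(report: dict, seed=None) -> str:
    """Build an augmented training example by permuting the order of a
    structured report's fields, leaving each field's text unchanged."""
    rng = random.Random(seed)
    fields = list(report.items())
    rng.shuffle(fields)
    return "\n".join(f"{name}: {text}" for name, text in fields)

# Hypothetical report with made-up sections for illustration:
report = {
    "Findings": "No focal consolidation.",
    "Impression": "Normal chest radiograph.",
    "Comparison": "None.",
}
augmented = shuffle_report_fields(report, seed=0)
```

Each shuffled copy preserves the report’s content while presenting it in a new surface order, which is what makes it behave like cheap paraphrasing at corpus scale.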

You can watch our short presentation in this video, and if you’d like to dig deeper, see our published report on the course webpage. Feel free to PM me directly.

Cheers!