Request for Telugu-English Code-Switched Speech Corpus (ASR Project)

Hi everyone,

I’m a 3rd-year B.Tech student working on an Automatic Speech Recognition (ASR) system for Telugu-English code-switched speech.

While researching, I came across a paper describing a large-scale Telugu-English code-switched corpus: 200 hours of speech from 400 speakers, annotated with POS tags, language tags, and code-switch points.

I’ve already reached out to the author for access, but I wanted to ask here:

  • Has anyone worked with Telugu-English code-switched datasets?

  • Are there any publicly available corpora or alternatives I can use?

  • Any suggestions for handling code-switching in ASR models, especially when training data is scarce?

I'd really appreciate any guidance or resources.

Thanks!