Hey there!
I’m interested in adding punctuation to raw YouTube transcripts and have developed a proof-of-concept web app and iOS app here:
AppBlit DOT com/scribe
For now it only works on English transcripts, but it already seems to do quite well.
The neural net is a DistilBERT token classifier that was converted to ONNX for in-browser inference using Transformers.js from Hugging Face.
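To give an idea of what the token classifier's output is used for, here is a rough sketch of the post-processing step, written in plain Python rather than the JavaScript that runs in the browser. It assumes the model predicts one label per word saying which mark follows it. The label names (`O`, `COMMA`, `PERIOD`, `QUESTION`) and the `restore_punctuation` helper are made up for illustration and are not necessarily what the app uses.

```python
# Hypothetical label set: the real model's labels are not stated in this post.
PUNCT = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}


def restore_punctuation(words, labels):
    """Attach the predicted mark to each word and capitalize sentence starts."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    out = []
    start_of_sentence = True
    for word, label in zip(words, labels):
        if start_of_sentence:
            # Capitalize only the first character, leave the rest untouched.
            word = word[:1].upper() + word[1:]
        mark = PUNCT[label]
        out.append(word + mark)
        # The next word starts a sentence only after a sentence-ending mark.
        start_of_sentence = mark in {".", "?"}
    return " ".join(out)


# Stand-in for a raw transcript and the per-word predictions of a classifier.
words = "hey there how are you doing today i am fine".split()
labels = ["O", "COMMA", "O", "O", "O", "O", "QUESTION", "O", "O", "PERIOD"]
print(restore_punctuation(words, labels))
# Hey there, how are you doing today? I am fine.
```

In practice the model works on subword tokens, so the per-token predictions would first need to be merged back to one label per word before a step like this.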
I would love to make it work in more languages.
I would also like to start summarizing the transcripts and possibly detecting speaker turns. That would add a lot of value when skimming the text. What do you think?
