CTC model clarification

In the Week 3 video 'Speech recognition', Prof Ng mentions that "CTC models and attention models work for audio data" at around 8:20.

As I understand it, CTC is a way to compute the loss when Tx is vastly larger than Ty, by allowing the network to output 'blank' characters. When Prof Ng says 'CTC models', I assume he means any RNN-based model trained with the CTC loss, and not that CTC implies its own style of architecture (the way U-Nets have that U shape), right?

Thank you!

I agree. Andrew discusses this at around 5:21.
CTC allows for not only blank characters but also repeated characters: consecutive duplicates are collapsed first, and then the blanks are removed, so a long output sequence can decode to a much shorter transcript.
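To make the collapsing rule concrete, here is a minimal sketch in Python (assuming `_` as the blank symbol, which is just a placeholder choice — real implementations use a reserved blank index):

```python
def ctc_collapse(output, blank="_"):
    """Collapse a CTC output string: merge consecutive duplicate
    characters, then remove blank symbols."""
    collapsed = []
    prev = None
    for ch in output:
        if ch != prev:          # keep only the first of a run of duplicates
            collapsed.append(ch)
        prev = ch
    # drop the blanks that remain after merging duplicates
    return "".join(c for c in collapsed if c != blank)

# "ttt_h_eee" -> merge duplicates -> "t_h_e" -> drop blanks -> "the"
print(ctc_collapse("ttt_h_eee"))  # the
```

Note the order matters: duplicates are merged before blanks are removed, which is what lets the model emit genuinely doubled letters (e.g. the two l's in "hello") by separating them with a blank, as in `"hh_ee_l_l_oo"`.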