Sesame unveils expressive, context-aware speech system

Sesame introduced the Conversational Speech Model (CSM), an end-to-end multimodal learning system designed to generate more natural and contextually appropriate AI speech. The model uses transformers to process both text and audio inputs, leveraging conversation history to produce coherent speech with improved expressivity and efficiency. Sesame’s work addresses limitations in current text-to-speech systems and aims to create AI companions with “voice presence” that can engage in genuine dialogue. The company released a demo and made its models available under an Apache 2.0 license. (Sesame)
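The key idea described above is conditioning speech generation on the full dialogue history, with text and audio treated as one token stream. A minimal sketch of how such context might be assembled follows; every name and token convention here is hypothetical, for illustration only, and is not Sesame's actual implementation.

```python
# Illustrative sketch: flatten prior conversation turns (text plus
# audio codec token ids) into one interleaved sequence that a
# transformer decoder could condition on. Names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Turn:
    speaker: str
    text: str
    audio_tokens: list = field(default_factory=list)  # codec token ids

def build_context(history, new_text, max_tokens=2048):
    """Interleave each past turn's text and audio tokens, append the
    text to be spoken next, and truncate the oldest context first."""
    seq = []
    for turn in history:
        seq += [f"<{turn.speaker}>"] + turn.text.split() + turn.audio_tokens
    seq += ["<speak>"] + new_text.split()
    return seq[-max_tokens:]  # keep only the most recent context

history = [
    Turn("user", "hello there", audio_tokens=[101, 102, 103]),
    Turn("assistant", "hi how can I help", audio_tokens=[201, 202]),
]
context = build_context(history, "sure, let me explain")
```

In a real system the audio tokens would come from a neural codec and the sequence would feed a trained transformer; the point here is only the interleaved, history-aware input format.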