Sesame unveils expressive, context-aware speech system

Sesame introduced the Conversational Speech Model (CSM), an end-to-end multimodal learning system designed to generate more natural and contextually appropriate AI speech. The model uses transformers to process both text and audio inputs, leveraging conversation history to produce coherent speech with improved expressivity and efficiency. Sesame’s work addresses limitations in current text-to-speech systems and aims to create AI companions with “voice presence” that can engage in genuine dialogue. The company released a demo and made its models available under an Apache 2.0 license. (Sesame)
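The key idea described above is conditioning speech generation on the full dialogue history, with text and audio treated as one token stream. A minimal sketch of how such context might be assembled follows; every name and token convention here is hypothetical, for illustration only, and is not Sesame's actual implementation.

```python
# Illustrative sketch: flatten prior conversation turns (text plus
# audio codec token ids) into one interleaved sequence that a
# transformer decoder could condition on. Names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Turn:
    speaker: str
    text: str
    audio_tokens: list = field(default_factory=list)  # codec token ids

def build_context(history, new_text, max_tokens=2048):
    """Interleave each past turn's text and audio tokens, append the
    text to be spoken next, and truncate the oldest context first."""
    seq = []
    for turn in history:
        seq += [f"<{turn.speaker}>"] + turn.text.split() + turn.audio_tokens
    seq += ["<speak>"] + new_text.split()
    return seq[-max_tokens:]  # keep only the most recent context

history = [
    Turn("user", "hello there", audio_tokens=[101, 102, 103]),
    Turn("assistant", "hi how can I help", audio_tokens=[201, 202]),
]
context = build_context(history, "sure, let me explain")
```

In a real system the audio tokens would come from a neural codec and the sequence would feed a trained transformer; the point here is only the interleaved, history-aware input format.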