Nari Labs launches Dia, an open text-to-speech generator


Nari Labs, a two-person startup, released Dia, a 1.6-billion-parameter text-to-speech model that generates naturalistic dialogue directly from text prompts. The model supports emotional tone, speaker tagging, and nonverbal audio cues such as laughs and coughs, capabilities that co-creator Toby Kim claims surpass competing offerings from ElevenLabs and Google’s NotebookLM. In side-by-side comparisons, Dia handles natural timing, nonverbal expressions, and emotional range, properly interpreting cues that other models read aloud or skip entirely. The model is available under an Apache 2.0 license, which allows commercial use, and it runs on consumer-grade GPUs with about 10GB of VRAM. (GitHub)
