OpenAI launched “gpt-realtime,” a new speech-to-speech model that processes audio directly in a single model rather than chaining separate models together. It scored 82.8 percent accuracy on the Big Bench Audio benchmark, up from 65.6 percent for the previous version. The model also shows significant gains in instruction following and function-calling accuracy, and it better understands non-verbal cues and switches between languages. OpenAI also made its Realtime API generally available, adding support for remote MCP servers, image inputs, and phone calling. Together, these releases let developers build production-ready voice agents that sound more human and handle complex tasks more reliably in fields such as customer support, personal assistance, and education. The new model costs $32 per 1 million audio input tokens and $64 per 1 million audio output tokens, 20 percent less than earlier pricing. (OpenAI)
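
To put the new rates in concrete terms, here is a minimal Python sketch of the arithmetic. The $32 and $64 figures come from the announcement; the earlier prices are not quoted here and are only inferred by undoing a 20 percent cut. The token counts in the usage example are hypothetical, and the helpers (`audio_cost`, `implied_previous_price`) are illustrative names, not part of any OpenAI SDK. Text and other token types are ignored.

```python
# Audio-token prices for gpt-realtime as quoted in the item, in USD per 1 million tokens.
INPUT_PRICE_PER_M = 32.0
OUTPUT_PRICE_PER_M = 64.0
# The item describes a 20 percent reduction; earlier prices are inferred from this.
PRICE_CUT = 0.20


def audio_cost(input_tokens: int, output_tokens: int,
               input_price: float = INPUT_PRICE_PER_M,
               output_price: float = OUTPUT_PRICE_PER_M) -> float:
    """Estimated USD cost of audio tokens only (hypothetical helper)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def implied_previous_price(new_price: float, cut: float = PRICE_CUT) -> float:
    """Undo a fractional price cut: new = old * (1 - cut), so old = new / (1 - cut)."""
    return new_price / (1 - cut)


if __name__ == "__main__":
    # Hypothetical session: 100,000 audio tokens in, 50,000 audio tokens out.
    new = audio_cost(100_000, 50_000)
    old = audio_cost(100_000, 50_000,
                     implied_previous_price(INPUT_PRICE_PER_M),
                     implied_previous_price(OUTPUT_PRICE_PER_M))
    print(f"new: ${new:.2f}  old (implied): ${old:.2f}")
```

Run as a script, it prints `new: $6.40  old (implied): $8.00`. Because the same cut applies to both input and output rates, any mix of tokens comes out 20 percent cheaper, so the ratio of the two totals is always 0.8.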
