OpenAI launches three real-time voice models with GPT-5-class reasoning

OpenAI on Thursday released three new audio models through its Realtime API, marking a push to make voice-powered applications smarter, more multilingual, and easier for developers to build. The trio — GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper — collectively addresses reasoning, translation, and transcription in live voice interactions.

GPT-5-Class Reasoning Comes to Voice

GPT-Realtime-2 is the headline release, described by OpenAI as “our most intelligent voice model yet” and the company’s first voice model with GPT-5-class reasoning capabilities. The model features a 128,000-token context window, quadrupling the 32,000-token limit of its predecessor GPT-Realtime-1.5, and supports variable reasoning levels from minimal to high. On audio benchmarks, OpenAI says GPT-Realtime-2 scored roughly 15 percent higher on Big Bench than GPT-Realtime-1.5, which launched in February.

OpenAI framed the model as a shift from scripted voice bots to “real-time collaborators that can listen, reason, and solve complex problems as conversations unfold.”

Translation and Transcription at Scale

GPT-Realtime-Translate handles live speech translation from more than 70 input languages into 13 output languages, keeping pace with the speaker in real time. GPT-Realtime-Whisper provides streaming speech-to-text transcription with controllable latency — lower delay settings produce earlier partial text, while higher settings improve accuracy.

Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens. GPT-Realtime-Translate is priced at $0.034 per minute, while GPT-Realtime-Whisper costs $0.017 per minute.
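To put these prices in concrete terms, the sketch below estimates costs from the rates quoted above. It is a back-of-the-envelope calculation, not OpenAI's billing logic. The function and constant names are hypothetical, only GPT-Realtime-2's audio input rate is covered because the article gives no output-token price, and the workload figures are made-up examples.

```python
# Rates as reported in the article (USD). GPT-Realtime-2 output-token pricing
# is not stated there, so this sketch covers audio input tokens only.
# All names below are hypothetical, chosen for illustration.
GPT_REALTIME_2_AUDIO_INPUT_PER_MILLION = 32.00
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def realtime2_input_cost(audio_input_tokens: int) -> float:
    """Cost of GPT-Realtime-2 audio input tokens at the quoted starting rate."""
    return audio_input_tokens / 1_000_000 * GPT_REALTIME_2_AUDIO_INPUT_PER_MILLION


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost for the per-minute-billed models (Translate and Whisper)."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical workload: 1,000 hours of audio in a month.
    minutes = 1_000 * 60
    print(f"Translate: ${per_minute_cost(minutes, TRANSLATE_PER_MINUTE):,.2f}")
    print(f"Whisper:   ${per_minute_cost(minutes, WHISPER_PER_MINUTE):,.2f}")
    # Hypothetical token count; the article gives no tokens-per-minute figure.
    print(f"GPT-Realtime-2, 5M input tokens: ${realtime2_input_cost(5_000_000):,.2f}")
```

At these rates, 1,000 hours (60,000 minutes) of audio would cost about $2,040 to translate and $1,020 to transcribe. Converting minutes of speech into GPT-Realtime-2 audio tokens would require a tokens-per-minute figure the article does not provide, which is why that function takes a token count directly.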

Early Adopters Report Results

Several companies participated in early testing. Zillow reported a 26-percentage-point improvement in call success rates with GPT-Realtime-2, reaching 95 percent compared with 69 percent on earlier models. BolnaAI reported a 12.5 percent reduction in word error rate when evaluating GPT-Realtime-Translate on Hindi, Tamil, and Telugu.
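The early-adopter figures mix two kinds of comparison. Zillow's result is an absolute gain of 26 percentage points (69 to 95 percent), which works out to a relative improvement of roughly 38 percent; the article does not say whether BolnaAI's 12.5 percent figure is absolute or relative. The sketch below, using hypothetical function names, computes word error rate with the standard word-level Levenshtein definition (substitutions plus deletions plus insertions, divided by the number of reference words), along with both kinds of change. It is not BolnaAI's evaluation method, which the article does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N, computed as the word-level
    Levenshtein distance divided by the number of reference words N."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the previous DP row: distances between ref[:i-1] and each hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


def percentage_point_change(before: float, after: float) -> float:
    """Absolute change between two rates expressed in percent."""
    return after - before


def relative_reduction(baseline: float, new: float) -> float:
    """Relative change from baseline to new, as a fraction of the baseline."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(percentage_point_change(69, 95))
    print(relative_reduction(0.16, 0.14))
```

For example, a hypothetical drop in WER from 0.16 to 0.14 is a 2-point absolute reduction but a 12.5 percent relative one, which is why the two framings should not be compared directly.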

OpenAI said the API includes safety protocols such as real-time classifiers to terminate conversations that violate content standards, and that the service complies with EU data residency regulations. The models are available immediately through OpenAI’s Realtime API.