<h1>OpenAI's Latest Moves Put Many Voice AI Startups on Notice</h1>
<p><em>Source: europesays.com, published September 2, 2025</em></p>
<p>OpenAI has released its "most advanced speech-to-speech model yet": gpt-realtime.</p>
<p>The AI giant also took the wraps off its Realtime API, now generally available with new capabilities as it moves out of beta.</p>
<p>OpenAI hopes enterprises and developers will leverage both the model and the API to build "production-ready voice agents".</p>
<p>Some of the API's latest features will help here. For instance, the Realtime API's new support for image inputs and remote MCP servers will make these agents more capable.</p>
<p>There is also a notable new capability for building better customer support voice AI agents.</p>
<p>As <strong>Peter Bakkum, Member of Technical Staff at OpenAI</strong>, said in <a href="https://www.youtube.com/watch?v=nfBbmtMJhX0" rel="nofollow noopener" target="_blank">the announcement video</a>:</p>
<blockquote>
<p>We've added support for SIP telephony, which makes it much easier to build applications for voice-over-phone situations like customer support.</p>
</blockquote>
<p>With this, a developer could easily grab a phone number from Twilio, connect it to the SIP interface provided by OpenAI, add prompts, feed it data, and let it go.</p>
<p>As <strong>Andreas Granig, CEO at Sipfront,</strong> observed <a href="https://www.linkedin.com/posts/agranig_openai-just-sunk-half-of-the-voice-ai-activity-7367318851941793794-ccKs?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAB_MGuQB6bbmsMlLLNjrIhFAQlGGwLO5C-k" rel="nofollow noopener" target="_blank">in a LinkedIn post</a>, that is quite a threat to many conversational AI startups.</p>
<p>"There are quite some startups, who only provide an interface to the public phone network for existing speech-to-speech AI services, often without much telco moat, but relying mostly on Twilio and others… They are in hot water now," noted Granig.</p>
<p>The CEO acknowledged that startups specializing in tool calling for advanced integrations remain safe, since that remains a specialist field. However, he added:</p>
<blockquote>
<p>The voice interface for AI assistants just became [a] commodity.</p>
</blockquote>
<p>As a result, it will be more difficult to differentiate AI assistant use cases, signalling to many conversational AI startups that now is the time to step up.</p>
<h2>What About the New gpt-realtime Model?</h2>
<p>OpenAI hopes many customer support teams will leverage gpt-realtime, alongside the Realtime API, as they advance their customer support automation strategies.</p>
<p>Indeed, as Bakkum said in the same video:</p>
<blockquote>
<p>We carefully aligned the model… to real scenarios like customer support and academic tutoring.</p>
</blockquote>
<p>There are many reasons why support leaders would consider gpt-realtime. For starters, it enables AI agents that can understand and produce audio without relying on separate transcription, language, and voice models.</p>
<p>There are also performance benefits. Because a single model handles the whole exchange, these agents respond faster, and they can capture subtleties like laughter or sighs while expressing various emotions.</p>
<p>OpenAI also claims the model can deliver more natural, high-quality audio while following instructions across complex, multi-turn conversations.</p>
<p>Developers can also adjust pace, tone, and style, and even have the model roleplay characters.</p>
<p>Meanwhile, OpenAI claims the model better handles unclear audio and long alphanumeric strings, like phone and license numbers. One study recently highlighted <a href="https://www.cxtoday.com/contact-center/contact-center-ai-assistants-are-introducing-new-inefficiencies-burdens-finds-study/" rel="nofollow noopener" target="_blank">these strings as a big problem</a> for rep-facing AI assistants leveraged in contact centers.</p>
<p>However, despite all the model's advantages, there are caveats.</p>
<p>For instance, its cost is relatively high at $32 per 1M audio input tokens ($0.40 per 1M cached input tokens) and $64 per 1M audio output tokens.</p>
<p>As such, <strong>Alex Levin, CEO at Regal</strong>, estimated that the speech-to-speech model still costs approximately four times more than chaining a speech-to-text (STT), large language model (LLM), and text-to-speech (TTS) pipeline for voice AI agents.</p>
<p>In a social post, the CEO also cautioned about limited control over the model. He wrote:</p>
<blockquote>
<p>The Realtime model is missing the control/observability that Voice AI Agent companies have in the "chained" model.</p>
</blockquote>
<p>"And it's missing the ability to vary the model, voice, guardrails, etc, in each step of the conversation, which is currently easily achieved with a multi-state agent builder and a 'chained' model today."</p>
<p>Despite these concerns, some enterprises are working with OpenAI to start testing the model, including T-Mobile…</p>
<h2>T-Mobile Uses gpt-realtime for Customer Conversations</h2>
<p>T-Mobile has tested OpenAI's models for six months and recently gained access to gpt-realtime. Together with the Realtime API, it claims to have already seen "huge improvements".</p>
<p>In the announcement video, <strong>Julianne Roberson, Director of AI at T-Mobile</strong>, highlighted how T-Mobile is already experimenting with the model to reimagine the device upgrade process, one of its most common demand drivers.</p>
<p>During the demo, Roberson showed how the AI assistant guided a customer through selecting a phone under $300, checked compatibility with satellite services, and confirmed plan eligibility.</p>
<p>In doing so, she emphasized that the model feels far more human, able to follow customers through unpredictable conversations while recognizing emotions and handling multimodal inputs.</p>
<p>These multimodal capabilities will support T-Mobile's objective of providing "expert-level service everywhere" with AI.</p>
<p>Given its close ties to OpenAI, it will be fascinating to see how this partnership develops, and whether T-Mobile shares CEO Sam Altman's prediction of <a href="https://www.cxtoday.com/conversational-ai/totally-totally-gone-openai-ceo-sam-altman-predicts-the-end-of-human-customer-service/" target="_blank" rel="noopener nofollow">the end of human customer service</a>.</p>
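To make the "grab a Twilio number and feed it into a SIP interface" flow described above concrete: on the Twilio side, routing an inbound call to a SIP endpoint is done with TwiML's <code>&lt;Dial&gt;&lt;Sip&gt;</code> verb. The sketch below only builds that TwiML response; the SIP URI is a placeholder, not a documented OpenAI address, and how OpenAI's SIP interface expects to be addressed is outside what this article specifies.

```python
# Minimal sketch: TwiML that forwards an inbound Twilio call to a SIP
# endpoint. TwiML is plain XML, so no Twilio SDK is required to build it.
# NOTE: the SIP URI passed in is a placeholder, not a real OpenAI endpoint.
def sip_forward_twiml(sip_uri: str) -> str:
    """Return a TwiML document that dials the given SIP URI."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Dial><Sip>" + sip_uri + "</Sip></Dial></Response>"
    )

# A Twilio number configured to fetch this document as its voice webhook
# would bridge the caller to the SIP endpoint:
print(sip_forward_twiml("sip:voice-agent@example.invalid"))
```

In practice, this string would be served from the webhook URL configured on the purchased phone number, so every inbound call is bridged straight to the voice agent's SIP endpoint.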
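For readers weighing Levin's cost comparison, a back-of-envelope calculator helps. The per-million-token rates below are the ones quoted above; the tokens-per-minute figures are illustrative assumptions, not OpenAI specifications, so real calls will differ.

```python
# Rough cost estimate for a gpt-realtime voice call, using the article's
# quoted pricing: $32 / 1M audio input tokens, $64 / 1M audio output tokens.
# The tokens-per-minute defaults are illustrative assumptions only.
INPUT_RATE = 32.0 / 1_000_000   # USD per audio input token
OUTPUT_RATE = 64.0 / 1_000_000  # USD per audio output token

def call_cost(minutes: float,
              input_tokens_per_min: int = 600,
              output_tokens_per_min: int = 600) -> float:
    """Estimated USD cost of one call under the assumed token rates."""
    input_cost = minutes * input_tokens_per_min * INPUT_RATE
    output_cost = minutes * output_tokens_per_min * OUTPUT_RATE
    return input_cost + output_cost

# A hypothetical 5-minute support call:
print(round(call_cost(5), 4))  # → 0.288
```

Under these assumed rates a five-minute call costs roughly $0.29, which shows why Levin's estimate of a roughly fourfold premium over a chained STT/LLM/TTS pipeline matters at contact-center call volumes.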