{"id":195340,"date":"2025-09-02T23:34:10","date_gmt":"2025-09-02T23:34:10","guid":{"rendered":"https:\/\/www.europesays.com\/us\/195340\/"},"modified":"2025-09-02T23:34:10","modified_gmt":"2025-09-02T23:34:10","slug":"microsoft-unveils-vibevoice-for-longer-conversational-ai-audio","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/us\/195340\/","title":{"rendered":"Microsoft Unveils VibeVoice for Longer Conversational AI Audio"},"content":{"rendered":"<p><a href=\"https:\/\/www.microsoft.com\/en-us\/\" target=\"_blank\" rel=\"noopener nofollow\">Microsoft<\/a> has <a class=\"editor-rtfLink\" href=\"https:\/\/huggingface.co\/microsoft\/VibeVoice-1.5B\" target=\"_blank\" rel=\"noopener nofollow\">released<\/a> VibeVoice, a new open-source artificial intelligence (AI) model that lets users create podcasts and other audio \u2014 a counter to <a href=\"https:\/\/www.google.com\/\" target=\"_blank\" rel=\"noopener nofollow\">Google<\/a>\u2019s popular <a class=\"editor-rtfLink\" href=\"https:\/\/blog.google\/technology\/ai\/notebooklm-audio-overviews\/\" target=\"_blank\" rel=\"noopener nofollow\">NotebookLM<\/a>.<\/p>\n<p>But there are notable differences. Microsoft\u2019s text-to-speech model can generate four voices and up to 90 minutes of podcast-quality speech; NotebookLM supports two voices.<\/p>\n<p>Additionally, VibeVoice reads provided text aloud, while NotebookLM ingests documents and turns them into two-person podcasts. NotebookLM users can also query documents and get summaries, according to tech firm <a class=\"editor-rtfLink\" href=\"https:\/\/huggingface.co\/\" target=\"_blank\" rel=\"noopener nofollow\">Hugging Face<\/a>.<\/p>\n<p>In other words, VibeVoice doesn\u2019t try to understand the text; it performs it aloud, effectively standing in for a recording studio.<\/p>\n<p>VibeVoice is the latest offering in voice AI technology, which has been attracting venture capital funding. 
<\/p>\n<p>In 2024, voice AI startups <a class=\"editor-rtfLink\" href=\"https:\/\/www.cbinsights.com\/research\/voice-ai-market-opportunities\/\" target=\"_blank\" rel=\"noopener nofollow\">raised<\/a> $2.1 billion, up eightfold from the prior year, according to market research firm <a href=\"https:\/\/www.cbinsights.com\/\" target=\"_blank\" rel=\"noopener nofollow\">CB Insights<\/a>. There\u2019s rising interest in voice shopping: A <a class=\"editor-rtfLink\" href=\"https:\/\/www.pymnts.com\/voice-activation\/2024\/30percent-of-gen-z-consumers-shop-by-voice-every-week\/\" target=\"_blank\" rel=\"noopener nofollow\">PYMNTS Intelligence report<\/a> shows that 30.4% of Gen Z consumers already shop by voice every week, with millennials close behind. Across all age groups, 17.9% of consumers use voice to shop.<\/p>\n<p>VibeVoice runs on 1.5 billion parameters, relatively small for a model capable of sustaining dialogue across multiple speakers. <\/p>\n<p>It is built on Alibaba\u2019s open-source Qwen2.5, a large language model that helps orchestrate natural turn-taking and contextually aware speech patterns during dialogues. <\/p>\n<p>Microsoft says this lets VibeVoice produce fluid conversations among four voices while maintaining each voice\u2019s distinct characteristics, even in longer conversations.<\/p>\n<p><strong>See also<\/strong>: <a class=\"editor-rtfLink\" href=\"https:\/\/www.pymnts.com\/consumer-insights\/2024\/new-report-how-the-world-does-digital-a-deep-dive-into-global-digital-engagement\/\" target=\"_blank\" rel=\"noopener nofollow\">How the World Does Digital: A Deep Dive Into Global Digital Engagement<\/a><\/p>\n<p><strong>How to use VibeVoice<\/strong><\/p>\n<p>Potential research applications of VibeVoice include the following:<\/p>\n<p>Prototyping podcasts and training content<\/p>\n<ul>\n<li> Creators could generate mock podcasts, panel discussions or training modules with multiple AI voices. 
Instead of hiring four voice actors to test dialogue flow, users can create a synthetic version in minutes using text.<\/li>\n<\/ul>\n<p>Accessibility and education<\/p>\n<ul>\n<li> Educational material, textbooks or research papers could be turned into long-form audio with distinct narrators. This could help people who learn better by listening, or make dense material more engaging.<\/li>\n<\/ul>\n<p>Game and media development<\/p>\n<ul>\n<li> Game developers or storytellers could use VibeVoice to prototype dialogue between characters. Because it handles four speakers, developers can stage a full in-game conversation without recording sessions.<\/li>\n<\/ul>\n<p>Recognizing the risks of deepfakes, Microsoft said VibeVoice\u2019s safeguards ensure every audio file includes both a disclaimer\u2014such as \u201cThis segment was generated by AI\u201d\u2014and a hidden digital watermark.<\/p>\n<p>Microsoft bars impersonation, disinformation and live deepfake uses such as real-time voice conversion in calls. The model supports only English and Chinese speech for now. 
The model is available for research, not commercial deployment.<\/p>\n<p><strong>Read more:\u00a0<\/strong><\/p>\n<p><a class=\"editor-rtfLink\" href=\"https:\/\/www.pymnts.com\/news\/artificial-intelligence\/2025\/nobodys-talking-voice-interfaces-face-hurdles-for-wide-adoption\/\" target=\"_blank\" rel=\"noopener nofollow\">Nobody\u2019s Talking: Voice Interfaces Face Hurdles for Wide Adoption<\/a><\/p>\n<p><a class=\"editor-rtfLink\" href=\"https:\/\/www.pymnts.com\/artificial-intelligence-2\/2025\/aws-and-vonage-partner-to-distribute-natural-sounding-ai-voice-agents\/\" target=\"_blank\" rel=\"noopener nofollow\">AWS and Vonage Partner to Distribute \u2018Natural-Sounding\u2019 AI Voice Agents<\/a><\/p>\n<p><a class=\"editor-rtfLink\" href=\"https:\/\/www.pymnts.com\/meta\/2025\/meta-buying-voice-ai-startup-playai\/\" target=\"_blank\" rel=\"noopener nofollow\">Meta to Make a Bid for Voice AI Startup PlayAI<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"Microsoft has released VibeVoice, a new open-source artificial intelligence (AI) model that lets users create podcasts and 
other&hellip;\n","protected":false},"author":3,"featured_media":195341,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[691,738,252,50,751,158,67,132,68,108752,12736,17096],"class_list":{"0":"post-195340","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-microsoft","11":"tag-news","12":"tag-pymnts-news","13":"tag-technology","14":"tag-united-states","15":"tag-unitedstates","16":"tag-us","17":"tag-vibevoice","18":"tag-voice-ai","19":"tag-voicetech"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@us\/115137325509162041","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/195340","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/comments?post=195340"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/195340\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media\/195341"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media?parent=195340"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/categories?post=195340"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/tags?post=195340"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}