{"id":57801,"date":"2025-09-11T17:58:08","date_gmt":"2025-09-11T17:58:08","guid":{"rendered":"https:\/\/www.europesays.com\/ie\/57801\/"},"modified":"2025-09-11T17:58:08","modified_gmt":"2025-09-11T17:58:08","slug":"how-google-build-real-time-language-translation-for-meet","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ie\/57801\/","title":{"rendered":"How Google build real-time language translation for Meet"},"content":{"rendered":"<p data-block-key=\"zanyf\">Fredric, who leads the audio engineering team in Meet, has watched AI transform what his team is capable of doing. His team began working on Speech Translation about two years ago; at the time, existing models could handle offline translation, but the challenge lay in making it instantaneous \u2014 which would be necessary for live Google Meet calls. But they knew it was possible, so they began working with the Google DeepMind team. \u201cWhen we started, we thought, \u2018Maybe this will take five years,\u2019\u201d Fredric explains. Two years later, here we are. \u201cAs things go with AI,\u201d he explains, \u201cthings just went faster and faster. Now, there\u2019s a whole Google community with engineers from Pixel, Cloud, Chrome and more working together with Google Deepmind to achieve real-time speech translation.\u201d<\/p>\n<p>Breakthroughs in translation technology<\/p>\n<p data-block-key=\"aroc0\">Previous audio translation technologies relied on a multi-step process: Transcribe the speech, translate the text, then convert it back to speech. This chain resulted in significant latency, often 10-20 seconds, making natural conversation impossible. And translated voices were generic, failing to capture the speaker&#8217;s unique characteristics.<\/p>\n<p data-block-key=\"fep5h\">The true breakthrough, as Huib (who leads product management for audio quality) explains, was thanks to \u201clarge models\u201d \u2014 not necessarily large language models (LLMs) but models capable of &#8220;one-shot&#8221; translation. &#8220;You send audio in and almost immediately, the model starts outputting audio,&#8221; he notes. This drastically reduced latency to nearly mimic how a human interpreter processes and delivers speech. \u201cWe discovered that two to three seconds was sort of a sweet spot,\u201d Huib says. Faster was difficult to understand; slower didn\u2019t lend itself to natural conversation. But once they hit this timing, it meant that using this model, translation in Google Meet can make simultaneous conversation across different languages feasible.<\/p>\n<p>Problem solving and big improvements<\/p>\n<p data-block-key=\"f0ikl\">Developing this complex feature was not without its hurdles. One of the most critical aspects was ensuring high-quality translation, which can vary greatly depending on factors like speaker accent, background noise or network conditions. Despite challenges in development, the Meet and DeepMind teams worked together to refine these hiccups, testing models and adjusting them based on real-world performance.<\/p>\n<p data-block-key=\"2501s\">Part of that testing involved working with linguists and other language experts to really understand the nuances not only of translation but accents as well. Languages with closer affinities, like Spanish, Italian, Portuguese and French were easier to integrate, while structurally different languages such as German presented greater challenges due to variations in everything from grammar to common idioms. 
Currently, the model also translates most expressions literally, which can lead to amusing misunderstandings, Huib and Fredric note. However, they expect that updates using advanced LLMs will grasp and translate such nuances more accurately, even capturing tone and irony.