TL;DR

Voice Clone Launch: xAI on May 2 launched Custom Voices, cloning a user’s voice from about a minute of speech in under two minutes.
Two-Stage Gate: A live passphrase plus speaker-embedding match must pass, which xAI says blocks cloning a third party from a pre-existing recording.
Free With Grok 4.3: The feature ships free on the xAI console alongside Grok 4.3, sharing TTS and voice agent APIs with the 80+ preset voices.
Unverified Safeguard: xAI has not published false-acceptance rates, anti-spoofing measures, or red-team results, leaving the impossibility claim untested by outside researchers.

xAI on May 2 introduced Custom Voices, a feature in the xAI console that clones a user’s voice from about a minute of natural speech and delivers a production-ready voice model in under two minutes. The cloned voice runs across Grok’s text-to-speech and voice-agent APIs at no cost. Activation is gated behind a two-stage verification process that xAI says is intended to make it impossible to clone a third party’s voice without that person’s participation.

Custom Voices is free for users on the xAI console and sits next to an existing library of 80+ preset voices spanning 28 languages, all callable through one TTS and voice-agent endpoint surface. Cloning ships as part of the Grok 4.3 release, which xAI is positioning at a low price point alongside a faster voice stack, and lands inside a crowded voice-cloning category where rival models already advertise much shorter input thresholds.

How Custom Voices Clones and Verifies

Cloning starts with a short recording captured through the xAI console. About a minute of natural speech is enough to produce a personalized model, and xAI says a clone is ready in under two minutes once a sample is submitted. Developers who already wired Grok speech into a product can call a cloned voice through the same endpoints that drive preset TTS voices and voice-agent flows, with no separate integration path and no new credential surface. Swapping in a personal clone is a routing change rather than a rebuild: a single voice identifier replaces another in the same request shape.
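The routing change described above can be sketched in a few lines. This is a minimal illustration, not xAI's documented API: the function `build_tts_request`, the field names `text`, `voice_id`, and `language`, and both voice identifiers are hypothetical placeholders chosen for the example, and no network call is made.

```python
import json


def build_tts_request(text: str, voice_id: str, language: str = "en") -> dict:
    """Build a hypothetical TTS request body.

    The field names are assumptions for illustration only; consult the
    vendor's API reference for the real request schema.
    """
    return {"text": text, "voice_id": voice_id, "language": language}


# Same request shape for a preset voice and a cloned voice
# (both identifiers are made up for this sketch).
preset = build_tts_request("Welcome back.", voice_id="preset-voice-01")
custom = build_tts_request("Welcome back.", voice_id="custom-voice-abc123")

# Collect the fields that differ between the two requests.
changed = {key for key in preset if preset[key] != custom[key]}

print(json.dumps(custom, indent=2))
print("fields that differ:", changed)
```

Under these assumptions, only the voice identifier differs between the two payloads, which is what makes the swap a configuration change rather than a new integration.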

Voice Cloning is now live via the xAI API!

Create a custom voice in less than 2 minutes or select from our library of 80+ voices across 28 languages to personalize your voice agents, audiobooks, video game characters, and more. https://t.co/EjxjXssQtd pic.twitter.com/iR8AW2UOgo

— xAI (@xai) May 1, 2026

 

Activation runs through a two-stage verification process. Step one is a passphrase read aloud, which xAI’s speech-to-text (STT) engine transcribes and matches in real time to confirm consent and presence. Step two compares speaker embeddings from the passphrase and the full recording to confirm they belong to the same person. xAI says that this consent-and-presence check, which requires a real-time passphrase from the same speaker who supplied the recording, means a user cannot clone someone else’s voice from a pre-existing recording. Failing either step blocks activation, which moves enforcement off a static consent toggle and onto an audio comparison done at the moment a clone goes live.
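The logic of a two-stage gate of this kind can be sketched as follows. xAI has not published its implementation, so everything here is an assumption: the function names, the text normalization, the use of cosine similarity, and the 0.75 threshold are illustrative choices, and the embeddings are assumed to come from some speaker-verification model that is not shown.

```python
import math
import re


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9\s]", "", text.lower()).split())


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_activation(
    expected_passphrase: str,
    transcript: str,
    passphrase_embedding: list[float],
    recording_embedding: list[float],
    threshold: float = 0.75,  # arbitrary value for illustration
) -> tuple[bool, str]:
    """Return (activated, reason); either failed stage blocks activation."""
    # Stage 1: the spoken passphrase must match the prompted one.
    if normalize(transcript) != normalize(expected_passphrase):
        return False, "passphrase mismatch"
    # Stage 2: the passphrase and the sample must sound like one speaker.
    similarity = cosine_similarity(passphrase_embedding, recording_embedding)
    if similarity < threshold:
        return False, "speaker mismatch"
    return True, "activated"


print(verify_activation("Blue river seven", "blue river, seven!",
                        [1.0, 0.0], [0.9, 0.1]))
print(verify_activation("Blue river seven", "blue river seven",
                        [1.0, 0.0], [0.0, 1.0]))
```

A check of this shape only establishes that the two audio clips match each other and that the right words were spoken. It does not by itself establish that the passphrase came from a live human rather than synthesized or replayed audio, which would need a separate anti-spoofing step.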

xAI’s assertion remains a vendor claim, not an independently verified property. False-acceptance rates, anti-spoofing measures, and red-team results behind the gate have not been published, and liveness checks in adjacent voice and biometric products have been bypassed before with synthesized passphrases or replayed audio. Any safeguard sitting inside the broader voice-cloning ethics debate will be judged on how it holds up to those attacks, not on a launch description, and on whether xAI publishes evaluation data an outside researcher can reproduce.
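For readers unfamiliar with the missing metric, a false-acceptance rate is the share of impostor attempts that a verifier wrongly lets through at a given threshold. The sketch below shows how it and the companion false-rejection rate are computed from similarity scores. The scores are made-up toy values, not measurements of any xAI system, and the function name and threshold are assumptions for the example.

```python
def error_rates(
    genuine_scores: list[float],
    impostor_scores: list[float],
    threshold: float,
) -> tuple[float, float]:
    """Return (false_acceptance_rate, false_rejection_rate) at a threshold.

    A score at or above the threshold counts as an accepted attempt.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    false_accepts = sum(score >= threshold for score in impostor_scores)
    false_rejects = sum(score < threshold for score in genuine_scores)
    return (
        false_accepts / len(impostor_scores),
        false_rejects / len(genuine_scores),
    )


# Toy scores invented for illustration only.
genuine = [0.91, 0.88, 0.79, 0.95]   # same-speaker attempts
impostor = [0.40, 0.62, 0.81, 0.55]  # different-speaker or spoofed attempts

far, frr = error_rates(genuine, impostor, threshold=0.80)
print(f"FAR={far:.2f} FRR={frr:.2f}")  # FAR=0.25 FRR=0.25
```

Published figures of this kind would only be meaningful if the impostor set included the attacks named above, such as synthesized passphrases and replayed audio, rather than only naive different-speaker attempts.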

Free on the Console, Inside the Grok Speech Stack

xAI says there is no extra charge to use its text-to-speech or voice-agent APIs with Custom Voices on the xAI console. That contrasts with the per-minute or per-character cloning fees that gate comparable features at several rivals. Personal clones, preset voices, and the speech-to-text side of the pipeline share one endpoint surface. That cuts the build cost of mixing a custom voice into a Grok-driven assistant or content workflow and lowers the bar for small teams that would not separately license a cloning vendor. Pairing a custom voice with the existing 28-language preset catalog also lets one application route between a personal voice and a localized preset voice without rewiring its audio stack.

Cloning is also inseparable from xAI’s Grok 4.3 release. Bundling cloning into a cheaper Grok tier means a developer choosing Grok for cost reasons inherits the voice features by default, which raises the volume of cloned voices likely to flow through Grok-driven products in the months after launch and pushes the consent-and-liveness gate from an edge case into a routine path.

Where the Category Is Heading

Input thresholds across the category keep falling. Alibaba’s Qwen3-TTS advertises usable cloning from about three seconds of audio, and Microsoft has shipped consumer cloning inside Teams with consent-style controls and Azure’s Personal Voice. xAI’s minute-long sample is more conservative than the shortest peer offers, but a more durable signal is that consent and liveness checks are becoming default packaging for cloning features rather than an optional add-on bolted on after launch.

The harder test is whether vendor-imposed safeguards keep pace with misuse capability, and the gap is concrete: Qwen3-TTS advertises cloning from about three seconds of audio against xAI’s roughly one-minute sample, while xAI has so far withheld the false-acceptance rate behind its two-stage gate, its anti-spoofing measures, and any red-team results. Until independent security researchers test the live console with synthesized passphrases and replayed audio, and xAI publishes evaluation data that regulators and outside reviewers can check, the impossibility claim remains a launch-page assertion rather than a verified property of Grok 4.3’s voice stack.