arXiv | April 7, 2026 at 4:00 AM | 1 min read
Voxtral TTS
arXiv:2603.25551v2
Announce Type: replace
Abstract: We introduce Voxtral TTS, an expressive multilingual text-to-speech model that generates natural speech from as little as 3 seconds of reference audio. Voxtral TTS adopts a hybrid architecture that combines auto-regressive generation of semantic sp