DataBubble

Model Detail

OmniVoice

Provider: k2-fsa
Category: audio
Pipeline: text-to-speech
Parameters: 0.6B

DB Score: 3.2
Downloads: 2.5M
Likes: 979

Day: -0.7%
Week: +0.0%
Month: +21.9%

Overview

OmniVoice is a 0.6B-parameter audio model released by k2-fsa. It is registered under the text-to-speech pipeline tag on Hugging Face and distributed under the permissive apache-2.0 license.

Technical

OmniVoice ships with 0.6B parameters. The apache-2.0 license permits commercial deployment, modification, and redistribution without royalties or per-seat fees, provided the license text and existing attribution notices are retained.

Trending Signal

Downloads of OmniVoice moved -0.7% over the past 24 hours, were flat over the past week (+0.0%), and rose +21.9% over the trailing thirty days. The one-day dip is small against a clearly positive month, so the longer-run trend is growth rather than cooling. Treat these numbers as a signal, not a guarantee: download counts on Hugging Face also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.
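How the day, week, and month percentages are computed is not stated on this page. As a minimal sketch, assuming each figure compares total downloads in the latest window with the total in the window of equal length just before it, the arithmetic could look like the following. The function names and the daily counts are hypothetical and are not OmniVoice's real download history.

```python
from typing import Sequence


def pct_change(previous: float, current: float) -> float:
    """Relative change from `previous` to `current`, in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


def window_change(daily_downloads: Sequence[float], window: int) -> float:
    """Compare the latest `window` days with the `window` days before them.

    Assumption: this is one plausible definition of a trailing-window
    change, not a documented formula from the site.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(daily_downloads) < 2 * window:
        raise ValueError("need at least 2 * window days of data")
    recent = sum(daily_downloads[-window:])
    prior = sum(daily_downloads[-2 * window:-window])
    return pct_change(prior, recent)


# Made-up daily counts for illustration only: a steady month, then a
# stronger month that ends with one weaker day.
series = [1000.0] * 30 + [1200.0] * 29 + [1100.0]

day = window_change(series, 1)     # last day vs. the day before
month = window_change(series, 30)  # last 30 days vs. the 30 before

print(f"day: {day:+.1f}%  month: {month:+.1f}%")
```

With this made-up series the one-day change is negative while the thirty-day change is positive, which is the same pattern as the figures above: a short-window dip does not contradict longer-run growth.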

Use Cases

OmniVoice is best suited to speech synthesis: it is registered under the text-to-speech pipeline, and its paper describes zero-shot TTS across more than 600 languages. Speech recognition and transcription fall outside that pipeline tag. Treat this as a starting point rather than a benchmark verdict; the right deployment depends on an evaluation suite that mirrors your workload.

Research Paper

arXiv: 2604.00688

Model Info

License: apache-2.0

Citations: 3 (0 influential)

Recent news

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models

arXiv:2604.00688v3. Abstract: We present OmniVoice, a massively multilingual zero-shot text-to-speech (TTS) model that scales to over 600 languages. At its core is a novel diffusion language model-style discrete non-autoregressive (NAR) architecture. Unlike conventional discret

Related Models

Kokoro-82M

hexgrad · 13.8M downloads

XTTS-v2

coqui · 10.0M downloads