Model Detail
audio-flamingo-next-hf
Audio Video Verbal Analysis (AVVA) for Capturing Classroom Dialogues
arXiv:2604.22043v1 Announce Type: cross Abstract: Background: Classroom discourse analysis has been transformed by the growing use of audio-video multimodal data, which demands analytical methods that balance interpretive depth with computational scalability. Methods: This study introduces the …
AUDITA: A New Dataset to Audit Humans vs. AI Skill at Audio QA
arXiv:2604.21766v1 Announce Type: new Abstract: Existing audio question answering benchmarks largely emphasize sound event classification or caption-grounded queries, often enabling models to succeed through shortcut strategies, short-duration cues, lexical priors, or dataset-specific biases. …
Misinformation Span Detection in Videos via Audio Transcripts
arXiv:2604.21767v1 Announce Type: new Abstract: Online misinformation has become one of the most challenging issues of recent years, yielding severe consequences including political polarization, attacks on democracy, and public health risks. Misinformation manifests on any platform with a large user base, including …
ATIR: Towards Audio-Text Interleaved Contextual Retrieval
arXiv:2604.20267v1 Announce Type: cross Abstract: Audio carries richer information than text, including emotion, speaker traits, and environmental context, while also enabling lower-latency processing compared to speech-to-text pipelines. However, recent multimodal information retrieval research has …
KoALa-Bench: Evaluating Large Audio Language Models on Korean Speech Understanding and Faithfulness
arXiv:2604.19782v1 Announce Type: cross Abstract: Recent advances in large audio language models (LALMs) have enabled multilingual speech understanding. However, benchmarks for evaluating LALMs remain scarce for non-English languages, with Korean being one such underexplored case. In this paper, we …
DASB: Discrete Audio and Speech Benchmark
arXiv:2406.14294v4 Announce Type: replace-cross Abstract: Discrete audio tokens have recently gained considerable attention for their potential to bridge audio and language processing, enabling multimodal language models that can both generate and understand audio. However, preserving key information …