Tag

#speech-processing

6 articles tagged #speech-processing

arxiv, Jul 16, bullish

Hybrid Continual Learning for Low-Resource Australian Aboriginal Language Identification

arXiv:2607.11946v1 Announce Type: new Abstract: Language identification is an important step toward integrating endangered Australian Aboriginal languages (AALs) into speech technologies supporting language revitalisation and digital inclusion. However, extreme data scarcity limits model performance

#language-identification #speech-processing #continual-learning Read on arxiv →

arxiv, Jul 3

Quantifying the Uncertainty of Blindly Estimated Room Embeddings Using a Dispersion-Calibrated Score

arXiv:2607.01527v1 Announce Type: cross Abstract: Room embeddings derived from reverberant speech are often unreliable: speech content and recording degradation can alter the representation even when speaker, room, and source-receiver geometry remain unchanged, degrading downstream task performance.

#speech-processing #machine-learning #audio Read on arxiv →

arxiv, Jun 17

Fast Speech Foundation Model Distillation Using Interleaved Stacking

arXiv:2606.11766v2 Announce Type: replace-cross Abstract: Distilling a large speech foundation model (SFM) into an efficient student model has been successfully applied to low-resource environments. Although distillation reduces inference latency, it requires an additional student model training. Ho

#speech-processing #model-distillation #training-acceleration Read on arxiv →

arxiv, May 8, bullish

Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM

arXiv:2605.05927v1 Announce Type: new Abstract: Speech large language models (SLMs) are typically built from text large language model (TLM) checkpoints, yet they still suffer from a substantial modality gap. Prior work has mainly attempted to reduce this gap from the output side by making speech ge

#speech-processing #language-models #modality-gap Read on arxiv →

arxiv, May 7

Deepfake Audio Detection Using Self-supervised Fusion Representations

arXiv:2605.03420v1 Announce Type: cross Abstract: This paper describes a submission to the Environment-Aware Speech and Sound Deepfake Detection Challenge (ESDD2) 2026, which addresses component-level deepfake detection using the CompSpoofV2 dataset, where speech and environmental sounds may be inde

#deepfake-detection #speech-processing #environmental-sounds Read on arxiv →

arxiv, Apr 24, bullish

Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks

arXiv:2305.01626v4 Announce Type: replace-cross Abstract: Computational models of syntax are predominantly text-based. Here we propose that the most basic first step in the evolution of syntax can be modeled directly from raw speech in a fully unsupervised way. We focus on one of the most ubiquitous

#speech-processing #neural-networks #language-modeling Read on arxiv →
