arxiv2d ago
arXiv:2606.01802v2 Announce Type: replace-cross Abstract: MOSS-Audio is a unified audio-language model for speech, environmental sound, and music understanding, supporting audio captioning, time-aware question answering, timestamped transcription, and audio-grounded reasoning. MOSS-Audio couples a d
arxivMay 29
arXiv:2603.07916v2 Announce Type: replace Abstract: In recent advances, to enable a fully data-driven learning paradigm on relational databases (RDB), relational deep learning (RDL) is proposed to structure the RDB as a heterogeneous entity graph and adopt the graph neural network (GNN) as the predi
arxivMay 29
arXiv:2605.29635v1 Announce Type: cross Abstract: In this paper, we study a structured class of nonconvex constrained stochastic problems with difference-of-convex (DC) regularization, where the feasible set is possibly nonconvex and the concave part of the DC regularizer is allowed to be nonsmooth.
arxivMay 26
arXiv:2605.22794v2 Announce Type: replace Abstract: Autonomous agentic systems are largely static after deployment: they do not learn from user interactions, and recurring failures persist until the next human-driven update ships a fix. Self-evolving agents have emerged in response, but all confine
arxivMar 31
arXiv:2603.28086v1 Announce Type: cross Abstract: Voice design from natural language aims to generate speaker timbres directly from free-form textual descriptions, allowing users to create voices tailored to specific roles, personalities, and emotions. Such controllable voice creation benefits a wid