arxiv5d agobullish
arXiv:2606.05950v1 Announce Type: new Abstract: Text-guided image editing has advanced rapidly with diffusion models and unified multimodal foundation models. However, most existing methods remain confined to single-turn settings, overlooking the more realistic scenario of multi-turn in-context edit
arxivMay 29
arXiv:2507.16880v3 Announce Type: replace-cross Abstract: Text-to-image diffusion models (DMs) have achieved remarkable success in image generation. However, concerns about data privacy and intellectual property remain due to their potential to inadvertently memorize and replicate training data. Rec
arxivMay 28
arXiv:2605.28304v1 Announce Type: new Abstract: Composing autoregressive models remains a core challenge in understanding how large language models can combine behaviors or skills learned across tasks. We introduce a new and principled composition strategy for autoregressive systems, inspired by com
arxivMay 7bullish
arXiv:2505.16527v3 Announce Type: replace Abstract: Building generative models for relational databases (RDBs) is important for many applications, such as privacy-preserving data release and augmenting real datasets. However, most prior works either focus on single-table generation or adapt single-t
arxivApr 30bullish
arXiv:2604.26281v1 Announce Type: cross Abstract: To preserve or not to preserve prosody is a central question in voice anonymization. Prosody conveys meaning and affect, yet is tightly coupled with speaker identity. Existing methods either discard prosody for privacy or lack a principled mechanism
arxivApr 24
arXiv:2604.16565v2 Announce Type: replace-cross Abstract: While Diffusion Large Language Models (dLLMs) offer structural advantages for global planning, efficiently verifying that they arrive at correct answers via valid reasoning traces remains a critical challenge. In this work, we propose a geome
arxivApr 24bullish
arXiv:2604.17656v2 Announce Type: replace-cross Abstract: Video-to-music (V2M) is the fundamental task of creating background music for an input video. Recent V2M models achieve audiovisual alignment by typically relying on visual conditioning alone and provide limited semantic and stylistic control
arxivApr 16
arXiv:2512.10877v4 Announce Type: replace Abstract: Discrete diffusion models (DMs) have achieved strong performance in language and other discrete domains, offering a compelling alternative to autoregressive modeling. Yet this performance typically depends on large training datasets, challenging th