SPASM: Stable Persona-driven Agent Simulation for Multi-turn Dialogue Generation
arXiv:2604.09212v1 Announce Type: new Abstract: Large language models are increasingly deployed in multi-turn settings such as tutoring, support, and counseling, where reliability depends on preserving consistent roles, personas, and goals across long horizons. This requirement becomes critical when …
FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning
arXiv:2601.18150v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) for large language models (LLMs) is increasingly bottlenecked by rollout (generation), where long output sequence lengths make attention and KV-cache memory dominate end-to-end step time. FP8 offers an attractive …
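The excerpt ends before the method is described, but the bottleneck it names can be illustrated on its own. A minimal sketch, assuming simple per-tensor E4M3 scaling rather than whatever calibration FP8-RL actually uses, of why storing the KV cache in FP8 halves its memory versus FP16 during rollout:

```python
# Minimal sketch, assuming per-tensor E4M3 scaling -- not FP8-RL's actual recipe.
# Requires a PyTorch build with float8 dtypes (>= 2.1).
import torch

E4M3_MAX = 448.0  # largest finite magnitude representable in torch.float8_e4m3fn

def quantize_kv_fp8(kv: torch.Tensor):
    """Scale the cache into the E4M3 range, then store it at 1 byte per element."""
    scale = kv.abs().max().float().clamp(min=1e-12) / E4M3_MAX
    return (kv / scale).to(torch.float8_e4m3fn), scale

def dequantize_kv_fp8(kv_fp8: torch.Tensor, scale: torch.Tensor):
    """Cast back to FP16 and undo the scale before attention reads the cache."""
    return kv_fp8.to(torch.float16) * scale

kv = torch.randn(2, 8, 128, 64, dtype=torch.float16)   # toy (layer, head, seq, dim) cache
kv_fp8, scale = quantize_kv_fp8(kv)
kv_hat = dequantize_kv_fp8(kv_fp8, scale)
print("fp16 bytes:", kv.numel() * kv.element_size(),
      "fp8 bytes:", kv_fp8.numel() * kv_fp8.element_size())
print("max abs error:", (kv.float() - kv_hat.float()).abs().max().item())
```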
Creator Incentives in Recommender Systems: A Cooperative Game-Theoretic Approach for Stable and Fair Collaboration in Multi-Agent Bandits
arXiv:2604.08643v1 Announce Type: new Abstract: User interactions in online recommendation platforms create interdependencies among content creators: feedback on one creator's content influences the system's learning and, in turn, the exposure of other creators' contents. To analyze incentives …
Breaking Block Boundaries: Anchor-based History-stable Decoding for Diffusion Large Language Models
arXiv:2604.08964v1 Announce Type: new Abstract: Diffusion Large Language Models (dLLMs) have recently become a promising alternative to autoregressive large language models (ARMs). Semi-autoregressive (Semi-AR) decoding is widely employed in base dLLMs and advanced decoding strategies due to its …
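As background for the Semi-AR decoding the abstract mentions, here is a toy sketch of block-wise decoding, with a hypothetical `denoise_step` standing in for the diffusion model; it only shows how earlier blocks stay frozen as history while the current block is unmasked over several refinement steps, not the anchor-based method the paper proposes:

```python
# Toy sketch of semi-autoregressive block decoding; `denoise_step` is a dummy
# stand-in for a diffusion LLM, and the commit schedule is purely illustrative.
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "a", "mat", "."]

def denoise_step(tokens, block_start, block_end):
    """Dummy denoiser: propose a token for each still-masked slot in the block."""
    return [
        random.choice(VOCAB) if (block_start <= i < block_end and t == MASK) else t
        for i, t in enumerate(tokens)
    ]

def semi_ar_decode(seq_len=12, block_size=4, steps_per_block=2):
    tokens = [MASK] * seq_len
    for block_start in range(0, seq_len, block_size):
        block_end = min(block_start + block_size, seq_len)
        for _ in range(steps_per_block):
            proposal = denoise_step(tokens, block_start, block_end)
            # Commit part of the block each step; the rest stays masked and is
            # refined later (a confidence-based schedule in practice).
            n_commit = max(1, (block_end - block_start) // steps_per_block)
            masked = [i for i in range(block_start, block_end) if tokens[i] == MASK]
            for i in masked[:n_commit]:
                tokens[i] = proposal[i]
        # Finish the block before moving on: committed history is never revisited.
        for i in range(block_start, block_end):
            if tokens[i] == MASK:
                tokens[i] = denoise_step(tokens, block_start, block_end)[i]
    return tokens

print(" ".join(semi_ar_decode()))
```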
Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs
arXiv:2604.07888v1 Announce Type: new Abstract: Training LLMs at ultra-low precision remains a formidable challenge. Direct low-bit QAT often suffers from convergence instability and substantial training costs, exacerbated by quantization noise from heavy-tailed outlier channels and error accumulation …
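The outlier channel splitting in the title echoes earlier post-training quantization work (Zhao et al., 2019). As a rough illustration, not the paper's procedure: duplicating an outlier input channel of a linear layer at half weight leaves the layer's output unchanged while shrinking the weight range the quantizer must cover.

```python
# Hedged sketch of outlier channel splitting on a single linear layer;
# indices and shapes are illustrative, not taken from the paper.
import numpy as np

def split_outlier_channel(W: np.ndarray, x: np.ndarray, c: int):
    """Return (W', x') with input channel c duplicated at half weight."""
    col = W[:, c:c + 1] / 2.0
    W_split = np.concatenate([W[:, :c], col, col, W[:, c + 1:]], axis=1)
    x_split = np.concatenate([x[:c], x[c:c + 1], x[c:c + 1], x[c + 1:]])
    return W_split, x_split

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
W[:, 2] *= 10.0                           # make channel 2 a heavy-tailed outlier
x = rng.normal(size=6)

W2, x2 = split_outlier_channel(W, x, c=2)
print(np.allclose(W @ x, W2 @ x2))        # True: the layer's function is preserved
print(np.abs(W).max(), np.abs(W2).max())  # outlier magnitude roughly halved
```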
QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training--Inference Mismatch
arXiv:2604.07853v1 Announce Type: new Abstract: Large language model (LLM) reinforcement learning (RL) pipelines are often bottlenecked by rollout generation, making end-to-end training slow. Recent work mitigates this by running rollouts with quantization to accelerate decoding, which is the most …
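The excerpt is cut off before the fix, but the training--inference mismatch it names has a standard framing: the quantized model that generates rollouts acts as the behavior policy. A minimal sketch of one generic correction, a clipped importance ratio between the full-precision training policy and the quantized rollout policy, which is not necessarily QaRL's rollout alignment:

```python
# Hedged sketch: PPO-style clipped surrogate treating the quantized rollout model
# as the behavior policy; names and shapes are illustrative only.
import torch

def clipped_surrogate_loss(logp_train, logp_rollout, advantages, eps=0.2):
    """
    logp_train:   log-probs of sampled tokens under the full-precision training policy
    logp_rollout: log-probs of the same tokens under the quantized rollout policy
    advantages:   per-token advantage estimates
    """
    ratio = torch.exp(logp_train - logp_rollout)                   # importance weight
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.minimum(unclipped, clipped).mean()               # clipped PG objective

# toy usage with random stand-ins for model outputs
logp_train = torch.log(torch.rand(8))
logp_rollout = torch.log(torch.rand(8))
adv = torch.randn(8)
print(clipped_surrogate_loss(logp_train, logp_rollout, adv).item())
```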