arxiv
Published May 22, 2026 at 4:00 AM
WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving
Publisher summary · verbatim
arXiv:2512.09472v2 Announce Type: replace-cross Abstract: Deploying multiple models within shared GPU clusters is a key strategy to improve resource efficiency in large language model (LLM) serving. Existing multi-LLM serving systems improve GPU utilization at the cost of degraded inference performa
Originally published on arxiv ↗