arxiv
Published May 12, 2026 at 4:00 AM
FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning
Publisher summary · verbatim
arXiv:2605.09932v1 Announce Type: new Abstract: Large language models can now process increasingly long inputs, yet their ability to effectively use information spread across long contexts remains limited. We trace this gap to how attention budget is spent during supervised fine-tuning (SFT) on long
Originally published on arXiv