FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning

Source

arxiv.org

Publisher summary (verbatim)

arXiv:2605.09932v1. Abstract: Large language models can now process increasingly long inputs, yet their ability to effectively use information spread across long contexts remains limited. We trace this gap to how attention budget is spent during supervised fine-tuning (SFT) on long

