arxiv
Published July 1, 2026 at 4:00 AM
Multipole Semantic Attention: A Fast Approximation of Softmax Attention for Pretraining
Publisher summary · verbatim
arXiv:2509.10406v4 Announce Type: replace Abstract: Pretraining transformers on long sequences (entire code repositories, collections of related documents) is bottlenecked by quadratic attention costs. We present Multipole Semantic Attention (MuSe), which accelerates 64k-context pretraining by 36% w
Originally published on arxiv ↗