arxiv
PublishedApril 27, 2026 at 4:00 AM
Beyond Linearity in Attention Projections: The Case for Nonlinear Queries
Publisher summary· verbatim
arXiv:2603.13381v2 Announce Type: replace-cross Abstract: Recent algebraic analysis shows that in decoder-only and encoder-only transformers, the Query projection $W_Q$ may be set to identity without noticeable performance deterioration. This is possible because attention depends on $X$ only through
Discussion
No replies yet. Be first.
Related coverage
More from ARXIV
arxivFrom Local to Cluster: A Unified Framework for Causal Discovery with Latent Variables11harxivConsequentialist Objectives and Catastrophe11harxivEgoMAGIC- An Egocentric Video Field Medicine Dataset for Training Perception Algorithms11harxivReCast: Recasting Learning Signals for Reinforcement Learning in Generative Recommendation11hOriginally published on arxiv ↗