Tag

#attention

3 articles tagged #attention

arxiv · May 16 · bullish

Krause Synchronization Transformers

arXiv:2602.11534v3 Announce Type: replace-cross Abstract: Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. When composed across depth, this interaction pattern induces strong synchronization dynamics that favor

ME QW VI 3 models #transformers #attention #efficiency Read on arxiv →

arxiv · May 16 · bullish

MHSA: A Lightweight Framework for Mitigating Hallucinations via Steered Attention in LVLMs

arXiv:2605.14966v1 Announce Type: cross Abstract: Large vision-language models (LVLMs) have achieved remarkable performance across diverse multimodal tasks, yet they continue to suffer from hallucinations, generating content that is inconsistent with the visual input. Prior work DHCP (Detecting Hall

MH DH 2 models #hallucination #mitigation #multimodal Read on arxiv →

arxiv · May 14 · bullish

ASAP: Amortized Doubly-Stochastic Attention via Sliced Dual Projection

arXiv:2605.12879v1 Announce Type: new Abstract: Doubly-stochastic attention has emerged as a transport-based alternative to row-softmax attention, with recent Transformer variants using it to reduce attention sinks and rank collapse while improving performance. In this family, the standard approach

SI AS 2 models #transformer #attention #machine-learning Read on arxiv →
