Sensitivity-Positional Co-Localization in GQA Transformers
We investigate a fundamental structural question in Grouped Query Attention (GQA) transformers: do the layers most sensitive to task correctness coincide with the layers where positional encoding adaptation has the greatest leverage? We term this the co-localization hypothesis and test it on Llama 3.1 8B, a 32-layer GQA model with a 4:1 query-to-key-value head ratio. We introduce LSLORA, which restricts LoRA adaptation to layers identified via a novel correctness-differential hidden-state metric, and GARFA (GQA-Aware RoPE Frequency Adaptation), which attaches 8 learnable per-KV-head scalar multipliers to each targeted layer.
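The core of GARFA, as described above, is one learnable scalar per KV head that rescales that layer's RoPE frequency table. A minimal sketch follows; the function name and the exact point at which the scalars enter RoPE are our assumptions, while the constants (head dimension 128, 8 KV heads, RoPE base 500,000) match the public Llama 3.1 8B configuration.

```python
import numpy as np

def garfa_rope_freqs(base_freqs, kv_scales):
    """Hypothetical GARFA parameterization: scale a layer's RoPE
    inverse frequencies with one learnable multiplier per KV head.

    base_freqs: (head_dim // 2,) standard RoPE inverse frequencies.
    kv_scales:  (num_kv_heads,) learnable scalars, initialized at 1.0.
    Returns a (num_kv_heads, head_dim // 2) per-head frequency table.
    """
    return kv_scales[:, None] * base_freqs[None, :]

# Llama 3.1 8B: head_dim = 128, 8 KV heads, rope_theta = 500000.
head_dim, num_kv_heads, rope_base = 128, 8, 500_000.0
inv_freq = 1.0 / (rope_base ** (np.arange(0, head_dim, 2) / head_dim))
scales = np.ones(num_kv_heads)  # identity at init: standard RoPE
freqs = garfa_rope_freqs(inv_freq, scales)
print(freqs.shape)  # (8, 64): 8 scalars per targeted layer, as in the abstract
```

Initializing the scalars at 1.0 makes the adapted model exactly reproduce the base model before training, which is the usual safe starting point for this kind of multiplicative adapter.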
Contrary to the co-localization hypothesis, we discover strong anti-localization: task-sensitive layers concentrate in the late network (ℓ ∈ {23,…,31}) while RoPE-influential layers dominate the early network (ℓ ∈ {0,…,9}), yielding Spearman r_s = -0.735 (p = 1.66×10^{-6}). Despite this anti-localization, a 4-way cross-layer ablation shows that applying both interventions to the sensitivity-identified layers outperforms all alternative configurations by 4-16 percentage points across six diverse benchmarks (MMLU, GPQA, HumanEval+, MATH, MGSM, ARC), approaching Claude 3.5 Haiku on HumanEval+ (67.1% vs. 68.3%) at $100 total compute cost.
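The anti-localization statistic above is a Spearman rank correlation between two per-layer score vectors. The sketch below computes r_s from scratch on synthetic scores for a 32-layer model; the scores are purely illustrative (the paper's measured scores yield r_s = -0.735), and the no-tie rank formula is assumed to suffice.

```python
import numpy as np

def spearman_rs(x, y):
    # Spearman's r_s = Pearson correlation of the rank vectors
    # (simple no-tie version; real layer scores rarely tie exactly).
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rx, ry)[0, 1]

# Synthetic per-layer scores for a 32-layer model, mimicking the
# reported pattern: sensitivity peaks late, RoPE influence peaks early.
layers = np.arange(32)
sensitivity = layers.astype(float)            # largest for late layers
rope_influence = (31 - layers).astype(float)  # largest for early layers
print(round(spearman_rs(sensitivity, rope_influence), 3))  # -1.0: perfectly opposed rankings
```

Perfectly opposed rankings give r_s = -1; the paper's -0.735 indicates a strong but imperfect reversal between the two layer orderings.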
These results reveal a structural dissociation in GQA transformers between the layers where task correctness is decided and the layers where positional encoding exerts leverage, with implications for targeting parameter-efficient adaptation. The paper is available as arXiv:2604.07766 (8 pages, 5 figures), categorized under Computation and Language (cs.CL), Artificial Intelligence (cs.AI), and Machine Learning (cs.LG).