Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

Source

arxiv.org

Publisher summary (verbatim)

arXiv:2512.22671v2. Abstract: Structured width pruning of GLU-MLP layers, guided by the Maximum Absolute Weight (MAW) criterion, reveals a systematic dichotomy in how reducing the expansion ratio affects different model capabilities. While performance on tasks relying on parame


