arXiv · May 21
arXiv:2605.18789v1 (Announce Type: cross). Abstract: Features in language models have a life history: they emerge, persist, and die during training, yet the importance of that history remains largely unexplored. We find evidence of a persistent representational backbone, which we identify in Pythia-160M
arXiv · May 21 · bullish
arXiv:2605.19781v1 (Announce Type: new). Abstract: Modern optimizers, like Muon, impose matrix-wise geometry constraints on their updates. These matrix-wise constraints can be unified under Linear Minimization Oracle (LMO) theory. However, all current methods impose fixed LMO geometries for the update
arXiv · Apr 24
arXiv:2604.21836v1 (Announce Type: cross). Abstract: Neural networks exhibit a remarkable degree of representational convergence across diverse architectures, training objectives, and even data modalities. This convergence is predictive of alignment with brain representation. A recent hypothesis sugges