arxiv · Jun 12 · bullish

HyPE: Category-Aware Hypergraph Encoding with Persistent Edge Embeddings for Persona-Grounded Dialogue

arXiv:2606.13142v1 Announce Type: new Abstract: Persona-grounded dialogue systems aim to produce responses consistent with a speaker's persona, yet existing methods treat personas as a flat set of sentences and fail to model the high-order relations among persona attributes, e.g., that several person

GPMEQW · 4 models · +1 #persona-grounded #dialogue

arxiv · May 21

Features have life history. And we should care

arXiv:2605.18789v1 Announce Type: cross Abstract: Features in language models have life history: they emerge, persist, and die during training, yet the importance of that history remains largely unexplored. We find evidence of a persistent representational backbone, which we identify in Pythia-160M

PYPY · 2 models #language models #training dynamics #neural networks

arxiv · May 21 · bullish

From SGD to Muon: Adaptive Optimization via Schatten-p Norms

arXiv:2605.19781v1 Announce Type: new Abstract: Modern optimizers, like Muon, impose matrix-wise geometry constraints on their updates. These matrix-wise constraints can be unified under Linear Minimization Oracle (LMO) theory. However, all current methods impose fixed LMO geometries for the update

MUSGAD · 5 models · +2 #optimization #deep learning #neural networks

arxiv · Apr 24

Modulating Cross-Modal Convergence with Single-Stimulus, Intra-Modal Dispersion

arXiv:2604.21836v1 Announce Type: cross Abstract: Neural networks exhibit a remarkable degree of representational convergence across diverse architectures, training objectives, and even data modalities. This convergence is predictive of alignment with brain representation. A recent hypothesis sugges

DI · 1 model #neural networks #representational convergence #cross-modal alignment