arxiv May 16 bullish
arXiv:2605.14773v1 Announce Type: cross Abstract: Data selection accelerates training by identifying representative training data while preserving model performance. However, existing methods mainly focus on designing sample-importance criteria, i.e., deciding what to select, while typically fixing
arxiv Apr 29
arXiv:2407.14974v2 Announce Type: replace-cross Abstract: Machine learning models are known to learn spurious correlations, i.e., features having strong relations with class labels but no causal relation. Relying on those correlations leads to poor performance in the data groups without these correl
arxiv Apr 6
arXiv:2602.16967v3 Announce Type: replace Abstract: Grokking -- the abrupt transition from memorization to generalization after prolonged training -- has been linked to confinement on low-dimensional execution manifolds in modular arithmetic. Whether this mechanism extends beyond arithmetic remains
arxiv Apr 6
arXiv:2602.16746v3 Announce Type: replace Abstract: Grokking -- the delayed transition from memorization to generalization in small algorithmic tasks -- remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention
arxiv Apr 3
arXiv:2603.27134v2 Announce Type: replace Abstract: Are there still barriers to generalization once all relevant variables are known? We address this question via a framework that casts compositional generalization as a variational inference problem over latent variables with parametric interactions