arxiv
Published April 24, 2026 at 4:00 AM
The Origin of Edge of Stability
Publisher summary · verbatim
arXiv:2604.20446v1 Announce Type: new Abstract: Full-batch gradient descent on neural networks drives the largest Hessian eigenvalue to the threshold $2/\eta$, where $\eta$ is the learning rate. This phenomenon, the Edge of Stability, has resisted a unified explanation: existing accounts establish s
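The threshold $2/\eta$ named in the summary coincides with the classical stability bound for gradient descent on a quadratic: for $f(x) = \tfrac{1}{2}\lambda x^2$ the update is $x \leftarrow (1 - \eta\lambda)x$, which contracts only when $\lambda < 2/\eta$. The sketch below illustrates that textbook fact alone; the one-dimensional quadratic, the function name `gd_on_quadratic`, and the chosen constants are assumptions for illustration and are not taken from the paper, whose setting is neural networks.

```python
def gd_on_quadratic(curvature, lr, steps=50, x0=1.0):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2.

    Hypothetical helper for illustration; returns |x| after `steps` updates.
    """
    x = x0
    for _ in range(steps):
        grad = curvature * x   # f'(x) = curvature * x
        x = x - lr * grad      # equivalent to x <- (1 - lr * curvature) * x
    return abs(x)


lr = 0.1
threshold = 2.0 / lr  # stability threshold 2 / eta for this learning rate

# Curvature slightly below the threshold: the iterate shrinks toward 0.
below = gd_on_quadratic(0.9 * threshold, lr)

# Curvature slightly above the threshold: the iterate grows in magnitude.
above = gd_on_quadratic(1.1 * threshold, lr)

print(f"threshold={threshold:.1f}  below={below:.3e}  above={above:.3e}")
```

On a fixed quadratic the curvature never changes, so the run either converges or diverges. The phenomenon in the summary concerns networks whose largest Hessian eigenvalue moves during training, which this toy example does not model.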
Originally published on arXiv