Published April 24, 2026 at 4:00 AM
SGD at the Edge of Stability: The Stochastic Sharpness Gap
arXiv:2604.21016v1 (cross-listed)

Abstract: When training neural networks with full-batch gradient descent (GD) and step size $\eta$, the largest eigenvalue of the Hessian -- the sharpness $S(\boldsymbol{\theta})$ -- rises to $2/\eta$ and hovers there, a phenomenon termed the Edge of Stability.
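The $2/\eta$ threshold in the abstract comes from the stability of gradient descent on a locally quadratic loss. As a minimal sketch (illustrative only, not code from the paper), the snippet below estimates the sharpness of a toy quadratic loss via power iteration on Hessian-vector products and checks it against $2/\eta$; the loss, eigenvalues, and step size are all hypothetical choices for the example.

```python
import numpy as np

# For a quadratic loss L(theta) = 0.5 * theta^T A theta, the Hessian is A,
# and the sharpness S(theta) is its largest eigenvalue. The GD update
# theta <- theta - eta * A theta contracts along an eigendirection with
# eigenvalue lam iff |1 - eta * lam| < 1, i.e. lam < 2/eta -- the
# stability threshold the abstract refers to.

def sharpness_power_iteration(hvp, dim, iters=200, seed=0):
    """Estimate the top Hessian eigenvalue using only Hessian-vector products."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = hvp(v)
        v = hv / np.linalg.norm(hv)
    return float(v @ hvp(v))  # Rayleigh quotient at the converged direction

# Toy quadratic with known Hessian eigenvalues 1, 3, and 9 (hypothetical).
A = np.diag([1.0, 3.0, 9.0])
hvp = lambda v: A @ v  # Hessian-vector product

S = sharpness_power_iteration(hvp, dim=3)
eta = 0.1
print(S)            # ~9.0: the sharpness (largest eigenvalue of A)
print(S < 2 / eta)  # True: eta = 0.1 is stable here, since 2/eta = 20 > 9
```

In a real network one would obtain `hvp` from automatic differentiation rather than an explicit matrix; the power-iteration logic is unchanged.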
Originally published on arXiv.