arxiv
Published May 21, 2026 at 4:00 AM
StableGrad: Backward Scale Control without Batch Normalization
Publisher summary (verbatim)
arXiv:2605.19856v1 (announce type: cross)
Abstract: Training very deep neural networks requires controlling the propagation of magnitudes across depth. Without such control, activations and gradients may vanish, explode, or enter unstable regimes that make optimization fail. Modern architectures often
Originally published on arXiv