arxiv
Published May 28, 2026 at 4:00 AM

GradientStabilizer: Fix the Norm, Not the Gradient

Publisher summary (verbatim)

arXiv:2502.17055v4 (announce type: replace-cross)

Abstract: Training instability in modern deep learning systems is frequently triggered by rare but extreme gradient-norm spikes, which can induce oversized parameter updates, corrupt optimizer state, and lead to slow recovery or divergence. Widely used
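
To illustrate the "corrupt optimizer state" and "slow recovery" effects the abstract describes, here is a minimal sketch using a textbook scalar Adam update. It is not the paper's method. The gradient values, the spike size, the step indices, and the function name `adam_step_sizes` are all assumptions chosen for illustration.

```python
import math


def adam_step_sizes(grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return |update| at each step for a textbook scalar Adam optimizer.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    m = 0.0  # first-moment (mean) estimate
    v = 0.0  # second-moment (uncentered variance) estimate
    sizes = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
        sizes.append(abs(lr * m_hat / (math.sqrt(v_hat) + eps)))
    return sizes


# Assumed toy gradient stream: constant 1.0, with one spike of 1000 at index 100.
steps = 500
clean_grads = [1.0] * steps
spiky_grads = list(clean_grads)
spiky_grads[100] = 1000.0

clean = adam_step_sizes(clean_grads)
spiky = adam_step_sizes(spiky_grads)

# Compare the update size long after the spike with the spike-free run.
print(f"clean run, final step size: {clean[-1]:.2e}")
print(f"spiky run, final step size: {spiky[-1]:.2e}")
```

In this toy setting the single spike inflates the second-moment estimate `v`, which decays slowly under `beta2 = 0.999`, so updates stay far smaller than in the spike-free run for hundreds of steps afterwards. With plain SGD the same spike would instead produce one update 1000 times larger than usual, since the update is simply `lr * g`.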
