arxiv
Published May 29, 2026 at 4:00 AM

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

Source: arxiv.org
Publisher summary · verbatim

arXiv:2604.18518v4 Announce Type: replace-cross Abstract: Uniform Discrete Diffusion Model (UDM) has recently emerged as a promising paradigm for discrete generative modeling; however, its integration with reinforcement learning remains largely unexplored. We observe that naively applying GRPO to UD
