huggingface

Published January 31, 2025 at 10:29 AM

Mini-R1: Reproduce DeepSeek R1 „aha moment“, an RL tutorial

Source: huggingface.co (full article)

Originally published on huggingface.

Home Models News