


huggingface

Published December 17, 2025 at 1:22 PM

The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator

Source: huggingface.co




Originally published on huggingface.
