Truncated Rectified Flow Policy for Reinforcement Learning with One-Step Sampling
arXiv:2604.09159v1 Announce Type: new Abstract: Maximum entropy reinforcement learning (MaxEnt RL) has become a standard framework for sequential decision making, yet its standard Gaussian policy parameterization is inherently unimodal, limiting its ability to model complex multimodal action distributions…
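The unimodality limitation this abstract points to can be seen with a minimal sketch (illustrative only, not the paper's method): a maximum-likelihood Gaussian fit to a bimodal action distribution centers between the two modes, so the fitted policy's most likely action is one the target almost never takes.

```python
import random
import statistics

# Toy bimodal "optimal action" distribution: two good actions near -1 and +1.
random.seed(0)
actions = [random.gauss(-1.0, 0.1) for _ in range(500)] + \
          [random.gauss(+1.0, 0.1) for _ in range(500)]

# A Gaussian policy fit by maximum likelihood collapses to the overall
# mean and standard deviation of the data.
mu = statistics.fmean(actions)
sigma = statistics.pstdev(actions)
print(f"fit: mu={mu:.2f}, sigma={sigma:.2f}")

# mu lands near 0, between the modes: the single-Gaussian policy's most
# likely action is one the bimodal target rarely takes, and sigma inflates
# to cover both modes at once.
```

This is exactly the failure mode that motivates multimodal policy classes such as flow-based policies.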
ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
arXiv:2604.09450v1 Announce Type: cross Abstract: Chest X-ray report generation (CXR-RG) has the potential to substantially alleviate radiologists' workload. However, conventional autoregressive vision-language models (VLMs) suffer from high inference latency due to sequential token decoding. Diffusion…
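The latency argument in this abstract is easy to make concrete. A hypothetical sketch (dummy functions standing in for model forward passes, not ECHO's actual architecture): autoregressive decoding needs one forward pass per generated token, while a one-step block generator emits the whole sequence from a single pass.

```python
def autoregressive_decode(prompt, n_tokens, step_fn):
    """Generate n_tokens one at a time; each token costs a forward pass."""
    seq, calls = list(prompt), 0
    for _ in range(n_tokens):
        seq.append(step_fn(seq))  # forward pass conditioned on the prefix
        calls += 1
    return seq, calls

def one_step_decode(prompt, n_tokens, block_fn):
    """Emit all n_tokens from a single forward pass."""
    return list(prompt) + block_fn(prompt, n_tokens), 1

# Dummy stand-ins for the model: produce placeholder token ids.
step = lambda seq: len(seq)
block = lambda prompt, n: list(range(n))

_, ar_calls = autoregressive_decode([0], 128, step)
_, os_calls = one_step_decode([0], 128, block)
print(ar_calls, os_calls)  # 128 1
```

With per-call latency roughly constant, the sequential decoder's wall-clock time scales with report length, which is the gap one-step generation targets.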
Envisioning the Future, One Step at a Time
arXiv:2604.09527v1 Announce Type: cross Abstract: Accurately anticipating how complex, diverse scenes will evolve requires models that represent uncertainty, simulate along extended interaction chains, and efficiently explore many plausible futures. Yet most existing approaches rely on dense video…
Tracing the Chain: Deep Learning for Stepping-Stone Intrusion Detection
arXiv:2604.08800v1 Announce Type: cross Abstract: Stepping-stone intrusions (SSIs) are a prevalent network evasion technique in which attackers route sessions through chains of compromised intermediate hosts to obscure their origin. Effective SSI detection requires correlating the incoming and outgoing…
Mind the Gap Between Spatial Reasoning and Acting! Step-by-Step Evaluation of Agents With Spatial-Gym
arXiv:2604.09338v1 Announce Type: new Abstract: Spatial reasoning is central to navigation and robotics, yet measuring model capabilities on these tasks remains difficult. Existing benchmarks evaluate models in a one-shot setting, requiring full solution generation in a single response, unlike human…
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
arXiv:2503.13551v5 Announce Type: replace Abstract: Recent studies show that Large Language Models (LLMs) achieve strong reasoning capabilities through supervised fine-tuning or reinforcement learning. However, a key approach, the Process Reward Model (PRM), suffers from reward hacking, making it un…