Step-3.7-Flash-GGUF news

47 articles mentioning Step-3.7-Flash-GGUF

techcrunch16h ago

Mira Murati steps back into the spotlight, carefully

In the current environment, remaining heads down has diminishing returns; at some point, you have to make some noise just to remind the market you exist.

arxiv17h ago

REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak

arXiv:2605.20654v2 Announce Type: replace-cross Abstract: While Large Language Models (LLMs) demonstrate remarkable capabilities, they remain susceptible to sophisticated, multi-step jailbreak attacks that circumvent conventional surface-level safety alignment by exploiting the internal generation p

arxiv17h ago

Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway

arXiv:2606.05219v1 Announce Type: new Abstract: Recent analyses of multi-pathway Deep Linear Networks use Gradient Flow to predict a "winner-takes-all" specialization in which path symmetry breaks and each feature concentrates in a single pathway. In this work, we show that discrete Gradient Descent

arxiv17h ago

Synthesize and Reward -- Reinforcement Learning for Multi-Step Tool Use in Live Environments

arXiv:2606.03892v2 Announce Type: replace-cross Abstract: Training LLMs to orchestrate multi-step tool calls is held back by three coupled obstacles: realistic stateful execution environments are costly to build, synthetic training queries are often detached from the server's actual state (so the ge

arxiv17h ago

Let It Be Simple: One-Step Action Generation for Vision-Language-Action Models

arXiv:2606.05737v1 Announce Type: cross Abstract: Diffusion-based vision-language-action (VLA) models often inherit the image-generation view: actions are generated by iterative denoising. We argue that VLA action generation has a different condition-target structure: the policy is conditioned on ri

arxiv17h ago

Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples

arXiv:2606.05967v1 Announce Type: cross Abstract: In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polya

arxiv17h ago

StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis

arXiv:2606.04246v1 Announce Type: new Abstract: Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel framework that combines st

arxiv17h ago

Step-adaptive multimodal fusion network with multi-scale cloud feature learning for ultra-short-term solar irradiance forecasting

arXiv:2606.06102v1 Announce Type: cross Abstract: Ultra-short-term solar irradiance prediction is critical for photovoltaic system dispatch and power grid stability. Existing approaches suffer from three key shortcomings: single time-series models cannot capture the spatial dynamics of clouds under

arxiv17h ago

2-Step Agent: A Framework for the Interaction of a Decision Maker with AI Decision Support

arXiv:2602.21889v2 Announce Type: replace-cross Abstract: Predictions from ML models support human decision making in several fields, including high-stakes ones such as healthcare and the judiciary. Yet, we still lack a clear understanding of how decision makers learn from ML-based decision support

arxiv17h ago

Tracing the Oracle: Improving Diffusion Timestep Scheduling for 3D CT Reconstruction

arXiv:2606.06236v1 Announce Type: new Abstract: Pretrained diffusion models demonstrate impressive potential in solving highly ill-posed 3D computed tomography (CT) inverse problems, while the inference process suffers from significant computational overhead. Furthermore, existing uniform timestep s

arxiv17h ago

AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents

arXiv:2606.05597v1 Announce Type: new Abstract: Training vision-language web agents with multi-step RL is compute-intensive, with two dominant forms of inefficiency: idle GPUs in synchronous RL, and trajectories that use more steps and tokens than necessary. We present AsyncWebRL, which addresses bo

arxiv17h ago

Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers

arXiv:2605.24059v2 Announce Type: replace Abstract: We present a three-step recipe for identifying attention-head circuits in pretrained transformers. A per-head spectral signal -- the time-integrated participation ratio of each head's attention output -- ranks heads doing sustained content-dependen

arxiv1d ago

Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

arXiv:2603.03205v2 Announce Type: replace Abstract: Agentic language models operate in a fundamentally different safety regime than chat models: they must plan, call tools, and execute long-horizon actions where a single misstep, such as accessing files or entering credentials, can cause irreversibl

arxiv1d ago

Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation

arXiv:2606.04454v1 Announce Type: new Abstract: Large language models have shown strong performance in natural language generation and downstream reasoning tasks, but they still struggle with logical consistency, factual grounding, and interpretability in complex multi-step reasoning. To address the

arxiv1d ago

CaloTrilogy: Toward a Breakthrough in One-Step, End-to-End, Physics-Guided Shower Generation for Modern Calorimeters

arXiv:2606.04165v1 Announce Type: cross Abstract: High-precision calorimeter simulation at current and future colliders imposes rapidly growing computational demands, motivating the development of machine-learning surrogates for traditional Monte Carlo tools such as Geant4. Flow matching and diffusi

arxiv1d ago

Drifting Preference Optimization for One-Step Generative Models

arXiv:2606.02521v3 Announce Type: replace Abstract: One-step text-to-image generators are attractive for deployment because they generate an image with a single forward pass, but preference finetuning them remains difficult: standard alignment methods often rely on policy likelihoods, denoising traj

arxiv2d ago

StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems

arXiv:2606.03467v1 Announce Type: new Abstract: LLM-based multi-agent systems exhibit remarkable collaborative capabilities in complex multi-step tasks. However, these systems are highly sensitive to single-step execution errors that can propagate through agent interactions and lead to cascading fai

arxiv2d ago

How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance

arXiv:2604.27147v3 Announce Type: replace-cross Abstract: In generative modeling, we often wish to produce samples that maximize a user-specified reward such as aesthetic quality or alignment with human preferences, a problem known as guidance. Despite their widespread use, existing guidanc

arxiv2d ago

HybridThinker: Efficient Chain-of-Thought Reasoning via Compressed Memory and Transient Thought Steps

arXiv:2606.03768v1 Announce Type: new Abstract: Extended chain-of-thought (CoT) traces improve LLM reasoning but incur substantial computational and memory costs. While existing CoT compression methods mitigate this by condensing thought steps into compact representations via memory tokens and retai

arxiv3d ago

When Single Answer Is Not Enough: Rethinking Single-Step Retrosynthesis Benchmarks for LLMs

arXiv:2602.03554v2 Announce Type: replace-cross Abstract: Recent progress has expanded the use of large language models (LLMs) in drug discovery, including synthesis planning. However, objective evaluation of retrosynthesis performance remains limited. Existing benchmarks and metrics typically rely

arxiv3d ago

RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

arXiv:2604.15231v2 Announce Type: replace Abstract: Vision-language models (VLM) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet, existing methods largely relegate clinicians to passive observers of final outputs, offeri

arxiv3d ago

BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps

arXiv:2604.19532v3 Announce Type: replace-cross Abstract: Tokenizing music to fit the general framework of language models is a compelling challenge, especially considering the diverse symbolic structures in which music can be represented (e.g., sequences, grids, and graphs). To date, most approache

arxiv3d ago

Physics-Guided Recurrent State-Space Neural Networks for Multi-Step Prediction

arXiv:2606.02278v1 Announce Type: cross Abstract: State-space models are traditionally based on physical knowledge, but multi-step predictions from these physical models can be poor due to model inaccuracy. Black-box deep learning has shown promise as an alternative. However, these methods rely on t

arxiv3d ago

MindGames Arena Generalization Track: In2AI Solution with Delayed Per-Step Reward Attribution

arXiv:2606.00017v1 Announce Type: new Abstract: Training language model agents for multi-agent strategic interaction presents a core difficulty: the quality of any action may depend on future events that never materialize, on moves that violate game rules, or on decisions made by other players. Stan

arxiv3d ago

Generate in Reconstruction Space, Match in Semantic Space: Transport Geometry for One-Step Generation

arXiv:2606.00514v1 Announce Type: new Abstract: Generative modeling and self-supervised representation learning (SSL) optimize structurally different objectives: generative training rewards distributional fidelity, while SSL rewards semantic coherence. Yet recent work repeatedly finds that SSL featu

arxiv3d ago

Implicit Drifting Policy: One-Step Action Generation via Conditional Expert Geometry

arXiv:2606.01098v1 Announce Type: cross Abstract: Generative action policies based on diffusion or flow matching excel in behavior cloning, yet their iterative sampling is prohibitive for high-frequency robot control. While recent one-step formulations alleviate this latency, they inevitably discard

arxiv3d ago

Finer Parameter Steps for Low-Rank PEFT: A Controlled Study with CP Tensor Adapters

arXiv:2606.00428v1 Announce Type: cross Abstract: Low-rank adapters are usually compared by sweeping a small set of ranks, but the rank also fixes the resolution of the parameter budget. For a $2048{\times}2048$ OPT attention projection, increasing LoRA by one rank stores $4096$ trainable scalars, l

arxiv3d ago

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents

arXiv:2602.12984v2 Announce Type: replace Abstract: Scientific reasoning inherently demands integrating sophisticated toolkits to navigate domain-specific knowledge. Yet, current benchmarks largely overlook agents' ability to orchestrate tools for such rigorous workflows. To bridge this gap, we intr

arxiv3d ago

Why Are DMD Students Lazy? Understanding the Copying Behavior in Few-Step Distillation

arXiv:2606.02237v1 Announce Type: new Abstract: Distribution Matching Distillation (DMD) compresses pretrained diffusion models into efficient few-step generators by aligning their noised distributions across all scales. In principle, such distribution-level supervision remains agnostic to specific

arxiv3d ago

Balancing Learning Rates Across Layers: Exact Two-Step Dynamics and Optimal Scaling in Linear Neural Networks

arXiv:2606.00340v1 Announce Type: new Abstract: We study optimal learning-rate selection in two-layer and three-layer linear neural networks trained to learn linear target functions. In particular, we derive the exact closed-form expressions for the gradients and test loss after one and two steps of

arxiv3d ago

Accelerating Min-Max Optimization via Power-Law Stepsizes

arXiv:2606.01764v1 Announce Type: cross Abstract: We revisit the convergence guarantees of the Extragradient (EG) method for unconstrained biaffine min-max optimization. It is known that EG with a fixed stepsize achieves a $\Theta(T^{-1/2})$ last-iterate convergence rate, which is slower than the op

arxiv3d ago

Adaptive Sharpness-Aware Minimization with a Polyak-type Step size: A Theory-Grounded Scheduler

arXiv:2606.01827v1 Announce Type: cross Abstract: Sharpness-Aware Minimization (SAM) has established itself as a powerful and widely adopted optimizer for training machine learning models. By explicitly minimizing the sharpness of the loss landscape, SAM often improves generalization while deliverin

arxiv3d ago

SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering

arXiv:2606.00593v1 Announce Type: cross Abstract: Large language models are increasingly deployed as tool-augmented agents to acquire information beyond parametric knowledge. While recent work has improved long-horizon tool-use reasoning, most approaches focus on tasks with a single correct answer.

arxiv3d ago

Collaborative Few-Step Distillation and Low-Bit Quantization for Wan2.2 Dual-Expert Video Diffusion Models

arXiv:2606.00658v1 Announce Type: cross Abstract: Large video diffusion models achieve strong visual quality but remain expensive to deploy because each sample requires many denoising steps and a large resident parameter footprint. This paper studies a deployment-oriented compression pipeline for Wa

arxiv3d ago

An Exploratory Study into using Machine-Learning for Fast Step-by-step Emulation of Numerical Mechanical Thrombectomy Simulations for Ischemic Stroke

arXiv:2606.00892v1 Announce Type: new Abstract: The treatment of ischemic stroke using mechanical thrombectomy involves difficult decisions under intense time constraints. Numerical physics simulations can in theory inform operators to make better decisions regarding treatment approaches and device

arxiv3d ago

StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning

arXiv:2604.18401v2 Announce Type: replace Abstract: Agentic reinforcement learning (RL) is emerging as a critical post-training paradigm for improving LLM agent capabilities. Existing RL algorithms for LLMs largely follow the token-centric paradigm as in RLHF and RLVR, where tokens serve as the basi

arxiv3d ago

Step-Level Sparse Autoencoder for Reasoning Process Interpretation

arXiv:2603.03031v2 Announce Type: replace Abstract: Large Language Models (LLMs) have achieved strong complex reasoning capabilities through Chain-of-Thought (CoT) reasoning. However, their reasoning patterns remain too complicated to analyze. While Sparse Autoencoders (SAEs) have emerged as a power

arxiv3d ago

Preconditioned One-Step Generative Modeling for Bayesian Inverse Problems in Function Spaces

arXiv:2603.14798v2 Announce Type: replace-cross Abstract: We propose a machine-learning algorithm for Bayesian inverse problems in the function-space regime. Based on one-step generative transport, the method learns an amortized neural operator whose pushforward of a Gaussian source approximates the

arxiv3d ago

AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents

arXiv:2603.14465v2 Announce Type: replace Abstract: While Large Language Models (LLMs) have evolved into tool-using agents, they remain brittle in long-horizon interactions. Unlike mathematical reasoning where errors are often rectifiable via backtracking, tool-use failures frequently induce irrever

arxiv3d ago

Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients

arXiv:2512.02342v3 Announce Type: replace-cross Abstract: The stochastic Polyak step size (SPS) has proven to be a promising choice for stochastic gradient descent (SGD), delivering competitive performance relative to state-of-the-art methods on smooth convex and non-convex optimization problems, in

arxiv4d ago

Randomized Feasibility Methods for Constrained Optimization with Adaptive Step Sizes

arXiv:2601.20076v2 Announce Type: replace-cross Abstract: We consider minimizing an objective function subject to constraints defined by the intersection of lower-level sets of convex functions. We study two cases: (i) strongly convex and Lipschitz-smooth objective function and (ii) convex but possi

arxiv4d ago

Revisiting Zeroth-Order Hessian Approximation: A Single-Step Policy Optimization Lens

arXiv:2605.30960v1 Announce Type: new Abstract: Accurate Zeroth-Order (ZO) Hessian estimation is a cornerstone of derivative-free methods, essential for tasks such as bilevel optimization, Bayesian inference, and uncertainty quantification. However, obtaining a complete suite of low-variance estimat

arxiv4d ago

STEP: Learning STructured Embeddings for Progressive Time Series

arXiv:2605.31061v1 Announce Type: cross Abstract: We present a novel method for learning interpretable representations of progressive time series, that is, data capturing irreversible state transitions such as degradation or task completion. Our approach uses a self-supervised contrastive objective

arxiv4d ago

VeriGate: Verifier-Gated Step-Level Supervision for GRPO

arXiv:2605.30451v1 Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) is an effective recipe for training reasoning models with verifier-based outcome rewards, but its supervision is sparse: when all sampled trajectories for a prompt receive the same verifier reward, the group-re

arxivMay 29

How Consistent Are LLM Agents? Measuring Behavioral Reproducibility in Multi-Step Tool-Calling Pipelines

arXiv:2605.28840v1 Announce Type: cross Abstract: Large language model (LLM) agents with tool-calling capabilities are increasingly deployed in production systems, yet a fundamental reliability question remains under-explored: does the same agent behave the same way twice? We present a systematic em

arxivMay 29

Faster Molecular Dynamics with Neural Network Potentials via Distilled Multiple Time-Stepping and Non-Conservative Forces

arXiv:2602.14975v3 Announce Type: replace-cross Abstract: Following our previous work (J. Phys. Chem. Lett., 2026, 17, 5, 1288-1295), we propose the DMTS-NC approach, a distilled multi-time-step (DMTS) strategy using non-conservative (NC) forces to further accelerate atomistic molecular dynamics sim

arxivMay 29

Rubric-Guided Process Reward for Stepwise Model Routing

arXiv:2605.29310v1 Announce Type: new Abstract: Stepwise model routing improves the efficiency of Large Reasoning Models (LRMs) by assigning each reasoning step to a suitable model. Recent methods formulate routing as a sequential decision process and train the router with reinforcement learning. Ho
