DataBubble
Latest
  • SpaceX officially prices shares at $135 in the largest IPO ever (5h)
  • Our new community investments in Virginia support local jobs and expand energy affordability. (5h)
  • SpaceX SPV investors won’t know their true holdings until post-IPO lock-ups lift (5h)
  • Amazon’s data centers used 2.5 billion gallons of water last year (8h)
  • Deezer’s new tool can identify AI music from Spotify, Apple Music, and others (9h)
  • Pool’s new app turns your screenshots into something useful (10h)
  • DoorDash’s new AI chatbot lets you order with prompts and photos (11h)
  • Anthropic apologizes for invisible Claude Fable guardrails (14h)
  • Google DeepMind is worried about what happens when millions of agents start to interact (14h)
  • Deezer launches an AI music detector for other streaming services (17h)
  • Opendoor’s India exit is fueling a bigger conversation about AI and outsourcing (21h)
  • MODF-SIR: A Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning (21h)
  • Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces! (21h)
  • ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation (21h)
  • Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Solutions (21h)
  • The Impossibility of Eliciting Latent Knowledge (21h)
  • Mapping Scientific Literature with Large Language Models and Topic Modeling (21h)
  • Grounding Computer Use Agents on Human Demonstrations (21h)
  • Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models (21h)
  • LSTM based IoT Device Identification (21h)
Tag

#optimization

81 articles tagged #optimization

arxiv · 21h ago

Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

arXiv:2606.11284v1 Announce Type: cross Abstract: Real-world multi-agent systems, from traffic coordination to resource allocation, are often modeled as general-sum games where individual incentives conflict with collective welfare. In these settings, the central challenge is not merely finding an e

PH · 1 model · #multi-agent #reinforcement-learning #game-theory

arxiv · 21h ago · bullish

A Physics-Inspired Optimizer: Velocity Regularized Adam

arXiv:2505.13196v3 Announce Type: replace-cross Abstract: We introduce Velocity-Regularized Adam (VRAdam), a physics-inspired optimizer for training deep neural networks that draws on ideas from quartic terms for kinetic energy with its stabilizing effects on various system dynamics. Previous algori

VE AD AD · 3 models · #optimization #deep-learning #machine-learning

arxiv · 1d ago · bullish

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

arXiv:2606.11119v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) is a promising approach for enhancing reasoning and agentic behavior in large language models. However, rollout-intensive policy optimization is often limited by insufficient reward contrast, aris

QW · 1 model · #reinforcement-learning #language-models #optimization

arxiv · 1d ago · bullish

HMAF: A Hierarchical Multi-Slot GD-RTB Allocation Framework

arXiv:2606.09896v1 Announce Type: cross Abstract: In modern online advertising platforms, Guaranteed Delivery (GD) contracts coexist and bid with Real-Time Bidding (RTB) auctions. Recent approaches either decouple GD and RTB optimization or rely on heuristic priority rules, and thus fail to effectiv

#advertising #optimization #revenue

arxiv · 1d ago · bullish

Sim2Schedule: A Simulator-Guided LLM Framework for Autonomous Open-Pit Mine Scheduling

arXiv:2606.10286v1 Announce Type: new Abstract: Open-pit mine scheduling is a critical process for maximizing economic return under complex geotechnical and operational constraints. While Mixed-Integer Linear Programming (MILP) provides mathematically optimal baselines, its exponential computational

LA · 1 model · #optimization #scheduling #industrial-applications

arxiv · 1d ago · bullish

Operator Fusion for LLM Inference on the Tensix Architecture

arXiv:2606.09879v1 Announce Type: new Abstract: This study addresses on-device inference bottlenecks of Transformer models on Tenstorrent's Tensix architecture and proposes an operator fusion strategy that enhances data locality. RMSNorm is fused with matrix multiplication in self-attention and in t

TR QW QW · 4 models · +1 · #machine learning #optimization #parallelism

arxiv · 5d ago · bullish

Value-and-Structure Alignment for Routing-Consistent Quantization of Mixture-of-Experts Models

arXiv:2606.05688v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) models scale foundation models efficiently by activating only a subset of experts for each token, but their large number of expert parameters still makes quantization essential for practical deployment. Unlike dense models, h

MI · 1 model · #quantization #moe #foundation-models

arxiv · Jun 3

Decentralized Stochastic Nonconvex Optimization under the $(L_0,L_1)$-Smoothness

arXiv:2509.08726v3 Announce Type: replace-cross Abstract: This paper focuses on the decentralized stochastic optimization problem $f(\mathbf{x})=\frac{1}{m}\sum_{i=1}^m f_i(\mathbf{x})$ over a connected network of $n$ agents, where each local function has the form of $f_i(\mathbf{x}) = {\mathbb E}\l

#optimization #stochastic #nonconvex

arxiv · Jun 3 · bullish

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

arXiv:2606.02684v1 Announce Type: cross Abstract: On-Policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more selective training paradigms. Recent OPD methods increasingly focus on selecting which trajectories to learn from, which tokens are most info

FI · 1 model · #on-policy #distillation #optimization

arxiv · Jun 3 · bullish

Experience-Driven Dynamic Exits for LLMs with Reinforcement Learning

arXiv:2606.03113v1 Announce Type: new Abstract: Large Language Models suffer from slow autoregressive inference. While self-speculative decoding accelerates this process, its efficiency is hampered by static configurations like fixed exit layers and speculation lengths. We reframe this optimization

ME ME · 2 models · #optimization #reinforcement-learning #language-models

arxiv · Jun 3 · bullish

Before Fusion, Ask What to Keep: Contextual Calibration of Multimodal Signals

arXiv:2606.02679v1 Announce Type: new Abstract: Multimodal systems often benefit from combining information across language, sound, and visual streams, but this benefit is not guaranteed. A modality that is useful for one input may become distracting for another, and local feature responses within t

#multimodal #fusion #calibration

arxiv · Jun 2

How Much Orthogonalization Does Muon Need?

arXiv:2606.00371v1 Announce Type: new Abstract: Muon optimizers improve neural-network training by replacing ill-conditioned momentum updates with approximately semi-orthogonal updates. This motivates a practical question: how much orthogonalization does Muon actually require? We study this question

NA GP MA · 4 models · +1 · #machine-learning #optimization #neural-networks

arxiv · Jun 2 · bullish

Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective

arXiv:2605.12969v3 Announce Type: replace-cross Abstract: Group Relative Policy Optimization (GRPO) is one of the most widely adopted RLVR algorithms for post-training large language models on reasoning tasks. We first show that GRPO admits an equivalent discriminative reformulation, in which policy

GR CO · 2 models · #reinforcement-learning #language-models #optimization

arxiv · Jun 2 · bullish

Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization

arXiv:2510.05342v2 Announce Type: replace-cross Abstract: Direct Preference Optimization (DPO) has emerged as a simple and effective method for aligning large language models. However, its reliance on a fixed temperature parameter leads to suboptimal training on diverse preference data, causing over

DI IP $\ · 4 models · +1 · #machine-learning #optimization #language-models

arxiv · Jun 2 · bullish

Efficient Test-time Inference for Generative Planning Models

arXiv:2606.00618v1 Announce Type: new Abstract: Generative models have emerged as a powerful paradigm for AI planning, yet their performance remains constrained by the training data distribution. One approach is to improve generated solutions during inference by scaling test-time compute. A more eff

GE HE · 2 models · #planning #inference #optimization

arxiv · May 29

Gesture-Aware Indoor THz ISAC Systems for Adaptive Resource Allocation

arXiv:2605.29913v1 Announce Type: cross Abstract: This paper investigates a multi-user indoor integrated sensing and communication (ISAC) system operating in the terahertz (THz) band, designed for adaptive communication based on gesture recognition. Leveraging gesture tracking through an extended Ka

EX · 1 model · #terahertz #gesture-recognition #optimization

arxiv · May 29

Calibrating Generative Models to Distributional Constraints

arXiv:2510.10020v4 Announce Type: replace-cross Abstract: Generative models frequently suffer miscalibration, wherein statistics of the sampling distribution, such as the fraction of generations in a given class, deviate from desired values. We frame calibration as a constrained optimization problem

#machine-learning #calibration #optimization

arxiv · May 29 · bullish

HARP: Hadamard-Preconditioned Adaptive Rotation Processor for Extreme LLM Quantization

arXiv:2605.29843v1 Announce Type: cross Abstract: Post-training quantization (PTQ) is essential for deploying LLMs under memory and bandwidth constraints. However, extreme low-bit quantization remains highly sensitive to activation outliers and anisotropic weight curvature. Existing incoherence-base

LL · 1 model · #quantization #machine learning #optimization

arxiv · May 28

Worker Disagreement Reveals Sharp Directions in Local SGD

arXiv:2605.27739v1 Announce Type: cross Abstract: Deep neural network training often exhibits highly anisotropic loss geometry, where a few sharp dominant Hessian directions coexist with a large flatter bulk. Gradients tend to align disproportionately with these dominant directions, although stable

ML CN TR · 3 models · #machine-learning #deep-learning #optimization

arxiv · May 26 · bullish

Local MAP Sampling for Diffusion Models

arXiv:2510.07343v3 Announce Type: replace-cross Abstract: Diffusion Posterior Sampling (DPS) provides a principled Bayesian approach to inverse problems by sampling from $p(x_0 \mid y)$. While posterior sampling is valuable for capturing uncertainty and multi-modality, many classical and practical i

#image-restoration #scientific-applications #bayesian-inference

arxiv · May 26

Active Query Synthesis for Preference Learning

arXiv:2605.26072v1 Announce Type: new Abstract: Efficient learning of user preferences is crucial for many modern decision making systems but typically requires costly labeled data. Active learning reduces this cost, yet standard methods are computationally expensive due to pool-based evaluation. Fu

#active-learning #machine-learning #optimization

arxiv · May 25 · bullish

AGZO: Activation-Guided Zeroth-Order Optimization for LLM Fine-Tuning

arXiv:2601.17261v4 Announce Type: replace Abstract: Zeroth-Order (ZO) optimization has emerged as a promising solution for fine-tuning LLMs under strict memory constraints, as it avoids the prohibitive memory cost of storing activations for backpropagation. However, existing ZO methods typically emp

QW PA · 2 models · #optimization #llms #fine-tuning

arxiv · May 25 · bullish

Transform-Invariant Generative Ray Path Sampling for Efficient Radio Propagation Modeling

arXiv:2603.01655v2 Announce Type: replace Abstract: Ray tracing has become a standard for accurate radio propagation modeling, but suffers from exponential computational complexity, as the number of candidate paths scales with the number of objects raised to the interaction order. This bottleneck li

GE · 1 model · #machine-learning #signal-processing #optimization

arxiv · May 22 · bullish

Token-weighted Direct Preference Optimization with Attention

arXiv:2605.21883v1 Announce Type: new Abstract: Direct Preference Optimization (DPO) aligns Large Language Models with human preferences without the need for a separate reward model. However, DPO treats all tokens in responses equally, neglecting the differing importance of individual tokens. Existi

LA · 1 model · #optimization #language-models #reinforcement-learning

arxiv · May 22 · bullish

Retrospective Sparse Attention for Efficient Long-Context Generation

arXiv:2508.09001v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are increasingly deployed in long-context tasks such as reasoning, code generation, and multi-turn dialogue. However, inference over extended contexts is bottlenecked by the Key-Value (KV) cache, whose memory foot

#large-language-models #optimization #attention-mechanisms

arxiv · May 21 · bullish

From SGD to Muon: Adaptive Optimization via Schatten-p Norms

arXiv:2605.19781v1 Announce Type: new Abstract: Modern optimizers, like Muon, impose matrix-wise geometry constraints on their updates. These matrix-wise constraints can be unified under Linear Minimization Oracle (LMO) theory. However, all current methods impose fixed LMO geometries for the update

MU SG AD · 5 models · +2 · #optimization #deep learning #neural networks

arxiv · May 21 · bullish

Fast and Featureless Node Representation Learning with Partial Pairwise Supervision

arXiv:2605.19916v1 Announce Type: cross Abstract: We introduce Contrastive FUSE, a fast and unified framework for scalable node representation learning in graphs with partially available pairwise node labels and no available node features. Unlike existing methods, we directly optimize a spectral con

CO · 1 model · #machine-learning #graph-learning #optimization

arxiv · May 19 · bullish

Latent Heuristic Search: Continuous Optimization for Automated Algorithm Design

arXiv:2605.17137v1 Announce Type: new Abstract: The integration of Large Language Models (LLMs) into evolutionary frameworks has established a new paradigm for automated heuristic discovery. Despite their promise, these methods typically search in the discrete space of program syntax, relying on sto

LA · 1 model · #optimization #automated-heuristic-discovery #evolutionary-frameworks

arxiv · May 19

Ready from Day 1: Population-Aware Coordination for Large-Scale Constrained Multi-Agent Systems

arXiv:2605.13900v2 Announce Type: replace-cross Abstract: In large-scale multi-agent systems with shared resource constraints, an upstream planner must iteratively evaluate candidate resource plans -- assessing feasibility, aggregate response, and marginal cost -- before committing to one. Lagrangia

#multi-agent #machine-learning #supply-chain

arxiv · May 18

Preconditioned Regularized Wasserstein Proximal Sampling

arXiv:2509.01685v2 Announce Type: replace-cross Abstract: We consider sampling from a Gibbs distribution by evolving finitely many particles. We propose a preconditioned version of a recently proposed noise-free sampling method, governed by approximating the score function with the numerically tract

TR · 1 model · #machine-learning #optimization #sampling-methods

arxiv · May 18

Rethinking Neural Network Learning Rates: A Stackelberg Perspective

arXiv:2605.15530v1 Announce Type: new Abstract: Neural networks are typically trained with a single learning rate across all layers. While recent empirical evidence suggests that assigning layer-specific learning rates can accelerate training, a principled understanding of the conditions and mechani

#machine-learning #optimization #neural-networks

arxiv · May 16 · bullish

Beyond What to Select: A Plug-and-play Oscillatory Data-Volume Scheduling for Efficient Model Training

arXiv:2605.14773v1 Announce Type: cross Abstract: Data selection accelerates training by identifying representative training data while preserving model performance. However, existing methods mainly focus on designing sample-importance criteria, i.e., deciding what to select, while typically fixing

#optimization #machine-learning #efficiency

arxiv · May 16

Numerical exploration of the range of shape functionals using neural networks

arXiv:2602.14881v2 Announce Type: replace-cross Abstract: We introduce a novel numerical framework for the exploration of Blaschke–Santaló diagrams, which are efficient tools characterizing the possible inequalities relating some given shape functionals. We introduce a parametrization of convex b

IN · 1 model · #optimization #neural-networks #geometry

arxiv · May 16

Adapting Dijkstra for Buffers and Unlimited Transfers

arXiv:2603.11729v3 Announce Type: replace-cross Abstract: In recent years, RAPTOR based algorithms have been considered the state-of-the-art for path-finding with unlimited transfers without preprocessing. However, this status largely stems from the evolution of routing research, where Dijkstra-base

#routing #algorithms #optimization

arxiv · May 15

To discretize continually: Mean shift interacting particle systems for Bayesian inference

arXiv:2605.14142v1 Announce Type: cross Abstract: Integration against a probability distribution given its unnormalized density is a central task in Bayesian inference and other fields. We introduce new methods for approximating such expectations with a small set of weighted samples -- i.e., a quadr

#machine-learning #bayesian-inference #sampling

arxiv · May 15

Generative Bayesian Optimization: Generative Models as Acquisition Functions

arXiv:2510.25240v3 Announce Type: replace-cross Abstract: We present a general strategy for turning generative models into candidate solution samplers for batch Bayesian optimization (BO). The use of generative models for BO enables large batch scaling as generative sampling, optimization of non-con

#optimization #machine-learning #research

arxiv · May 14 · bullish

ASAP: Amortized Doubly-Stochastic Attention via Sliced Dual Projection

arXiv:2605.12879v1 Announce Type: new Abstract: Doubly-stochastic attention has emerged as a transport-based alternative to row-softmax attention, with recent Transformer variants using it to reduce attention sinks and rank collapse while improving performance. In this family, the standard approach

SI AS · 2 models · #transformer #attention #machine-learning

arxiv · May 14 · bullish

Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers

arXiv:2605.13784v1 Announce Type: new Abstract: Conventional transformer inference engines are request-driven, paying an O(n) prefill cost on every query. In streaming workloads, where data arrives continuously and queries probe an ever-growing context, this cost is prohibitive. We introduce a data-

VL SG TE · 3 models · #streaming #inference #optimization

arxiv · May 13

Constructive conditional normalizing flows

arXiv:2602.08606v3 Announce Type: replace-cross Abstract: Motivated by applications in conditional sampling, given a probability measure $\mu$ and a diffeomorphism $\phi$, we consider the problem of simultaneously approximating $\phi$ and the pushforward $\phi_{\#}\mu$ by means of the flow of a cont

PE · 1 model · #optimization #machine learning #probability

arxiv · May 11

A Rod Flow Model for Adam at the Edge of Stability

arXiv:2605.06821v1 Announce Type: cross Abstract: Cohen et al. (arXiv:2207.14484) observed that adaptive gradient methods such as Adam operate at the edge of stability. While there has been significant work on continuous-time modeling of gradient descent at the edge of stability, extending these mod

AD RM NA · 5 models · +2 · #optimization #machine learning #momentum methods

arxiv · May 11 · bullish

CommFuse: Hiding Tail Latency via Communication Decomposition and Fusion for Distributed LLM Training

arXiv:2604.24013v2 Announce Type: cross Abstract: The rapid growth in the size of large language models has necessitated the partitioning of computational workloads across accelerators such as GPUs, TPUs, and NPUs. However, these parallelization strategies incur substantial data communication overhe

#distributed-training #parallelization #optimization

arxiv · May 8

Dynamic Controlled Variables Based Dynamic Self-Optimizing Control

arXiv:2605.06469v1 Announce Type: cross Abstract: Self-optimizing control is a strategy for selecting controlled variables, where the economic objective guides the selection and design of controlled variables, with the expectation that maintaining the controlled variables at constant values can achi

DE · 1 model · #optimization #control #machine-learning

arxiv · May 7 · bullish

Adaptive Ensemble Aggregation for Actor-Critics

arXiv:2507.23501v2 Announce Type: replace Abstract: Ensembles are ubiquitous in off-policy actor-critic learning, yet their efficacy depends critically on how they are aggregated. Current methods typically rely on static rules or task-specific hyperparameters to balance overestimation bias and varia

#reinforcement-learning #ensemble-methods #machine-learning

arxiv · May 5 · bullish

FG$^2$-GDN: Enhancing Long-Context Gated Delta Networks with Doubly Fine-Grained Control

arXiv:2604.19021v2 Announce Type: replace Abstract: Linear attention mechanisms have emerged as promising alternatives to softmax attention, offering linear-time complexity during inference. Recent advances such as Gated DeltaNet (GDN) and Kimi Delta Attention (KDA) have demonstrated that the delta

GA KI FG · 4 models · +1 · #machine learning #attention mechanisms #optimization

arxiv · May 4

A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muo

arXiv:2604.17423v2 Announce Type: replace Abstract: A unified framework for first-order optimization algorithms for nonconvex unconstrained optimization is proposed that uses adaptively preconditioned gradients and includes popular methods such as full and diagonal AdaGrad, AdaNorm, as well as adaptive

AD AD SH · 4 models · +1 · #optimization #machine-learning #research

arxiv · May 4 · bullish

Bridging Graph Drawing and Dimensionality Reduction with Stochastic Stress Optimization

arXiv:2605.00641v1 Announce Type: new Abstract: Both Dimensionality Reduction (DR) and Graph Drawing (GD) aim to visualize abstract, non-linear structures, yet rely on different optimization paradigms. This contrast is evident in Multidimensional Scaling (MDS), which typically depends on the SMACOF

#optimization #dimensionality-reduction #machine-learning

arxiv · May 1 · bullish

Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting

arXiv:2602.05371v3 Announce Type: replace Abstract: Oblique decision trees combine the transparency of trees with the power of multivariate decision boundaries, but learning high-quality oblique splits is NP-hard, and practical methods still rely on slow search or theory-free heuristics. We present

HI · 1 model · #machine-learning #decision-trees #optimization

arxiv · Apr 30 · bullish

Adaptive Scaling of Policy Constraints for Offline Reinforcement Learning

arXiv:2508.19900v2 Announce Type: replace Abstract: Offline reinforcement learning (RL) enables learning effective policies from fixed datasets without any environment interaction. Existing methods typically employ policy constraints to mitigate the distribution shift encountered during offline RL t

#offline-rl #reinforcement-learning #machine-learning

arxiv · Apr 30 · bullish

Generative Bid Shading in Real-Time Bidding Advertising

arXiv:2508.06550v3 Announce Type: replace-cross Abstract: Bid shading plays a crucial role in Real-Time Bidding (RTB) by adaptively adjusting the bid to avoid advertisers overspending. Existing mainstream two-stage methods, which first model bid landscapes and then optimize surplus using operations

GE AU CH · 3 models · #real-time-bidding #advertising #machine-learning

arxiv · Apr 30 · bullish

Test-Time Safety Alignment

arXiv:2604.26167v1 Announce Type: cross Abstract: Recent work has shown that a model's input word embeddings can serve as effective control variables for steering its behavior toward outputs that satisfy desired properties. However, this has only been demonstrated for pretrained text-completion mode

#safety #language-models #optimization

arxiv · Apr 29 · bullish

MTServe: Efficient Serving for Generative Recommendation Models with Hierarchical Caches

arXiv:2604.22881v1 Announce Type: cross Abstract: Generative recommendation (GR) offers superior modeling capabilities but suffers from prohibitive inference costs due to the repeated encoding of long user histories. While cross-request Key-Value (KV) cache reuse presents a significant optimization

#optimization #machine-learning #cache-management

arxiv · Apr 29 · bullish

Accelerating Eigenvalue Dataset Generation via Chebyshev Subspace Filter

arXiv:2510.23215v2 Announce Type: replace-cross Abstract: Eigenvalue problems are among the most important topics in many scientific disciplines. With the recent surge and development of machine learning, neural eigenvalue methods have attracted significant attention as a forward pass of inference r

#machine-learning #eigenvalue-problems #numerical-analysis

arxiv · Apr 29 · bullish

Dr. RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement

arXiv:2604.14989v2 Announce Type: replace Abstract: Recent advances in large language models (LLMs) have sparked growing interest in automatic RTL optimization for better performance, power, and area (PPA). However, existing methods are still far from realistic RTL optimization. Their evaluation set

DR · 1 model · #optimization #eda #rtl

arxiv · Apr 27 · bullish

Robust Fuzzy local k-plane clustering with mixture distance of hinge loss and L1 norm

arXiv:2604.22405v1 Announce Type: new Abstract: K-plane clustering (KPC), hyperplane clustering, and mixture regression all essentially fall within the same class of problems. This problem can be conceptualized as clustering in relatively high-dimensional K subspaces or K linear manifolds. Tradition

RF · 1 model · #clustering #machine-learning #robustness

arxiv · Apr 27 · bullish

A general optimization solver based on OP-to-MaxSAT reduction

arXiv:2604.21961v1 Announce Type: cross Abstract: Optimization problems are fundamental in diverse fields, such as engineering, economics, and scientific computing. However, current algorithms are mostly designed for specific problem types and exhibit limited generality in solving multiple types of

#optimization #algorithm #research

arxiv · Apr 23 · bullish

LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning

arXiv:2308.03303v2 Announce Type: replace Abstract: Fine-tuning large language models (LLMs) is crucial for improving their performance on downstream tasks, but full-parameter fine-tuning (Full-FT) is computationally expensive and memory-intensive. Parameter-efficient fine-tuning (PEFT) methods, suc

LO LO · 2 models · #fine-tuning #language-models #optimization

arxiv · Apr 23

LayerTracer: A Joint Task-Particle and Vulnerable-Layer Analysis framework for Arbitrary Large Language Model Architectures

arXiv:2604.20556v1 Announce Type: cross Abstract: Currently, Large Language Models (LLMs) feature a diversified architectural landscape, including traditional Transformer, GateDeltaNet, and Mamba. However, the evolutionary laws of hierarchical representations, task knowledge formation positions, and

TR GA MA · 3 models · #large-language-models #architecture #interpretability

arxiv · Apr 18 · bullish

Calibrate-Then-Delegate: Safety Monitoring with Risk and Budget Guarantees via Model Cascades

arXiv:2604.14251v1 Announce Type: new Abstract: Monitoring LLM safety at scale requires balancing cost and accuracy: a cheap latent-space probe can screen every input, but hard cases should be escalated to a more expensive expert. Existing cascades delegate based on probe uncertainty, but uncertaint

CA · 1 model · #safety #machine-learning #optimization

arxiv · Apr 18

Safe Reinforcement Learning using Action Projection: Safeguard the Policy or the Environment?

arXiv:2509.12833v2 Announce Type: replace Abstract: Projection-based safety filters, which modify unsafe actions by mapping them to the closest safe alternative, are widely used to enforce safety constraints in reinforcement learning (RL). Two integration strategies are commonly considered: Safe env

#reinforcement-learning #safety #optimization

arxiv · Apr 17 · bullish

Preconditioned Test-Time Adaptation for Out-of-Distribution Debiasing in Narrative Generation

arXiv:2603.13683v2 Announce Type: replace Abstract: Although debiased large language models (LLMs) excel at handling known or low-bias prompts, they often fail on unfamiliar and high-bias prompts. We demonstrate via out-of-distribution (OOD) detection that these high-bias prompts cause a distributio

#debiasing #optimization #language-models

arxiv · Apr 16 · bullish

Neural Two-Stage Stochastic Optimization for Solving Unit Commitment Problem

arXiv:2507.09503v4 Announce Type: replace-cross Abstract: This paper proposes a neural stochastic optimization method for efficiently solving the two-stage stochastic unit commitment (2S-SUC) problem under high-dimensional uncertainty scenarios. The proposed method approximates the second-stage reco

NE · 1 model · #optimization #machine-learning #scalability

arxiv · Apr 15

On the Convergence Analysis of Muon

arXiv:2505.23737v2 Announce Type: replace-cross Abstract: The majority of parameters in neural networks are naturally represented as matrices. However, most commonly used optimizers treat these matrix parameters as flattened vectors during optimization, potentially overlooking their inherent structu

MU GR · 2 models · #optimization #neural-networks #machine-learning

arxiv · Apr 14 · bullish

Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search

arXiv:2604.11109v1 Announce Type: cross Abstract: As high-performance computing and AI workloads become increasingly dependent on GPUs, maintaining high performance across rapidly evolving hardware generations has become a major challenge. Developers often spend months tuning scientific applications

RE LL · 2 models · #optimization #gpu #performance · Read on arxiv →
arxiv · Apr 13 · bullish

CSAttention: Centroid-Scoring Attention for Accelerating LLM Inference

arXiv:2604.08584v1 Announce Type: cross Abstract: Long-context LLMs increasingly rely on extended, reusable prefill prompts for agents and domain Q&A, pushing attention and KV-cache to become the dominant decode-time bottlenecks. While sparse attention reduces computation and transfer costs, it ofte

#sparse-attention #long-context #machine-learning · Read on arxiv →
arxiv · Apr 10 · bullish

LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis

arXiv:2510.24561v2 Announce Type: replace-cross Abstract: LoRA has become a widely adopted method for PEFT, and its initialization methods have attracted increasing attention. However, existing methods have notable limitations: many methods do not incorporate target-domain data, while gradient-based

LO LO · 2 models · #machine-learning #optimization #initialization · Read on arxiv →
arxiv · Apr 10 · bullish

We Still Don't Understand High-Dimensional Bayesian Optimization

arXiv:2512.00170v2 Announce Type: replace Abstract: Existing high-dimensional Bayesian optimization (BO) methods aim to overcome the curse of dimensionality by carefully encoding structural assumptions, from locality to sparsity to smoothness, into the optimization procedure. Surprisingly, we demons

BA GA · 2 models · #optimization #machine-learning #bayesian-methods · Read on arxiv →
arxiv · Apr 10 · bullish

FVD: Inference-Time Alignment of Diffusion Models via Fleming-Viot Resampling

arXiv:2604.06779v1 Announce Type: new Abstract: We introduce Fleming-Viot Diffusion (FVD), an inference-time alignment method that resolves the diversity collapse commonly observed in Sequential Monte Carlo (SMC) based diffusion samplers. Existing SMC-based diffusion samplers often rely on multinomi

#diffusion #monte-carlo #inference · Read on arxiv →
arxiv · Apr 10 · bullish

SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization

arXiv:2604.07663v1 Announce Type: new Abstract: The AdamW optimizer, while standard for LLM pretraining, is a critical memory bottleneck, consuming optimizer states equivalent to twice the model's size. Although light-state optimizers like SinkGD attempt to address this issue, we identify the embedd

ME · 1 model · #optimization #memory-efficiency #large-language-models · Read on arxiv →
arxiv · Apr 9 · bullish

STQuant: Spatio-Temporal Adaptive Framework for Optimizer Quantization in Large Multimodal Model Training

arXiv:2604.06836v1 Announce Type: new Abstract: Quantization is an effective way to reduce the memory cost of large-scale model training. However, most existing methods adopt fixed-precision policies, which ignore the fact that optimizer-state distributions vary significantly across layers and train

GP VI · 2 models · #optimization #quantization #memory-reduction · Read on arxiv →
arxiv · Apr 8 · bullish

StateX: Enhancing RNN Recall via Post-training State Expansion

arXiv:2509.22630v2 Announce Type: replace-cross Abstract: Recurrent neural networks (RNNs), such as linear attention and state-space models, have gained popularity due to their constant per-token complexity when processing long contexts. However, these recurrent models struggle with tasks that requi

#rnn #state-space #post-training · Read on arxiv →
arxiv · Apr 8 · bullish

HybridKV: Hybrid KV Cache Compression for Efficient Multimodal Large Language Model Inference

arXiv:2604.05887v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have advanced unified reasoning over text, images, and videos, but their inference is hindered by the rapid growth of key-value (KV) caches. Each visual input expands into thousands of tokens, causing caches to

QW · 1 model · #multimodal #compression #optimization · Read on arxiv →
arxiv · Apr 7 · bullish

Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving

arXiv:2603.13842v2 Announce Type: replace-cross Abstract: End-to-end autonomous driving is typically built upon imitation learning (IL), yet its performance is constrained by the quality of human demonstrations. To overcome this limitation, recent methods incorporate reinforcement learning (RL) thro

PA TR DI · 3 models · #autonomous driving #imitation learning #reinforcement learning · Read on arxiv →
arxiv · Apr 7 · bullish

TSPO: Breaking the Double Homogenization Dilemma in Multi-turn Search Policy Optimization

arXiv:2601.22776v2 Announce Type: replace Abstract: Multi-turn tool-integrated reasoning enables Large Language Models (LLMs) to solve complex tasks through iterative information retrieval. However, current reinforcement learning (RL) frameworks for search-augmented reasoning predominantly rely on s

QW QW · 2 models · #reinforcement learning #large language models #reasoning · Read on arxiv →
arxiv · Apr 6 · bullish

Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration

arXiv:2604.02869v1 Announce Type: new Abstract: Training tool-calling agents with reinforcement learning on multi-turn tasks remains challenging due to sparse outcome rewards and difficult credit assignment across conversation turns. We present the first application of MT-GRPO (Multi-Turn Group Rela

QW QW GP · 5 models · +2 · #reinforcement-learning #conversational-ai #benchmark · Read on arxiv →
arxiv · Apr 6

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

arXiv:2602.16746v3 Announce Type: replace Abstract: Grokking -- the delayed transition from memorization to generalization in small algorithmic tasks -- remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention

TR · 1 model · #machine-learning #optimization #generalization · Read on arxiv →
arxiv · Apr 6

Communication-Efficient Distributed Learning with Differential Privacy

arXiv:2604.02558v1 Announce Type: new Abstract: We address nonconvex learning problems over undirected networks. In particular, we focus on the challenge of designing an algorithm that is both communication-efficient and that guarantees the privacy of the agents' data. The first goal is achieved thr

#machine-learning #optimization #privacy · Read on arxiv →
arxiv · Apr 4

How to measure the optimality of word or gesture order with respect to the principle of swap distance minimization

arXiv:2604.01938v1 Announce Type: new Abstract: The structure of all the permutations of a sequence can be represented as a permutohedron, a graph where vertices are permutations and two vertices are linked if a swap of adjacent elements in the permutation of one of the vertices produces the permuta

#language #optimization #research · Read on arxiv →
arxiv · Apr 2 · bullish

SkillRouter: Skill Routing for LLM Agents at Scale

arXiv:2603.22455v4 Announce Type: replace Abstract: Reusable skills let LLM agents package task-specific procedures, tool affordances, and execution guidance into modular building blocks. As skill ecosystems grow to tens of thousands of entries, exposing every skill at inference time becomes infeasi

SK · 1 model · #machine-learning #benchmark #routing · Read on arxiv →
arxiv · Apr 2 · bullish

Learning to Shuffle: Block Reshuffling and Reversal Schemes for Stochastic Optimization

arXiv:2604.00260v1 Announce Type: new Abstract: Shuffling strategies for stochastic gradient descent (SGD), including incremental gradient, shuffle-once, and random reshuffling, are supported by rigorous convergence analyses for arbitrary within-epoch permutations. In particular, random reshuffling

LA · 1 model · #optimization #machine-learning #research · Read on arxiv →
arxiv · Apr 2 · bullish

Beyond Softmax and Entropy: Convergence Rates of Policy Gradients with f-SoftArgmax Parameterization & Coupled Regularization

arXiv:2601.12604v2 Announce Type: replace Abstract: Policy gradient methods are known to be highly sensitive to the choice of policy parameterization. In particular, the widely used softmax parameterization can induce ill-conditioned optimization landscapes and lead to exponentially slow convergence

#machine-learning #optimization #policy-gradient · Read on arxiv →
arxiv · Apr 2

Reconsidering Dependency Networks from an Information Geometry Perspective

arXiv:2604.01117v1 Announce Type: new Abstract: Dependency networks (Heckerman et al., 2000) provide a flexible framework for modeling complex systems with many variables by combining independently learned local conditional distributions through pseudo-Gibbs sampling. Despite their computational adv

#machine-learning #research #optimization · Read on arxiv →