MiniMax-M2.7-NVFP4 news

38 articles mentioning MiniMax-M2.7-NVFP4

arxiv · 1d ago

Minimax PAC Bounds for Learning in Exogenous Contextual MDPs

arXiv:2606.25170v1 Announce Type: cross Abstract: We study PAC learning in tabular discounted Markov decision processes with exogenous i.i.d. contexts, with discount factor $\gamma$, finite state space $\mathcal X$, action space $\mathcal A$, and context space $\mathcal Z$. At each time step, a cont

arxiv · 1d ago

Black-Box Assisted Regression: Phase Transitions and Minimax Optimality

arXiv:2606.25743v1 Announce Type: new Abstract: Foundation models are often used as fixed black-box predictors for downstream tasks with limited labeled data, but their predictions may be biased and unsafe to trust blindly. We study this setting through black-box assisted nonparametric regression: a

arxiv · Jun 19

Quantile of Means: A Bonus-Free Ensemble Method for Minimax Optimal Reinforcement Learning

arXiv:2606.20107v1 Announce Type: new Abstract: Optimal Reinforcement Learning (RL) algorithms typically rely on carefully constructed count-based uncertainty estimates to drive exploration. Although theoretically sound, such estimates are hard to compute in practical settings and therefore offer li

arxiv · Jun 17

Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget

arXiv:2602.17894v2 Announce Type: replace-cross Abstract: Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as med

arxiv · Jun 16

Enhancing LLM Safety Through a Theoretical Minimax Game Lens

arXiv:2502.05163v2 Announce Type: replace Abstract: The rapid advancement of large language models (LLMs) necessitates effective mechanisms to ensure their responsible deployment by accurately distinguishing unsafe content from benign content. While substantial safety datasets are available in Engli

arxiv · Jun 15

MiniMax Sparse Attention

arXiv:2606.13392v2 Announce Type: replace Abstract: Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to millions of tokens, yet the q

arxiv · Jun 10

Algorithmic and Minimax Complexities in Kernel Bandits

arXiv:2606.11171v1 Announce Type: new Abstract: Gaussian-process upper confidence bound (GP-UCB) and decision-estimation-coefficient (DEC) methods may appear, at first sight, to belong to different theories. This paper places the two viewpoints in a common algorithmic-information language for freque

arxiv · Jun 5

Minimax optimal differentially private synthetic data for smooth queries

arXiv:2602.01607v3 Announce Type: replace-cross Abstract: Differentially private synthetic data enables the sharing and analysis of sensitive datasets while providing rigorous privacy guarantees for individual contributors. A central challenge is to achieve strong utility guarantees for meaningful d

arxiv · Jun 4

Literature-Guided Minimax Optimization of Virtual Epilepsy Neurostimulation

arXiv:2606.04339v1 Announce Type: new Abstract: Computational models of epilepsy promise patient-specific treatment design, but most optimization workflows still search for parameters that perform well on average. In neuromodulation, this is a weak target: a protocol that improves the mean response

arxiv · Jun 3

Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning

arXiv:2603.03480v2 Announce Type: replace Abstract: We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach

arxiv · Jun 2

Minimax-Optimal Policy Regret in Partially Observable Markov Games

arXiv:2606.02363v1 Announce Type: new Abstract: We study sequential decision-making in partially observable environments against strategic, adaptive opponents, modeled as partially observable Markov games (POMGs). The central challenge is to learn latent dynamics from partial observations while faci

arxiv · Jun 2

Two-Fidelity Best-Action Identification for Stochastic Minimax Tree

arXiv:2606.01708v1 Announce Type: cross Abstract: We study fixed-confidence best-action identification (BAI) in stochastic minimax trees. This problem is increasingly relevant in modern AI planning, where deep minimax search and Monte Carlo Tree Search (MCTS) with language model long rollouts face a

arxiv · May 28

Learning What to Recommend: Minimax Optimal Simple Regret in Logistic Bandits

arXiv:2601.21167v2 Announce Type: replace Abstract: We study stochastic logistic bandits with $d$-dimensional action features under the simple-regret objective, where a learner uses $T$ rounds of exploration to output a single final action. The logistic structure is essential here: because the infor

arxiv · May 28

Reward Transfer from Inverse Reinforcement Learning: A Coupled Minimax Approach

arXiv:2605.27834v1 Announce Type: new Abstract: We study the transfer of rewards learned using inverse reinforcement learning from expert demonstrations in one environment to reinforcement learning in a new, different environment. This arises naturally when demonstrations are collected in a controll

arxiv · May 27

A first-order method for constrained nonconvex-nonconcave minimax optimization

arXiv:2510.01168v3 Announce Type: replace-cross Abstract: We study a class of constrained nonconvex-nonconcave minimax optimization problems in which the inner maximization involves potentially complex constraints. Under the assumption that the inner problem of a novel lifted minimax reformulation s

arxiv · May 27

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

arXiv:2605.26494v1 Announce Type: new Abstract: We introduce the MiniMax-M2 series, a family of Mixture-of-Experts language models built around the principle that mini activations can unleash maximum real-world intelligence. The flagship M2 contains 229.9B total parameters with only 9.8B activated p

arxiv · May 26

Minimax Limits of k-Fold Cross-Validation via Majority

arXiv:2605.25859v1 Announce Type: cross Abstract: We study the mean-squared error of $k$-fold cross-validation as a risk estimator, with particular emphasis on how its accuracy depends on the number of folds $k$. Despite the widespread use of cross-validation, principled guidance for choosing $k$ is

arxiv · May 21

Minimax Optimal Variance-Aware Regret Bounds for Multinomial Logistic MDPs

arXiv:2605.19768v1 Announce Type: new Abstract: We study reinforcement learning for episodic Markov Decision Processes (MDPs) whose transitions are modelled by a multinomial logistic (MNL) model. Existing algorithms for MNL mixture MDPs yield a regret of $\smash{\tilde{O}(dH^2\sqrt{T})}$ (Li et al.,

arxiv · May 20

A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Lojasiewicz condition

arXiv:2507.01932v2 Announce Type: replace-cross Abstract: We study a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-Lojasiewicz (KL) condition that may vary with the outer minimization variable. In contrast to the global KL or Polyak-

arxiv · May 20

Efficient and Minimax Optimal In-context Nonparametric Regression with Transformers

arXiv:2601.15014v2 Announce Type: replace-cross Abstract: We study in-context learning for nonparametric regression with $\alpha$-H\"older smooth regression functions, for some $\alpha>0$. We prove that, with $n$ in-context examples and $d$-dimensional regression covariates, a pretrained transformer

arxiv · May 13

Minimax Rates and Spectral Distillation for Tree Ensembles

arXiv:2605.11841v1 Announce Type: cross Abstract: Tree ensembles such as random forests (RFs) and gradient boosting machines (GBMs) are among the most widely used supervised learners, yet their theoretical properties remain incompletely understood. We adopt a spectral perspective on these algorithms

arxiv · May 12

Alignment-Sensitive Minimax Rates for Spectral Algorithms with Learned Kernels

arXiv:2509.20294v4 Announce Type: replace Abstract: We study spectral algorithms in the setting where kernels are learned from data. We introduce the effective span dimension (ESD), an alignment-sensitive complexity measure that depends jointly on the signal, spectrum, and noise level $\sigma^2$. Th

arxiv · May 12

Extended Wasserstein-GAN Approach to Causal Distribution Learning: Density-Free Estimation and Minimax Optimality

arXiv:2605.10206v1 Announce Type: cross Abstract: Distributional causal inference requires estimating not only average treatment effects but also interventional outcome distributions, including quantiles, tail risks, and policy-dependent uncertainty. As a method for distributional causal inference,

arxiv · May 11

The Minimax Rate of Second-Order Calibration

arXiv:2605.07808v1 Announce Type: new Abstract: We characterize the minimax rate of estimating the second-order calibration error for binary classification, which quantifies whether a higher-order predictor's epistemic-uncertainty estimate matches the conditional variance of the label probability on

arxiv · May 11

Penalty-Based First-Order Methods for Bilevel Optimization with Minimax and Constrained Lower-Level Problems

arXiv:2605.08006v1 Announce Type: cross Abstract: We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and mi

arxiv · May 8

ConquerNet: Convolution-Smoothed Quantile ReLU Neural Networks with Minimax Guarantees

arXiv:2605.06265v1 Announce Type: cross Abstract: Quantile regression is a fundamental tool for distributional learning but poses significant optimization challenges for deep models due to the non-smoothness of the pinball loss. We propose ConquerNet, a class of \textbf{con}volution-smoothed \textbf

arxiv · Apr 29

Minimax Generalized Cross-Entropy

arXiv:2603.19874v3 Announce Type: replace-cross Abstract: Loss functions play a central role in supervised classification. Cross-entropy (CE) is widely used, whereas the mean absolute error (MAE) loss can offer robustness but is difficult to optimize. Interpolating between the CE and MAE losses, gen

arxiv · Apr 24

On the Stability and Generalization of First-order Bilevel Minimax Optimization

arXiv:2604.20115v1 Announce Type: new Abstract: Bilevel optimization and bilevel minimax optimization have recently emerged as unifying frameworks for a range of machine-learning tasks, including hyperparameter optimization and reinforcement learning. The existing literature focuses on empirical eff

arxiv · Apr 23

Formal Verification of Minimax Algorithms

arXiv:2509.20138v2 Announce Type: replace Abstract: Minimax-based search algorithms with alpha-beta pruning and transposition tables are a central component of classical game-playing engines and remain widely used in practice. Despite their widespread use, these algorithms are subtle, highly optimiz

arxiv · Apr 17

Minimax Optimality and Spectral Routing for Majority-Vote Ensembles under Markov Dependence

arXiv:2604.13414v1 Announce Type: cross Abstract: Majority-vote ensembles achieve variance reduction by averaging over diverse, approximately independent base learners. When training data exhibits Markov dependence, as in time-series forecasting, reinforcement learning (RL) replay buffers, and spati

arxiv · Apr 14

Online Covariance Estimation in Averaged SGD: Improved Batch-Mean Rates and Minimax Optimality via Trajectory Regression

arXiv:2604.10814v1 Announce Type: new Abstract: We study online covariance matrix estimation for Polyak--Ruppert averaged stochastic gradient descent (SGD). The online batch-means estimator of Zhu, Chen and Wu (2023) achieves an operator-norm convergence rate of $O(n^{-(1-\alpha)/4})$, which yields

arxiv · Apr 13

GL-LowPopArt: A Nearly Instance-Wise Minimax-Optimal Estimator for Generalized Low-Rank Trace Regression

arXiv:2506.03074v5 Announce Type: replace-cross Abstract: We present `GL-LowPopArt`, a novel Catoni-style estimator for generalized low-rank trace regression. Building on `LowPopArt` (Jang et al., 2024), it employs a two-stage approach: nuclear norm regularization followed by matrix Catoni estimatio

arxiv · Apr 9

Nonparametric Instrumental Regression via Kernel Methods is Minimax Optimal

arXiv:2411.19653v2 Announce Type: replace-cross Abstract: We study the kernel instrumental variable (KIV) algorithm, a kernel-based two-stage least-squares method for nonparametric instrumental variable regression. We provide a convergence analysis covering both identified and non-identified regimes

arxiv · Apr 7

Minimaxity and Admissibility of Bayesian Neural Networks

arXiv:2604.04673v1 Announce Type: cross Abstract: Bayesian neural networks (BNNs) offer a natural probabilistic formulation for inference in deep learning models. Despite their popularity, their optimality has received limited attention through the lens of statistical decision theory. In this paper,

arxiv · Mar 31

Mitigating Backdoor Attacks in Federated Learning Using PPA and MiniMax Game Theory

arXiv:2603.28652v1 Announce Type: new Abstract: Federated Learning (FL) is witnessing wider adoption due to its ability to benefit from large amounts of scattered data while preserving privacy. However, despite its advantages, federated learning suffers from several setbacks that directly impact the

arxiv · Mar 31

The Minimax Lower Bound of Kernel Stein Discrepancy Estimation

arXiv:2510.15058v3 Announce Type: replace-cross Abstract: Kernel Stein discrepancies (KSDs) have emerged as a powerful tool for quantifying goodness-of-fit over the last decade, featuring numerous successful applications. To the best of our knowledge, all existing KSD estimators with known rate achi

arxiv · Mar 31

Water-Filling is Universally Minimax Optimal

arXiv:2603.26893v1 Announce Type: cross Abstract: Allocation of dynamically-arriving (i.e., online) divisible resources among a set of offline agents is a fundamental problem, with applications to online marketplaces, scheduling, portfolio selection, signal processing, and many other areas. The wate

huggingface · Oct 30

Aligning to What? Rethinking Agent Generalization in MiniMax M2
