Latest

Cursor makes its biggest India push yet ahead of SpaceX acquisition with localized pricing · 4h
Photonic reservoir computing with complex networks · 4h
XS-VLA: Coupling Coarse-grained Spatial Distillation with Latent Flow Matching for Lightweight Robotic Control · 4h
Agentic Permissions Policy Algebra for Taint Confinement in LLM Agents · 4h
Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks · 4h
The One-Word Census: Answer-Choice Conformity Across 44 Language Models · 4h
Creative Integration: A Decidable Criterion of Creativity · 4h
BERT-based Models vs. Large Language Models for Low-Resource Named Entity Recognition: A Comparative Study on Marathi · 4h
Joint Optimization for Greedy Longest-match Tokenization · 4h
Kimi K3: Open Frontier Intelligence · 4h
The Few-shot Dilemma: Over-prompting Large Language Models · 4h
Speculative Pipeline Decoding: Higher-Accuracy Drafting with Hidden Latency via Pipeline Parallelism · 4h
Bayesian Complete-Pooling in Cross-Subject Classification for Motor Imagery Electroencephalogram · 4h
StageGuard: Physiologically Constrained Sleep Staging · 4h
Soft-Constrained Optimization of Latent Space in Variational Autoencoders · 4h
Beyond Error-vs-Discard Characteristic: Toward Stable and Reliable Evaluation for Face Image Quality Assessment · 4h
Analyzing the Importance of Blank for CTC-Based Knowledge Distillation · 4h
Predicting Channel Closures in the Lightning Network with Machine Learning · 4h
Evaluation of Blood Vessel Segmentation Methods on Hard-to-Detect Vascular Structures · 4h
MOCA: A Transformer-based Modular Causal Inference Framework with One-way Cross-attention and Cutting Feedback · 4h

Source

arXiv

100 articles indexed from arXiv

arxiv · 1d ago

A Consensus-Based Framework for Relative Preference Evaluation of Large Language Models

arXiv:2607.21632v1 Announce Type: new Abstract: Traditional benchmarks for LLMs primarily rely on static datasets and objective scoring metrics, which often fail to capture differences in response quality when multiple answers are acceptable. In such settings, correctness alone is insufficient to di

arxiv · 1d ago

Probing Latent Colombian Identity Inferences in Qwen2.5-7B with Natural Language Autoencoders

arXiv:2607.21774v1 Announce Type: new Abstract: Large language models may infer demographic attributes from subtle linguistic cues even when those attributes are not explicitly stated. This pilot study examines whether Qwen2.5-7B-Instruct internally represents Colombian identity, socioeconomic statu

arxiv · 1d ago

Data Quality over Capacity: Internalizing Documents into LoRA Adapters for Closed-Book QA

arXiv:2607.21861v1 Announce Type: new Abstract: We study baking documents directly into the weights of a 4-bit Gemma-4-e4b model via LoRA, so a system can answer questions about a corpus closed-book: no retrieval and no context-window budget. Across roughly 100 training runs from single documents to

arxiv · 1d ago

Enjoy Your Talk: A Human-Centered Benchmark for Multi-Turn Dialogue with Decoupled User Simulation, Target Modeling, and Judging

arXiv:2607.10428v2 Announce Type: replace Abstract: Evaluating large language models (LLMs) as multi-turn conversational partners requires probing capabilities that single-turn benchmarks miss: persona consistency, evolving intent tracking, emotional dynamics, and goal completion across many turns.

arxiv · 1d ago

Multi-Mask Diffusion Language Models for Few-Step Generation

arXiv:2607.19686v2 Announce Type: replace Abstract: Masked diffusion models (MDMs) are a promising family of language generators, but achieving high-quality few-step generation remains challenging. In MDMs, all forward trajectories collapse to a single fully masked state, leaving no terminal entropy

arxiv · 1d ago

Solar Open 2 Technical Report

arXiv:2607.20062v2 Announce Type: replace Abstract: We present Solar Open 2, a 250B-A15B Mixture-of-Experts language model built for long-horizon agentic tasks, scaled up from Solar Open 1 (Solar Open 100B). To hold entire agent trajectories in a single context, Solar Open 2 reaches a 1M-token windo

arxiv · 1d ago

The Geometry of Personality: Activation Steering with Jungian Cognitive Functions

arXiv:2607.20803v2 Announce Type: replace Abstract: Activation steering enables control and interpretation of LLMs, yet existing work primarily models personality through static trait frameworks such as the Big Five. We investigate whether personality can instead be represented and controlled as a s

arxiv · 1d ago

Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning

arXiv:2507.01551v3 Announce Type: replace-cross Abstract: Process Reinforcement Learning (PRL) has demonstrated considerable potential in enhancing the reasoning capabilities of Large Language Models (LLMs). However, introducing additional process reward models incurs substantial computational overh

arxiv · 1d ago

H$^2$SD: Hybrid Hindsight Self-Distillation

arXiv:2607.18955v3 Announce Type: replace-cross Abstract: Reinforcement learning with verifiable rewards (RLVR) provides reliable outcome supervision for language model reasoning, but a scalar trajectory reward offers limited token-level guidance. Existing self-distillation methods add a privileged

arxiv · 1d ago

LunarFM: A Shared Multimodal Representation of the Moon's Surface

arXiv:2607.22408v1 Announce Type: new Abstract: The renewed global focus on lunar exploration, driven by the prospect of in-situ resource utilization and a sustained human presence on the Moon, has created growing demand for accurate, large-scale characterization of the lunar surface. Although vast

arxiv · 1d ago

Prior laundering: learned priors with inherited, undetectable overconfidence

arXiv:2607.21721v1 Announce Type: cross Abstract: Learned generative priors are increasingly used for ill-posed Bayesian inverse problems, their posterior uncertainty treated as earned from data. But training one requires truths, scarce in seismic and medical imaging, so the recourse is an archive o

arxiv · 1d ago

Deep Sigma Point Processes for RCS Modeling in Spaceborne SAR Imagery

arXiv:2607.21745v1 Announce Type: cross Abstract: Radar cross-section (RCS) modeling is foundational to advancing the utility and sensitivity of spaceborne radar systems. This study introduces a deep sigma-point process (DSPP) model for predicting RCS in synthetic aperture radar (SAR) imagery using

arxiv · 1d ago

Prompt as a Data Type: In-Database LLM Prompt Management and Rewriting

arXiv:2607.21756v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used in database-backed applications to classify tuples, filter records using semantic predicates, extract structured attributes, and enrich query results. Yet the prompts that start these computations are

arxiv · 1d ago

CausalForge: A Formally Grounded, Self-Improving Agentic Framework for Automated Research in Causal Inference

arXiv:2607.22511v1 Announce Type: cross Abstract: Automating theoretical research is constrained not only by the generation of candidate results, but also by their reliable evaluation. A common approach is to close the research loop with a large language model (LLM) reviewer. However, such reviewers

arxiv · 1d ago

Quantum Spectral Model: Data Reuploading with Input-Conditioned Frequency Support

arXiv:2607.22516v1 Announce Type: cross Abstract: A central design principle in modern machine learning and artificial intelligence is to align a model's inductive bias with the structure of its input data. For matrix-valued inputs, relevant matrix-level relationships can be characterised through sp

arxiv · 1d ago

Meta-Learning Approaches for Speaker-Dependent Voice Fatigue Models

arXiv:2505.23378v3 Announce Type: replace Abstract: Speaker-dependent modelling can substantially improve performance in speech-based health monitoring applications. While mixed-effect models are commonly used for such speaker adaptation, they require computationally expensive retraining for each ne

arxiv · 1d ago

A Comparative Benchmark of Federated Learning Strategies for Mortality Prediction on Heterogeneous and Imbalanced Clinical Data

arXiv:2509.10517v3 Announce Type: replace Abstract: Machine learning can predict in-hospital mortality, but data privacy and the statistical heterogeneity of clinical data hamper its use. Federated Learning (FL) is privacy-preserving, yet its behavior under non-IID and imbalanced conditions needs sc

arxiv · 1d ago

Simpson's Paradox in Behavioral Curves: How Aggregation Distorts Parametric Models of User Dynamics

arXiv:2605.11017v2 Announce Type: replace Abstract: Behavioral curve modeling -- fitting parametric functions to engagement-versus-exposure data -- is standard practice in recommendation, advertising, and clinical dosing. We show that aggregation introduces a systematic distortion: Simpson's paradox

arxiv · 1d ago

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

arXiv:2605.30789v3 Announce Type: replace Abstract: We identify a new dimension for enhancing rollout diversity in Group Relative Policy Optimization (GRPO) for LLMs. While GRPO relies on diverse rollouts, prevailing strategies primarily increase diversity by injecting more token-level randomness, w

arxiv · 1d ago

Re-FORC: Adaptive Reward Prediction for Efficient Chain-of-Thought Reasoning

arXiv:2511.02130v2 Announce Type: replace-cross Abstract: We propose Re-FORC, an adaptive reward prediction method that, given a query, enables prediction of the expected future rewards as a function of the number of future thinking tokens. Re-FORC trains a lightweight adapter on reasoning models, d

arxiv · 1d ago

On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization

arXiv:2601.12238v5 Announce Type: replace-cross Abstract: In this paper, we provide a comprehensive theoretical analysis of Stochastic Gradient Descent (SGD) and its momentum variants (Polyak Heavy-Ball and Nesterov) for tracking time-varying optima under strong convexity and smoothness. Our finite-

arxiv · 1d ago

Cross-reality location privacy protection in 6G-enabled vehicular metaverses: an LLM-enhanced hybrid generative diffusion model-based approach

arXiv:2601.12311v2 Announce Type: replace-cross Abstract: The emergence of 6G-enabled vehicular metaverses enables Autonomous Vehicles (AVs) to operate across physical and virtual spaces through space-air-ground-sea integrated networks. The AVs can deploy AI agents powered by large AI models as pers

arxiv · 1d ago

Conformal Anomaly Detection in Python: Moving Beyond Heuristic Thresholds with nonconform

arXiv:2605.13642v2 Announce Type: replace-cross Abstract: Most anomaly detection systems output scores rather than calibrated decisions, leaving practitioners to choose thresholds heuristically and without clear statistical interpretation. Conformal anomaly detection addresses this limitation by con

arxiv · 1d ago

Do Transformers Actually Help Intrusion Detection? A Temporal Sequence Evaluation on CIC-IDS2017

arXiv:2606.11098v2 Announce Type: replace-cross Abstract: Recent deep learning approaches for network intrusion detection increasingly incorporate temporal architectures such as recurrent networks and Transformers, often reporting near-perfect performance on CIC-IDS2017. However, many existing studi

arxiv · 1d ago

Representation Costs in Data Science: Foundations and the Quasi-Banach Spaces of Deep Neural Networks

arXiv:2606.14954v4 Announce Type: replace-cross Abstract: We develop a general framework for analyzing representation costs induced by parameter-space regularizers in data-fitting methods. For an arbitrary parametric method, we define its representation cost and native function space, prove existenc

arxiv · 1d ago

Local Multimodal Music Alignment from Global Supervision

arXiv:2607.10023v2 Announce Type: replace-cross Abstract: Understanding music requires understanding localized relationships across data modalities, e.g., how time in performance audio maps onto position in a score image. Yet supervision for such local correspondences is difficult to obtain -- in pract

arxiv · 1d ago

NexForge: Scaling Agent Capabilities through Requirement-Driven Task Synthesis for LLMs

arXiv:2607.14186v5 Announce Type: replace-cross Abstract: Scaling executable agent training data for LLM post-training is bottlenecked by substrate-bound methods that tie task generation to predefined tools, repositories, or skill graphs: expanding coverage requires manual substrate engineering, eac

arxiv · 1d ago

It Depends on the Dataset: When a Brain-Encoding Model's Predicted Responses Beat Their Visual Backbone for Video Memorability

arXiv:2607.16292v3 Announce Type: replace-cross Abstract: Brain-encoding foundation models predict fMRI responses to video, audio, and text well enough to win the Algonauts 2025 challenge. We ask whether their predicted responses, obtained with no scanner, are a useful feature lens for a downstream

arxiv · 1d ago

Adversarial Style Optimization: Enhancing VLM Jailbreaks by GRPO-based Stylistic Triggers Optimization

arXiv:2607.21619v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have achieved impressive performance, but their safety alignment remains vulnerable to jailbreak attacks. Existing content-based jailbreaks are often inconsistent and show unsatisfying performance against the ra

arxiv · 1d ago

Agentic Evaluation of Copyright Law Compliance

arXiv:2607.21799v1 Announce Type: new Abstract: Large language model (LLM) agents increasingly perform commercial tasks that involve retrieving external content such as images and, where appropriate, reproducing that content. LLM agents should comply with the law, including copyright law. Presently,

arxiv · 1d ago

Humanly: A Configurable and Traceable Environment for Human-AI Collaborative Writing

arXiv:2607.21758v1 Announce Type: new Abstract: Teachers, conference chairs, and public readers all judge writing from limited evidence, seeing only a finished document and not the process that produced it. Final text alone cannot reveal whether a document was produced through human typing, AI gener

#writing #ai-assistance #authentication

arxiv · 1d ago

Khondo: A Multimodal Benchmark for Document Packet Splitting of Bangla Forms

arXiv:2607.21780v1 Announce Type: new Abstract: Document packets, multiple documents concatenated into a single file, are common in government and administrative workflows, yet splitting them into their constituent documents is difficult, especially for low-resource languages. We introduce Khondo (B

arxiv · 1d ago

J-CoT: Chain-of-Thought in J-Space

arXiv:2607.21981v1 Announce Type: new Abstract: Chain-of-thought prompting improves language-model reasoning by carrying intermediate states across successive computation steps. However, relying on natural language as the only recurrent interface is overly restrictive, since many transient computati

arxiv · 1d ago

Progress Reward Modeling for Robotic Learning: A Comprehensive Survey

arXiv:2607.21655v1 Announce Type: cross Abstract: Robotic learning takes place in dynamic environments with large behavior spaces. A terminal success signal only tells the robot whether the task is completed. It does not explain whether the current behavior is making progress, remaining unchanged, o

arxiv · 1d ago

Learning What Matters: Supervising Sparse Attention Routing with Causal Evidence Sets

arXiv:2607.21692v1 Announce Type: cross Abstract: Sparse attention reduces the cost of long contexts by allowing each query to read only selected parts of the input. These selectors are often trained by distilling the attention patterns of a dense teacher, assuming that attention reveals which conte

arxiv · 1d ago

LMEB: Long-horizon Memory Embedding Benchmark

arXiv:2603.12572v5 Announce Type: replace Abstract: Memory embeddings are crucial for memory-augmented systems, such as OpenClaw, but their evaluation is underexplored in current text embedding benchmarks, which narrowly focus on traditional passage retrieval and fail to assess models' ability to ha

arxiv · 1d ago

WHBench: Evaluating Frontier LLMs with Expert-in-the-Loop Validation on Women's Health Topics

arXiv:2604.00024v2 Announce Type: replace Abstract: Large language models are increasingly used for medical guidance, but women's health remains under-evaluated in benchmark design. We present the Women's Health Benchmark (WHBench), a targeted evaluation suite of 47 expert-crafted scenarios across 1

arxiv · 1d ago

Can AI Debias the News? LLM Interventions Improve Cross-Partisan Receptivity but LLMs Overestimate Their Own Effectiveness

arXiv:2605.01006v3 Announce Type: replace Abstract: Partisan news media erode cross-partisan trust, but large language models (LLMs) offer the potential of debiasing such content at scale. Across two pre-registered experiments, we tested whether LLM-generated debiasing of liberal news headlines impr

arxiv · 1d ago

SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective Retrieval-Augmented Generation

arXiv:2605.03534v2 Announce Type: replace Abstract: Retrieval-augmented generation (RAG) grounds answers in retrieved passages, yet relevance does not guarantee sufficiency: a topical passage may still fail to justify the answer. We study evidence sufficiency verification for selective RAG answering

arxiv · 1d ago

Hint-Guided Diversified Policy Optimization for LLM Reasoning

arXiv:2606.03021v2 Announce Type: replace Abstract: Recent developments in Large Language Models (LLMs) have showcased impressive reasoning capabilities, with Reinforcement Learning with Verifiable Rewards (RLVR) being a promising enhancement strategy. However, existing reward mechanisms are constra

arxiv · 1d ago

Mapping Political-Elite Networks in Europe with a Multilingual Joint Entity-Relation Extraction Pipeline

arXiv:2606.27347v3 Announce Type: replace Abstract: Whether political elites organise into rent-seeking coalitions that capture public resources or civic networks that sustain governance is a central question in comparative politics. Yet observing these complex, informal, and adversarial ties at sca

arxiv · 1d ago

Gemma 4 Technical Report

arXiv:2607.02770v2 Announce Type: replace Abstract: We introduce Gemma 4, a new generation of open-weight, natively multimodal language models in the Gemma model family. Designed to advance compute efficiency and reasoning, the Gemma 4 model suite features dense and Mixture-of-Experts architectures,

arxiv · 1d ago

Dissociating the Internal Representations of Sycophancy in LLMs

arXiv:2607.07003v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) frequently exhibit sycophancy, agreeing with a user's statement even when it is incorrect. While often studied as a single, uniform behavior, sycophancy can manifest in substantially distinct ways across contexts,

arxiv · 1d ago

Agentic coding without the cloud: evaluating open-weight large language models on longitudinal data preparation tasks

arXiv:2607.21482v2 Announce Type: replace-cross Abstract: Large language models (LLMs) and agents are now widely used tools in code development, with data typically sent to third-party cloud-based models. Their adoption in research using personal data is constrained by governance requirements that t

arxiv · 1d ago

OpenForgeRL: Train Harness-native Agents in Any Environment

arXiv:2607.21557v2 Announce Type: replace-cross Abstract: Modern AI agents rely on elaborate inference harnesses such as Claude Code, Codex, and OpenClaw to drive multi-turn reasoning, tool use, and access to external systems. While powerful, these complex harnesses also make agents hard to train en

arxiv · 1d ago

Smart predict-then-robustly-optimize

arXiv:2607.21773v1 Announce Type: new Abstract: In this paper, we propose and study a robust variant of the smart predict-then-optimize approach that accounts for prediction shifts due to disturbance in the covariate feature space. While traditional integrated-learning-and-optimization models assume

arxiv · 1d ago

Unbiased Open World Regularization for Fair Self-Supervised Learning

arXiv:2607.22149v1 Announce Type: new Abstract: Despite recent advances, self-supervised learning (SSL) models and Joint-Embedding Predictive Architectures (JEPAs) remain susceptible to learning spurious biases in the dataset. These techniques rely on regularization, which prevents representation co

arxiv · 1d ago

Susceptible Reservoir Architectures for Regime-Conditional Volatility Forecasting

arXiv:2607.22491v1 Announce Type: new Abstract: Volatility forecasting is dominated by persistence and measurement noise, leaving limited residual structure for nonlinear models to exploit. We introduce Susceptible Architectures (SUSA), a reservoir-design principle for volatility forecasting, and it

arxiv · 1d ago

Dysphagia Risk Stratification in Head and Neck Cancer via Two-Stage PRO-Clinical Stacking

arXiv:2607.22514v1 Announce Type: new Abstract: Dysphagia is a debilitating late effect of head and neck cancer (HNC) treatment, yet timely identification of at-risk patients remains challenging in survivorship care. Definitive assessment relies on videofluoroscopic imaging, as captured by the Dynam

arxiv · 1d ago

FBLayout: Optimizing Memory Layout for Efficient LLM Finetuning on Mobile GPUs

arXiv:2607.21624v1 Announce Type: cross Abstract: Transformer-based models have enabled unprecedented capabilities across language, vision, and multimodal tasks. On-device fine-tuning of transformer models offers a privacy-preserving path to personalized AI, yet remains inefficient on mobile GPUs du

arxiv · 1d ago

Explainable quantum-compressed machine learning for complex fluid flows

arXiv:2607.21688v1 Announce Type: cross Abstract: Machine-learning surrogates of physical systems face a paradox: explainable models struggle to be expressive enough to capture complex nonlinear flows, whereas expressive deep surrogates match high-fidelity simulations only through massive parame

arxiv · 1d ago

Carpe Diem: Critical Learning Period-Aware Contract-Based Incentives for Federated Learning

arXiv:2503.07869v4 Announce Type: replace Abstract: Critical learning periods (CLPs) in federated learning (FL) refer to early stages during which low-quality contributions (e.g., sparse training data availability) can permanently impair the performance of the global model. However, existing incenti

arxiv · 1d ago

Safe In-Context Reinforcement Learning

arXiv:2509.25582v4 Announce Type: replace Abstract: In-context reinforcement learning (ICRL) is an emerging RL paradigm where an agent, after pretraining, can adapt to out-of-distribution test tasks without any parameter updates, instead relying on an expanding context of interaction history. While

arxiv · 1d ago

Vector-Valued Reproducing Kernel Banach Spaces for Neural Networks and Operators

arXiv:2509.26371v3 Announce Type: replace-cross Abstract: Recently, there has been growing interest in characterizing the function spaces underlying neural networks. While shallow and deep scalar-valued neural networks have been linked to scalar-valued reproducing kernel Banach spaces (RKBS), $\math

arxiv · 1d ago

Wasserstein Gradient Flows for Scalable and Regularized Barycenter Computation

arXiv:2510.04602v4 Announce Type: replace-cross Abstract: Wasserstein barycenters provide a principled approach for aggregating probability measures, while preserving the geometry of their ambient space. Existing discrete methods are not scalable, as they assume access to the complete set of samples f

arxiv · 1d ago

Statistical physics of deep learning: Optimal learning of a multi-layer perceptron near interpolation

arXiv:2510.24616v5 Announce Type: replace-cross Abstract: For four decades statistical physics has been providing a framework to analyse neural networks. A long-standing question remained on its capacity to tackle deep learning models capturing rich feature learning effects, thus going beyond the na

arxiv · 1d ago

Security Without Detection: Economic Denial as a Primitive for Edge and IoT Defense

arXiv:2512.23849v2 Announce Type: replace-cross Abstract: Sophisticated attackers can evade detection-based security by using encryption, stealth tactics, and low-rate attack patterns. This challenge is particularly acute in Internet of Things (IoT) and edge environments, where limited resources mak

arxiv · 1d ago

Atlas 2 -- Foundation models for clinical deployment

arXiv:2601.05148v2 Announce Type: replace-cross Abstract: Pathology foundation models substantially advanced the possibilities in computational pathology, yet tradeoffs in terms of performance, robustness, and computational requirements remained, which limited their clinical deployment. In this r

arxiv · 1d ago

Analysing Self-Harm Representations in Language Models: a Cross-Architecture Study

arXiv:2607.21988v1 Announce Type: new Abstract: Self-harm content is particularly challenging to detect using NLP techniques, and is also a high-stakes task which requires the highest accuracy to enable timely intervention or flagging at-risk users. We therefore present an analysis of how LLMs repre

arxiv · 1d ago

Learning to Reason for Factuality

arXiv:2508.05618v2 Announce Type: replace Abstract: Reasoning Large Language Models (R-LLMs) have significantly advanced complex reasoning tasks but often struggle with factuality, generating substantially more hallucinations than their non-reasoning counterparts on long-form factuality benchmarks.

arxiv · 1d ago

Token-Operations-Oriented Inference Optimization Techniques for Large Models

arXiv:2606.20295v2 Announce Type: replace-cross Abstract: Large model inference optimization serves as a key foundation for supporting the scalable, low-cost, and highly stable operation of large model services. Centered on token-oriented inference optimization technology, this paper proposes for th

arxiv · 1d ago

Reliability Scales Inversely: Bigger Language Models Compound Mistakes Faster

arXiv:2607.18292v2 Announce Type: replace-cross Abstract: As language models scale, answers start truer but degrade faster: scaling buys capability but erodes reliability. The knowledge-gap account -- more data, retrieval, or scale -- misses an auto-regressive risk residual that increases with scale

arxiv · 1d ago

On the Depth Scalability of Logic Gate Networks

arXiv:2607.21633v1 Announce Type: new Abstract: Logic Gate Networks (LGNs) implement computation through compositions of Boolean operations, yet unlike classical Boolean circuits, existing LGNs do not reliably benefit from increased depth. We identify two distinct causes: optimization collapse in de

arxiv · 1d ago

Measuring the Dependency Gap: Diagnosing Inter-Column Fidelity in Tabular Generative Models

arXiv:2607.21636v1 Announce Type: new Abstract: Synthetic tabular data is valued for preserving not only each column's marginal distribution but the dependencies between columns -- structure that carries much of the discriminative signal for minority classes in imbalanced domains such as fraud and c

arxiv · 1d ago

Multi-Horizon Consistency as Geometry: When Latent Dynamics Contract, and When They Do Not

arXiv:2607.21645v1 Announce Type: new Abstract: Multi-horizon latent consistency is a common training knob in video predictors and world models, but practitioners rarely know what it does to transition geometry. We treat lambda, the weight on multi-step latent agreement, as a diagnostic control and

arxiv · 1d ago

Adjustment Speed as a Safety Constraint for Nonstationary Reinforcement Learning

arXiv:2607.21646v1 Announce Type: new Abstract: Ensuring safety in reinforcement learning under nonstationarity requires determining whether a learning system can safely adapt to forecasted environmental change within the required recovery horizon. Existing safe reinforcement learning methods typica

arxiv · 1d ago

Shallower ReLU Network Representations via Exact Linear Algebra

arXiv:2607.21651v1 Announce Type: new Abstract: We prove that the maximum of $n$ real numbers is exactly representable by a ReLU network with two hidden layers for every $n\le 10$. The constructions are obtained by reducing the problem to exact rational linear algebra: after a symmetry reduction, th

arxiv · 1d ago

Bounding the Causal Impact of ML-assisted Decision-Making via Counterfactual Correctness

arXiv:2607.21806v1 Announce Type: new Abstract: Predictive machine learning (ML) models are increasingly used to aid human decision-makers across various high-risk domains such as healthcare and criminal justice. There is a growing recognition of the need to evaluate the causal impact of deploying t

arxiv · 1d ago

Data eccentricity, asymptotics of Gaussian RBF reproducing kernel Hilbert space, and kernel PCA

arXiv:2607.21823v1 Announce Type: new Abstract: We show that, up to isotropic scaling, the Gaussian RBF reproducing kernel Hilbert space (RKHS) is asymptotically isometric to Euclidean space in the large bandwidth limit. This strongly suggests that kernel-based constructions reliant on metric proper

arxiv · 1d ago

A Graph-Based Control Interface for Traffic Signals on Heterogeneous Road Networks

arXiv:2607.21831v1 Announce Type: new Abstract: We present a traffic-signal control interface in which a shared graph neural network assigns scores to individual traffic movements. Each junction converts these scores into its own variable-sized set of legal signal phases using a deterministic incide

Read on arxiv →

arxiv1d ago

Searching the Space of Feed-Forward Neural-Network Weight-Update Rules with Fixed Depth Symbolic Regression

arXiv:2607.21855v1 Announce Type: new Abstract: We investigate whether symbolic regression can discover explicit neural network weight-update rules that outperform standard hand-designed optimizers on small symbolic regression benchmarks. Candidate update rules are represented as fixed-depth symboli

Read on arxiv →

arxiv1d ago

A Leakage-Free Stacked Ensemble Method for Multiclass Classification

arXiv:2607.22081v1 Announce Type: new Abstract: Multiclass classification is a fundamental problem across a wide range of domains. It is still challenging due to possession of high inter-class similarity, class imbalance datasets, and variability in data distributions. Rule-based classifiers such as

Read on arxiv →

arxiv1d ago

Phylogenetic signal in marine mammal and bird vocalizations captured by audio foundation models: the limited benefit of domain-specific pretraining

arXiv:2607.22458v1 Announce Type: new Abstract: Do learned audio embeddings encode structure that nobody told them to encode? We probe four large pretrained audio models (AST, CLAP, BEATs-bio and BirdNET) with a downstream task none of them saw during training: recovering phylogenetic distance from

Read on arxiv →

arxiv1d ago

Generative and multimodal AI for materials prediction and design: Progress, challenges, and perspectives

arXiv:2607.21660v1 Announce Type: cross Abstract: Artificial intelligence (AI) is accelerating materials prediction and design by enabling efficient exploration of chemical and structural spaces, with particular promise for novel materials discovery. However, novelty in materials discovery encompass

Read on arxiv →

arxiv1d ago

Ordered Action Tokens for Visuomotor Policy Learning

arXiv:2607.21670v1 Announce Type: cross Abstract: Action tokenization maps continuous robot action chunks to discrete tokens and has become an important interface for modern visuomotor policies. Existing approaches either rely on analytical discretization methods that produce prohibitively long toke

Read on arxiv →

arxiv1d ago

MemNMF: Memory-Augmented NMF on LPC Spectra for Anomalous Sound Detection

arXiv:2607.22086v1 Announce Type: cross Abstract: Autoencoder-based anomalous sound detection is attractive for machine condition monitoring because it can be trained using only normal recordings and yields an interpretable anomaly score from reconstruction error. Most prior work uses spectrogram au

Read on arxiv →

arxiv1d ago

PostDeg: Placement Beats Parameterization in LayerNorm GNNs

arXiv:2606.14022v2 Announce Type: replace Abstract: LayerNorm-based GNNs routinely erase the topology signals (degree, centrality, $k$-core) that node-selection policies should depend on, but the literature has not located where in the residual block the erasure happens. We answer that question: a p

Read on arxiv →

arxiv1d ago

AI-Driven Surrogate Models for Predicting Electrode-Scale Discharge Behavior in Lithium-Ion Batteries

arXiv:2607.20577v2 Announce Type: replace Abstract: Physics-based simulations are essential for understanding the electrode-scale discharge behavior of lithium-ion batteries (LIBs) but suffer from prohibitive computational costs. To address this, we introduce a novel deep learning surrogate pipeline

Read on arxiv →

arxiv1d ago

Embodiment-Induced Coordination Regimes in Tabular Multi-Agent Q-Learning

arXiv:2601.17454v2 Announce Type: replace-cross Abstract: Centralized value learning underlies a broad class of multi-agent reinforcement learning methods, but its claimed advantage is typically evaluated in settings that confound coordination structure with function approximation and partial observ

Read on arxiv →

arxiv1d ago

The pretraining domain outweighs the training objective in setting the privacy-utility trade-off of differentially private medical image analysis

arXiv:2601.19618v2 Announce Type: replace-cross Abstract: Differential privacy protects the patients whose images train medical imaging models, but it lowers diagnostic accuracy, and the initialization is the strongest known remedy. Practice increasingly favors large generic self-supervised encoders

Read on arxiv →

arxiv1d ago

Predictive Query Language: A Domain-Specific Language for Predictive Modeling on Relational Databases

arXiv:2602.09572v3 Announce Type: replace-cross Abstract: The purpose of predictive modeling on relational data is to predict future or missing values in a relational database, for example, future purchases of a user, risk of readmission of the patient, or the likelihood that a financial transaction

Read on arxiv →

arxiv1d ago

Skill Self-Play: Pushing the Frontier of LLM Capability with Co-Evolving Skills

arXiv:2607.22529v1 Announce Type: new Abstract: LLM training is shifting from manual design and annotation to interaction-driven self-evolution. However, existing self-evolutionary methods face a fundamental dilemma between task diversity and verification reliability: environment-bound methods obtai

Read on arxiv →

arxiv1d ago

Language-Aware Distillation for Multilingual Instruction-Following Speech LLMs with ASR-Only Supervision

arXiv:2603.07025v2 Announce Type: replace Abstract: Speech Large Language Models (LLMs) that understand and follow instructions in many languages are useful for real-world interaction, but are difficult to train with supervised fine-tuning, requiring large, task-specific speech corpora. While recent

Read on arxiv →

arxiv1d ago

Neural Feature Governance: Extending Atom Prevalence

arXiv:2607.21671v1 Announce Type: new Abstract: Neural network compression and interpretability remain open challenges in modern deep learning, where billion-parameter architectures deliver impressive accuracy at the cost of transparency, computational efficiency, and reliable uncertainty quanti

Read on arxiv →

arxiv1d ago

Self-Poisoning in Adaptive Out-of-Distribution Detection: A Sharp-Threshold Theory and Certified Label-Free Calibration

arXiv:2607.21673v1 Announce Type: new Abstract: Test-time adaptive out-of-distribution (OOD) detectors update a memory bank from the unlabelled stream. We show this adaptation obeys a provable dynamical law. Modelling bank impurity as a generalized Pólya urn, we prove almost-sure convergence to a

Read on arxiv →

arxiv1d ago

Autoregressive EHR Foundation Models with Multimodal Inputs

arXiv:2607.22264v1 Announce Type: new Abstract: Autoregressive foundation models trained on tokenized electronic health records (EHRs) can support zero-shot clinical prediction, yet most operate on structured event codes alone, and do not incorporate multiple modalities in a principled way. We prese

Read on arxiv →

arxiv1d ago

SCOPE and SCION: A Benchmark and an Auditable Reference Pipeline for Schema Induction and Fusion from Text

arXiv:2607.21610v1 Announce Type: cross Abstract: Schema graphs are an upstream bottleneck of schema-grounded information extraction and knowledge graph construction, yet most extraction systems assume the schema is already available. We introduce SCOPE (Schema Construction and Ontology-induction Pi

Read on arxiv →

arxiv1d ago

CARDIAG: A Dense Segment Classification Benchmark of Deep Learning Architectures for Coronary Angiography

arXiv:2607.22139v1 Announce Type: cross Abstract: Accurate pixel-level classification of coronary angiograms is critical for cardiovascular disease assessment, yet the field lacks standardized evaluation protocols. In this work we demonstrate a new benchmark for the assessment of deep learning model

Read on arxiv →

arxiv1d ago

An Empirical Study of OpenPangu Quantization on Ascend NPUs

arXiv:2606.21257v3 Announce Type: replace Abstract: OpenPangu models are attractive targets for private and domestic large-language-model deployment, yet their robustness under aggressive post-training quantization on Ascend NPUs has not been systematically characterized. This paper conducts a contr

Read on arxiv →

arxiv1d ago

A Linear Matching Bandit Approach to Online Multi-Human Multi-Robot Teaming

arXiv:2606.29221v2 Announce Type: replace Abstract: We address the problem of online multi-human multi-robot matching through the lens of a linear matching bandit framework, where a learner assigns robots with unknown features from a fixed pool to distinct sets of human agents over multiple rounds.

#machine-learning #matching #optimization Read on arxiv →

arxiv1d ago

A Structural Interpretation of GELU and Threshold-Transmission Activations via the First-Order Loss Function

arXiv:2607.03664v3 Announce Type: replace Abstract: The Gaussian Error Linear Unit is usually motivated as the expected output of an input-dependent Bernoulli gate. This work gives an alternative interpretation: GELU is the expected output of a hard linear gate with a Gaussian random threshold. This

Read on arxiv →

arxiv1d ago

The Computational Basis of Confidence in Large Language Models

arXiv:2607.12447v2 Announce Type: replace Abstract: Reliable confidence -- the probability that a model's own answer is correct -- is essential for the trustworthy deployment of language models. Existing work has largely evaluated confidence by how well it predicts correctness and whether it is cali

Read on arxiv →

arxiv1d ago

Fast State-Augmented Learning for Wireless Resource Allocation with Dual Variable Regression

arXiv:2506.18748v2 Announce Type: replace-cross Abstract: We consider resource allocation problems in multi-user wireless networks, where the goal is to optimize a network-wide utility function subject to constraints on the ergodic average performance of users. We demonstrate how a state-augmented g

Read on arxiv →

arxiv1d ago

Breaking the Data Barrier in Learning Symbolic Computation: A Case Study on Variable Ordering Suggestion for Cylindrical Algebraic Decomposition

arXiv:2601.13731v2 Announce Type: replace-cross Abstract: Symbolic computation, powered by modern computer algebra systems, has important applications in mathematical reasoning through exact deep computations. The efficiency of symbolic computation is largely constrained by such deep computations in

Read on arxiv →

arxiv1d ago

Scalable Gaussian process inference via neural feature maps

arXiv:2605.10285v2 Announce Type: replace-cross Abstract: We present a theoretically grounded Gaussian process framework that leverages neural feature maps to construct expressive kernels. We show that the learned feature map can be interpreted as an optimal low-rank approximation to a Gram matrix d

Read on arxiv →

arxiv1d ago

Scaling Native Multimodal Pre-Training From Scratch

arXiv:2607.22043v1 Announce Type: new Abstract: Although large language models (LLMs) exhibit remarkable reasoning capabilities, their reliance on text-only pre-training restricts the perception of the multimodal physical world. Native multimodal pre-training avoids this limitation by training model

Read on arxiv →

arxiv1d ago

Nanbeige4.2-3B: Unlocking Agentic Capabilities in a Compact Mode

arXiv:2607.22083v1 Announce Type: cross Abstract: We present Nanbeige4.2-3B, a compact general agentic model with 3B non-embedding parameters. It delivers strong performance across code-agent, office-agent, and complex tool-use tasks while maintaining highly competitive reasoning capabilities in mat

Read on arxiv →

arxiv1d ago

Opaque Epistemic Mediation: How LLM Deployment Configurations Shape the Validation of Pseudo-Science

arXiv:2607.22513v1 Announce Type: cross Abstract: Commercial large language models are increasingly used as knowledge references, yet their stance on contested scientific claims is neither stable nor transparent. We tested how four major LLM families (Claude, Grok, GPT, Gemini) evaluate ethnonationa

Read on arxiv →

arxiv1d ago

When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas

arXiv:2505.19212v2 Announce Type: replace Abstract: Recent advances in LLMs have enabled their use in complex agentic roles, involving decision-making with humans or other agents, making ethical alignment a critical concern. While prior work has examined LLMs' moral judgment and strategic behavior s

Read on arxiv →

arxiv1d ago

Interpretable Depression Detection from Social Media Text Using LLM-Derived Embeddings

arXiv:2506.06616v2 Announce Type: replace Abstract: Accurate and interpretable detection of depressive language in social media can support early identification of mental health conditions and inform timely interventions. In this paper, we investigate the use of large language models (LLMs) and trad

Read on arxiv →
