arxiv · 4d ago

Incomplete Prompt Jailbreaks in Large Language Models

arXiv:2607.20473v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly released as open-weight models with safeguards against harmful requests. Nevertheless, sentence completion remains vulnerable to incomplete harmful prompts. In this work, we formalize this phenomenon as inc

#safety #open-source #language-models

arxiv · 5d ago · bullish

Generative Augmented Inference of LLM-generated Data for Market Research: Theory and Empirical Evidence

arXiv:2604.14575v3 Announce Type: replace-cross Abstract: Marketing research often relies on parameters estimated from costly human-generated data, such as conjoint survey responses, purchase decisions, and field experiment outcomes. Recent advances in large language models (LLMs) and other AI syste

1 model · #machine-learning #marketing #research

arxiv · 5d ago

Spectral-transport stability and benign overfitting for minimum norm interpolation

arXiv:2604.08625v3 Announce Type: replace-cross Abstract: Benign overfitting describes the ability of minimum norm interpolating estimators to generalize despite fitting noisy data exactly. Existing characterizations depend on delicate spectral functionals of the population covariance operator, name

#machine-learning #research #statistics

arxiv · 5d ago · bullish

SwiftRepertoire: Few-Shot Immune-Signature Synthesis via Dynamic Kernel Codes

arXiv:2602.01051v5 Announce Type: replace Abstract: Repertoire-level analysis of T cell receptors offers a biologically grounded signal for disease detection and immune monitoring, yet practical deployment is impeded by label sparsity, cohort heterogeneity, and the computational burden of adapting l

#machine-learning #research #biological

arxiv · 5d ago

Are Attributions of Consciousness to AI Chatbots Epistemically Innocent?

arXiv:2607.20001v1 Announce Type: cross Abstract: Artificial intelligence (AI) chatbots (e.g., ChatGPT) can communicate in strikingly humanlike ways. This has prompted many chatbot users to attribute psychological properties, including consciousness, to these systems. However, there is little scient

#open-source #community #collaboration

arxiv · Jul 16

A Shared Subcircuit Lets LLMs Count Down Across Tasks

arXiv:2607.12279v1 Announce Type: new Abstract: Writing a sentence of exactly twelve words; ending a DNA sequence at the right codon; formatting an ASCII table. These are all tasks that language models can do that require tracking how many tokens remain before a target. In this work, we identify in

1 model · #language-models #research #neural-networks

arxiv · Jul 13

iLENS: Interpretable LLM-Guided Mixture-of-Experts for Neuroimaging Survival Analysis

arXiv:2607.08778v1 Announce Type: cross Abstract: Alzheimer's Disease (AD) is a complex neurodegenerative disorder that continues to impact millions of people worldwide. Predicting AD conversion during the prodromal stage remains critical for disease understanding and patient care. As such, survival

1 model · #machine-learning #artificial-intelligence #healthcare

arxiv · Jul 11 · bullish

Svarna: An Open Corpus Workbench for Modern Greek

arXiv:2607.00970v5 Announce Type: replace Abstract: This paper introduces Svarna, a free, open-source, web-based corpus workbench for modern Greek. Svarna integrates five databases covering various registers, institutional, literary, dialectal, social media, and historical, to provide a total of mor

#open-source #language-technology #corpus

arxiv · Jul 10

($\theta_l, \theta_u$)-Parametric Multi-Task Optimization: Joint Search in Solution and Infinite Task Spaces

arXiv:2503.08394v5 Announce Type: replace-cross Abstract: Multi-task optimization is typically characterized by a fixed and finite set of tasks. The present paper relaxes this condition by considering a non-fixed and potentially infinite set of optimization tasks defined in a parameterized, continuo

#optimization #machine-learning #evolutionary-computing

arxiv · Jul 10

Persona Cartography: Charting Language Model Personality Traits in Weight Space

arXiv:2607.07916v1 Announce Type: new Abstract: Large language models exhibit recurring behavioural patterns -- personas -- that shape generalisation and safety, but we lack reliable tools for decomposing, measuring, and controlling them. Our central insight is to treat personas as positions in a sp

#safety #personality #benchmark

arxiv · Jul 10

Stochastic Order Learning: An Approach to Rank Estimation Using Noisy Data

arXiv:2607.08103v1 Announce Type: new Abstract: Rank estimation under label noise poses a fundamental challenge, as ordinal annotations often exhibit structured uncertainty rather than simple label corruption. In this paper, we reformulate rank estimation with noisy ordinal labels as a stochastic or

#machine-learning #research #label-noise

arxiv · Jul 2 · bullish

Neural Certificate Pricing for Combinatorial Optimization Problems

arXiv:2607.01185v1 Announce Type: new Abstract: Combinatorial optimization (CO) problems are difficult because certifiable discrete structure induces exponential search. One needs to search over a set of exponentially many candidates to certify optimality; however, the structural feasibility of a pat

1 model · #optimization #machine-learning #research

arxiv · Jul 1

From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary

arXiv:2506.17294v3 Announce Type: replace-cross Abstract: The advent of artificial intelligence has propelled AI-Generated Game Commentary (AI-GGC) into a rapidly expanding research area, offering advantages such as scalable availability and personalized narration. However, existing studies remain f

#research #survey #game-commentary

arxiv · Jul 1 · bullish

Rethinking Garment Conditioning in Diffusion-based Virtual Try-On: Decouple, Don't Denoise

arXiv:2511.18775v2 Announce Type: replace-cross Abstract: Virtual Try-On (VTON) synthesizes realistic images of a person wearing a target garment, with broad applications in e-commerce and fashion. Diffusion-based dual-UNet methods achieve strong results but double the parameters by dedicating a sep

4 models · #computer-vision #research #state-of-the-art

arxiv · Jul 1 · bullish

Beyond the Library: An Agentic Framework for Autoformalizing Research Mathematics

arXiv:2606.31134v1 Announce Type: new Abstract: While Large Language Models (LLMs) have demonstrated exceptional capabilities in mathematical reasoning, they frequently produce subtle errors that evade human detection. Formal mathematical languages like Lean 4 offer mechanical proof checking, strong

1 model · #autoformalization #mathematics #proof-checking

arxiv · Jun 27

Agentic Analysis for Agentic Infrastructure: An LLM-Powered Pipeline for Comparative Governance of DAO and Corporate AI Protocols

arXiv:2606.26203v1 Announce Type: new Abstract: As AI agent protocols proliferate, the governance structures shaping their interoperability standards remain empirically underexamined. We introduce an LLM-powered comparative pipeline for large-scale governance discourse analysis, integrating automate

1 model · #governance #interoperability #artificial-intelligence

arxiv · Jun 27

VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image

arXiv:2602.04349v3 Announce Type: replace-cross Abstract: 3D editing has emerged as a critical research area to provide users with flexible control over 3D assets. While current editing approaches predominantly focus on 3D Gaussian Splatting or multi-view images, the direct editing of 3D meshes rema

2 models · #computer-vision #3d-editing #mesh-editing

arxiv · Jun 25 · bearish

Riemann-Bench: A Benchmark for Moonshot Mathematics

arXiv:2604.06802v3 Announce Type: replace Abstract: Recent AI systems have achieved gold-medal-level performance on the International Mathematical Olympiad, demonstrating remarkable proficiency at competition-style problem solving. However, competition mathematics represents only a narrow slice of m

#mathematics #benchmark #research

arxiv · Jun 25 · bullish

From Meta Idea to Advanced Mathematical Discovery -- Human-AI Co-Discovery of Sign-Embedding Quantum Algorithms

arXiv:2606.24899v1 Announce Type: new Abstract: AI-assisted mathematics is often evaluated on solving predefined problems. In practice, however, many important advances begin earlier, when a vague research intuition is transformed into a concrete problem, a promising route, and a theorem family wort

1 model · #research #quantum #collaboration

arxiv · Jun 20

Before the Labels: How Dataset Construction Shapes Suicidality Detection in Clinical Text

arXiv:2606.19637v1 Announce Type: cross Abstract: Clinical NLP increasingly relies on electronic health record (EHR) data to detect suicidal behaviors, treating clinical documentation as more reliable ground truth than social media. We argue that this framing obscures how EHR-based suicidality datas

#clinical-nlp #mental-health #data-quality

arxiv · Jun 19 · bullish

VIMPO: Value-Implicit Policy Optimization for LLMs

arXiv:2606.20008v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards has become a central tool for improving the reasoning ability of large language models, but current methods face a trade-off between simplicity and credit assignment. Group-relative methods such as GRPO av

3 models · #reinforcement-learning #language-models #optimization

arxiv · Jun 17

Learning in Matching Games with Bandit Feedback

arXiv:2506.03802v2 Announce Type: replace Abstract: We introduce a learning problem in a generalized two-sided matching market, where agents select actions to interact with their match. Specifically, we consider a setting in which matched agents engage in zero-sum games with initially unknown payoff

#machine-learning #game-theory #algorithm

arxiv · Jun 16

Towards CONUS-Wide ML-Augmented Conceptually-Interpretable Modeling of Catchment-Scale Precipitation-Storage-Runoff Dynamics

arXiv:2510.02605v2 Announce Type: replace Abstract: While many modern studies are dedicated to ML-based large-sample hydrologic modeling, these efforts have not necessarily translated into predictive improvements that are grounded in enhanced physical-conceptual understanding. Here, we report on a C

2 models · #machine-learning #hydrology #modeling

arxiv · Jun 15

Multi-component Causal Tracing in Large Language Models

arXiv:2606.03085v2 Announce Type: replace-cross Abstract: Causal tracing systematically intervenes on a large language model's (LLM's) internal representations to uncover and quantify the causal pathways linking specific inputs or computations to specific metrics of interest, quantifying the LLM's b

#machine-learning #research #language-models

arxiv · Jun 12 · bullish

NightFeats @ MMU-RAGent NeurIPS 2025: A Context-Optimized Multi-Agent RAG System for the Text-to-Text Track

arXiv:2606.11199v1 Announce Type: cross Abstract: We present NightFeats, a structured multi-agent retrieval-augmented generation (RAG) system submitted to the MMU-RAGent competition at NeurIPS 2025, where it was awarded Best Dynamic Evaluation in the text-to-text track. Rather than targeting benchma

3 models · #research #competition #language-models

arxiv · Jun 11

Skill-Augmented AI Agents for Medical Research Analysis: An Exploratory Multi-Model Human Evaluation in an NSCLC Transcriptomic Biomarker Task

arXiv:2606.11830v1 Announce Type: new Abstract: Background. Large language models and AI agents are increasingly used to support biomedical research, but native model outputs may omit key analytical steps, misuse methods, or overstate conclusions. We evaluated whether autonomous access to a medical

1 model · #biomedical #research #evaluation

arxiv · Jun 10

Bidirectional Random Projections

arXiv:2606.10377v1 Announce Type: cross Abstract: This paper analyzes bidirectional random projections for ordinary least squares (OLS) regression under the fixed design setting. Let $(X,Y) \in \mathbb{R}^{n \times p} \times \mathbb{R}^n$ be a sample and $R \in \mathbb{R}^{n_1 \times n}, W \in \math

#statistics #machine-learning #research

arxiv · Jun 6 · bullish

MPCoT: Reward-Guided Multi-Path Latent Reasoning for Test-Time Scalable Vision-Language-Action

arXiv:2606.06245v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) policies remain brittle in long-horizon and high-uncertainty control, where one-pass action decoding provides limited inference-time deliberation. Explicit chain-of-thought can increase reasoning depth, but introduces tok

1 model · #robotics #artificial-intelligence #research

arxiv · Jun 2

How Much Orthogonalization Does Muon Need?

arXiv:2606.00371v1 Announce Type: new Abstract: Muon optimizers improve neural-network training by replacing ill-conditioned momentum updates with approximately semi-orthogonal updates. This motivates a practical question: how much orthogonalization does Muon actually require? We study this question

4 models · #machine-learning #optimization #neural-networks

arxiv · May 29 · bullish

Extreme dynamic symmetry enables omnidirectional and multifunctional robots

arXiv:2605.29254v1 Announce Type: cross Abstract: Symmetry is a central organizing principle in natural systems, yet its use as a unifying design strategy in robotics has largely remained limited to geometric form. We show that symmetry can instead be leveraged at the level of dynamic actuation capa

#robotics #artificial-intelligence #research

arxiv · May 28 · bullish

Smaller, Younger, and More Impactful: How AI-Assisted Writing Transforms Research Teams

arXiv:2605.27404v1 Announce Type: cross Abstract: The era of Big Science has long been defined by increasingly large and specialized research teams pushing the frontiers of knowledge. However, recent advances in artificial intelligence (AI), particularly large language models (LLMs), are beginning t

#artificial-intelligence #research #academic-writing

arxiv · May 28 · bullish

LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation

arXiv:2605.27570v1 Announce Type: new Abstract: Parallel LLM test-time scaling techniques (e.g., best-of-$N$) require drawing $N>1$ sequences conditioned on the same input prompt. These methods boost accuracy while exploiting the computational efficiency of batching $N$ generations. However, each se

1 model · #research #llm #parallel-processing

arxiv · May 27

Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory

arXiv:2605.26252v1 Announce Type: new Abstract: Long-running AI agents need persistent memory. Memory supports learning across sessions, reduces repeated context injection, and enables auditing of past decisions. Current agent memory systems and database paradigms treat memory as storage. They local

#artificial-intelligence #databases #memory-management

arxiv · May 26

Exploring Profiles of Cognitive Distortions Associated with Mental Health Disorders

arXiv:2605.24996v1 Announce Type: new Abstract: Cognitive distortions, distorted patterns of thinking, have been increasingly studied in computational mental health research. Although they are related to many, if not all, mental health disorders, most existing studies focus primarily on depression.

1 model · #mental-health #research #nlp

arxiv · May 22

How Many Different Outputs Can a Transformer Generate?

arXiv:2605.22223v1 Announce Type: new Abstract: We study how we can leverage only a handful of characteristics of a transformer's architecture to closely predict the number of different sequences it can output, both qualitatively and quantitatively. We provide an upper bound depending on the length

1 model · #machine-learning #research #sequence-modeling

arxiv · May 21

Refining and Reusing Annotation Guidelines for LLM Annotation

arXiv:2605.20809v1 Announce Type: new Abstract: While Large Language Models (LLMs) demonstrate remarkable performance on zero-shot annotation tasks, they often struggle with the specialized conventions of gold-standard benchmarks. We propose the systematic reuse and refinement of annotation guidelin

3 models · #research #language-models #benchmark

arxiv · May 21

EMO-BOOST: Emotion-Augmented Audio-Visual Features for Improved Generalization in Deepfake Detection

arXiv:2605.19630v1 Announce Type: new Abstract: With every advancement in generative AI models, forensics is under increasing pressure. The constant emergence of new generation techniques makes it impossible to collect data for each manipulation to train a deepfake detection model. Thus, generalizin

2 models · #deepfakes #detection #research

arxiv · May 15

Generative Bayesian Optimization: Generative Models as Acquisition Functions

arXiv:2510.25240v3 Announce Type: replace-cross Abstract: We present a general strategy for turning generative models into candidate solution samplers for batch Bayesian optimization (BO). The use of generative models for BO enables large batch scaling as generative sampling, optimization of non-con

#optimization #machine-learning #research

arxiv · May 14

BEHAVE: A Hybrid AI Framework for Real-Time Modeling of Collective Human Dynamics

arXiv:2605.12730v1 Announce Type: new Abstract: Existing AI systems for modeling human behavior operate at the level of individuals or detect events after they occur. As a result, they systematically fail to capture the collective dynamics that determine whether a group remains stable or transitions

#open-source #collaboration #community

arxiv · May 11 · bullish

EviDep: Trustworthy Multimodal Depression Estimation via Disentangled Evidential Learning

arXiv:2604.16579v2 Announce Type: replace-cross Abstract: Automated multimodal depression estimation in unconstrained environments is inherently challenged by naturalistic noise and complex behavioral variability. Prevailing deterministic methods, however, produce uncalibrated point estimates withou

1 model · #machine-learning #artificial-intelligence #mental-health

arxiv · May 8

Prediction and Empowerment: A Theory of Agency through Bridge Interfaces

arXiv:2605.06346v1 Announce Type: new Abstract: We study agency under partial observability in deterministic physical or simulated worlds, where apparent randomness arises from uncertainty over initial conditions, fixed law bits, and unrolled exogenous noise. We model sensing and actuation as bridge

#artificial-intelligence #research #deterministic-models

arxiv · May 8

Structural Instability of Feature Composition

arXiv:2605.05223v1 Announce Type: cross Abstract: Sparse Autoencoders (SAEs) have emerged as a powerful paradigm for disentangling feature superposition in transformer-based architectures, enabling precise control via activation steering. However, the theoretical foundations of compositional steerin

#machine-learning #artificial-intelligence #research

arxiv · May 6

Using LLMs in Software Design: An Empirical Study of GitHub and A Practitioner Survey

arXiv:2605.01392v1 Announce Type: cross Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated significant potential across a wide range of software engineering tasks, including software design, an area traditionally regarded as highly dependent on human expertise and judgme

1 model · #software-engineering #large-language-models #design

arxiv · May 5 · bullish

Linking spatial biology and clinical histology via Haiku

arXiv:2605.00925v1 Announce Type: new Abstract: Integrating molecular, morphological, and clinical data is essential for basic and translational biomedical research, yet systematic frameworks for jointly modeling these modalities remain limited. Here we present Haiku, a tri-modal contrastive learnin

1 model · #biomedical #research #machine-learning

arxiv · May 5 · bearish

Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs

arXiv:2605.01224v1 Announce Type: new Abstract: This paper argues that contemporary multilingual NLP has converged on a fragile and misleading paradigm of incidental multilingualism. Today's LLMs appear multilingual largely because they are trained on massive, uneven web corpora, not because multili

1 model · #nlp #multilingualism #language-models

arxiv · May 4

A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muo

arXiv:2604.17423v2 Announce Type: replace Abstract: A unified framework for first-order optimization algorithms for nonconvex unconstrained optimization is proposed that uses adaptively preconditioned gradients and includes popular methods such as full and diagonal AdaGrad, AdaNorm, as well as adaptive

4 models · #optimization #machine-learning #research

arxiv · May 1 · bearish

Taming the Centaur(s) with LAPITHS: a framework for a theoretically grounded interpretation of AI performances

arXiv:2604.27927v1 Announce Type: new Abstract: We introduce a framework called LAPITHS (Language model Analysis through Paradigm grounded Interpretations of Theses about Human likenesS) and use it to show that several major claims advanced by models such as CENTAUR, proposed as an artificial Unifie

2 models · #cognitive #ai #research

arxiv · May 1 · bullish

ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

arXiv:2604.27467v1 Announce Type: cross Abstract: Code sandboxes have emerged as a critical infrastructure for advancing the coding capabilities of large language models, providing verifiable feedback for both RL training and evaluation. However, existing systems fail to provide accurate verificatio

#research #large-language-models #code-training

arxiv · May 1 · bullish

QED: An Open-Source Multi-Agent System for Generating Mathematical Proofs on Open Problems

arXiv:2604.24021v2 Announce Type: replace Abstract: We explore a central question in AI for mathematics: can AI systems produce original, nontrivial proofs for open research problems? Despite strong benchmark performance, producing genuinely novel proofs remains an outstanding challenge for LLMs. Th

2 models · #proof-generation #open-source #mathematics

arxiv · Apr 30

Structural Generalization on SLOG without Hand-Written Rules

arXiv:2604.26157v1 Announce Type: cross Abstract: Structural generalization in semantic parsing requires systems to apply learned compositional rules to novel structural combinations. Existing approaches either rely on hand-written algebraic rules (AM-Parser) or fail to generalize structurally (Tran

#open-source #collaboration #community

arxiv · Apr 29 · bullish

Fix Initial Codes and Iteratively Refine Textual Directions Toward Safe Multi-Turn Code Correction

arXiv:2604.23989v1 Announce Type: cross Abstract: Recent work on large language models (LLMs) has emphasized the importance of scaling inference compute. From this perspective, the state-of-the-art method Scattered Forest Search (SFS) has been proposed, employing Monte Carlo Tree Search with careful

2 models · #machine-learning #code-generation #inference-performance

arxiv · Apr 29

Can We Still Hear the Accent? Investigating the Resilience of Native Language Signals in the LLM Era

arXiv:2604.08568v2 Announce Type: replace-cross Abstract: The evolution of writing assistance tools from machine translation to large language models (LLMs) has changed how researchers write. This study investigates whether this shift is homogenizing research papers by analyzing native language iden

1 model · #research #language #translation

arxiv · Apr 27 · bullish

A general optimization solver based on OP-to-MaxSAT reduction

arXiv:2604.21961v1 Announce Type: cross Abstract: Optimization problems are fundamental in diverse fields, such as engineering, economics, and scientific computing. However, current algorithms are mostly designed for specific problem types and exhibit limited generality in solving multiple types of

#optimization #algorithm #research

arxiv · Apr 24

Reasoning on the Manifold: Bidirectional Consistency for Self-Verification in Diffusion Language Models

arXiv:2604.16565v2 Announce Type: replace-cross Abstract: While Diffusion Large Language Models (dLLMs) offer structural advantages for global planning, efficiently verifying that they arrive at correct answers via valid reasoning traces remains a critical challenge. In this work, we propose a geome

#machine-learning #artificial-intelligence #research

arxiv · Apr 24

Formalising the Logit Shift Induced by LoRA: A Technical Note

arXiv:2604.20313v1 Announce Type: new Abstract: This technical note provides a first-order formalisation of the logit shift and fact-margin change induced by Low-Rank Adaptation (LoRA). Using a first-order Fréchet approximation around the base model trajectory, we show that the multi-layer LoRA ef

1 model · #machine-learning #artificial-intelligence #research

arxiv · Apr 23

Knowledge Capsules: Structured Nonparametric Memory Units for LLMs

arXiv:2604.20487v1 Announce Type: cross Abstract: Large language models (LLMs) encode knowledge in parametric weights, making it costly to update or extend without retraining. Retrieval-augmented generation (RAG) mitigates this limitation by appending retrieved text to the input, but operates purely

#research #language-models #knowledge-retrieval

arxiv · Apr 23

Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models

arXiv:2604.16902v2 Announce Type: replace Abstract: Native Omni-modal Large Language Models (OLLMs) have shifted from pipeline architectures to unified representation spaces. However, this native integration gives rise to a critical yet underexplored phenomenon: modality preference. To bridge this g

#research #language-models #multimodal

arxiv · Apr 21

Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures

arXiv:2604.16042v2 Announce Type: cross Abstract: While Large Language Models (LLMs) have achieved strong performance across many NLP tasks, their opaque internal mechanisms hinder trustworthiness and safe deployment. Existing surveys in explainable AI largely focus on post-hoc explanation methods t

#explainability #nlp #research

arxiv · Apr 21 · bullish

Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints

arXiv:2507.16727v3 Announce Type: replace Abstract: Improving the reliability of large language models (LLMs) is critical for deploying them in real-world scenarios. In this paper, we propose Deliberative Searcher, the first framework to integrate certainty calibration with retrieval-based

#reliability #research #question-answering

arxiv · Apr 21

ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence

arXiv:2603.24621v2 Announce Type: replace Abstract: We introduce ARC-AGI-3, an interactive benchmark for studying agentic intelligence through novel, abstract, turn-based environments in which agents must explore, infer goals, build internal models of environment dynamics, and plan effective action

#benchmark #intelligence #research

theverge · Apr 17

OpenAI’s former Sora boss is leaving

Last month, OpenAI gave up on its Sora video generation tool, and on Friday, the Sora team's leader, Bill Peebles, announced that he is leaving the company. OpenAI has been shifting its priorities as part of an effort to avoid "side quests," and Peebles' departure is just one of many recent changes

#departure #restructuring #research

arxiv · Apr 17

QU-NLP at ArchEHR-QA 2026: Two-Stage QLoRA Fine-Tuning of Qwen3-4B for Patient-Oriented Clinical Question Answering and Evidence Sentence Alignment

arXiv:2604.14175v1 Announce Type: new Abstract: We present a unified system addressing both Subtask 3 (answer generation) and Subtask 4 (evidence sentence alignment) of the ArchEHR-QA Shared Task. For Subtask 3, we apply two-stage Quantised Low-Rank Adaptation (QLoRA) to Qwen3-4B loaded in 4-bit NF4

1 model · #research #natural-language-processing #question-answering

arxiv · Apr 14

Seven simple steps for log analysis in AI systems

arXiv:2604.09563v1 Announce Type: new Abstract: AI systems produce large volumes of logs as they interact with tools and users. Analysing these logs can help understand model capabilities, propensities, and behaviours, or assess whether an evaluation worked as intended. Researchers have started deve

#log-analysis #research #artificial-intelligence

arxiv · Apr 9

Machine Unlearning in the Era of Quantum Machine Learning: An Empirical Study

arXiv:2512.19253v3 Announce Type: replace-cross Abstract: We present the first empirical study of machine unlearning (MU) in hybrid quantum-classical neural networks. While MU has been extensively explored in classical deep learning, its behavior within variational quantum circuits (VQCs) and quantu

#machine-learning #quantum-computing #neural-networks

arxiv · Apr 7 · bullish

Sandpiper: Orchestrated AI-Annotation for Educational Discourse at Scale

arXiv:2603.08406v2 Announce Type: replace-cross Abstract: Digital educational environments are expanding toward complex AI and human discourse, providing researchers with an abundance of data that offers deep insights into learning and instructional processes. However, traditional qualitative analys

1 model · #education #research #qualitative-analysis

arxiv · Apr 6

Discovery of Bimodal Drift Rate Structure in FRB 20240114A: Evidence for Dual Emission Regions

arXiv:2603.18109v2 Announce Type: replace-cross Abstract: We report the discovery of bimodal structure in the drift rate distribution of upward-drifting burst clusters from the hyperactive repeating fast radio burst FRB 20240114A. Using unsupervised machine learning (UMAP dimensionality reduction co

3 models · #astrophysics #machine-learning #research

arxiv · Apr 4

How to measure the optimality of word or gesture order with respect to the principle of swap distance minimization

arXiv:2604.01938v1 Announce Type: new Abstract: The structure of all the permutations of a sequence can be represented as a permutohedron, a graph where vertices are permutations and two vertices are linked if a swap of adjacent elements in the permutation of one of the vertices produces the permuta

#language #optimization #research

arxiv · Apr 3

A Safety-Aware Role-Orchestrated Multi-Agent LLM Framework for Behavioral Health Communication Simulation

arXiv:2604.00249v1 Announce Type: new Abstract: Single-agent large language model (LLM) systems struggle to simultaneously support diverse conversational functions and maintain safety in behavioral health communication. We propose a safety-aware, role-orchestrated multi-agent LLM framework designed

#research #open-source #collaboration

arxiv · Apr 3 · bullish

QUEST: A robust attention formulation using query-modulated spherical attention

arXiv:2604.00199v1 Announce Type: cross Abstract: The Transformer model architecture has become one of the most widely used in deep learning and the attention mechanism is at its core. The standard attention formulation uses a softmax operation applied to a scaled dot product between query and key v

1 model · #deep-learning #attention-mechanism #research

arxiv · Apr 3

Best-Arm Identification with Noisy Actuation

arXiv:2604.02255v1 Announce Type: cross Abstract: In this paper, we consider a multi-armed bandit (MAB) instance and study how to identify the best arm when arm commands are conveyed from a central learner to a distributed agent over a discrete memoryless channel (DMC). Depending on the agent capabi

#information-theory #machine-learning #research

arxiv · Apr 2 · bullish

Learning to Shuffle: Block Reshuffling and Reversal Schemes for Stochastic Optimization

arXiv:2604.00260v1 Announce Type: new Abstract: Shuffling strategies for stochastic gradient descent (SGD), including incremental gradient, shuffle-once, and random reshuffling, are supported by rigorous convergence analyses for arbitrary within-epoch permutations. In particular, random reshuffling

1 model · #optimization #machine-learning #research

arxiv · Apr 2

Reconsidering Dependency Networks from an Information Geometry Perspective

arXiv:2604.01117v1 Announce Type: new Abstract: Dependency networks (Heckerman et al., 2000) provide a flexible framework for modeling complex systems with many variables by combining independently learned local conditional distributions through pseudo-Gibbs sampling. Despite their computational adv

#machine-learning #research #optimization