arxiv19h ago
arXiv:2606.04535v1 Announce Type: cross Abstract: Diffusion large language models (dLLMs) offer bidirectional attention and parallel generation, enabling them to exploit global context and naturally support format-constrained tasks like parseable JSON or reasoning templates. While straightforward fi
arXiv:2606.04287v1 Announce Type: cross Abstract: Generating realistic and diverse graphs is a key problem in machine learning, with applications in molecular discovery, circuit design, cybersecurity, and beyond. However, current graph generative models remain limited by scalability and novelty. Dif
arXiv:2606.04373v1 Announce Type: cross Abstract: Data-Free Quantization (DFQ) addresses data security concerns by synthesizing samples, without accessing real data. It has garnered increasing attention in the context of Vision Transformers (ViTs), owing to the superiority of the self-attention mech
arXiv:2606.04816v1 Announce Type: new Abstract: Large language models (LLMs) increasingly translate natural-language optimization problems into executable solver code. Yet for constraint-dense operations research (OR) problems, existing data-filtering and training pipelines largely rely on objective
arXiv:2602.23845v2 Announce Type: replace Abstract: Chinese text correction has traditionally focused on spelling and grammar, while factual error correction is usually treated separately. However, in paragraph-level Chinese professional writing, linguistic (word/grammar/punctuation) and factual err
arXiv:2605.25402v3 Announce Type: replace-cross Abstract: The self-supervised pre-training paradigm has gained increasing prominence for learning transferable representations in medical imaging, yet existing methods for ultrasound (US) images operate at the image or frame level, overlooking the anatomic
arXiv:2606.04284v1 Announce Type: cross Abstract: Preference modeling plays a central role in reinforcement learning from human feedback (RLHF), enabling large language models (LLMs) to align with human values. However, most existing approaches assume a universal reward function, neglecting the dive
arXiv:2604.25860v2 Announce Type: replace-cross Abstract: Machine-generated text (MGT) detection requires identifying structurally invariant signals across generation models, rather than relying on model-specific fingerprints. In this respect, we hypothesize that while large language models excel at
arXiv:2606.05531v1 Announce Type: cross Abstract: Despite the rapid progress of Vision-Language Models (VLMs), the field lacks benchmarks that rigorously diagnose their true reasoning abilities and chart meaningful progress toward human-like multimodal intelligence. Most existing evaluations focus o
arXiv:2601.22580v2 Announce Type: replace Abstract: The success of Large Language Models (LLMs) hinges on the stable training of deep Transformer architectures. A critical design choice is the placement of normalization layers, leading to a fundamental trade-off: the "PreNorm" architecture ensures
arXiv:2602.09574v2 Announce Type: replace Abstract: Tree-search decoding is an effective form of test-time scaling for large language models (LLMs), but real-world deployment often imposes a fixed per-query token budget that varies across settings. Existing tree-search policies are largely budget-ag
arXiv:2606.04280v1 Announce Type: cross Abstract: Contrastive learning has become a leading paradigm for self-supervised representation learning, yet the conditions under which it recovers meaningful latent geometry remain incompletely understood. We develop a measure-theoretic framework formalizing
arXiv:2606.04374v1 Announce Type: cross Abstract: Despite rapid progress of continuous embeddings for e-commerce search relevance, a long-standing open problem is the difficulty in capturing fine-grained attribute distinctions. While discrete Semantic Identifiers (SIDs) have been widely adopted as a
arXiv:2605.13075v3 Announce Type: replace Abstract: Few-shot spoken word classification has largely been developed for applications where a small number of classes is considered, and so the potential of larger-scale few-shot spoken word classification remains untapped. This paper investigates the po
arXiv:2606.05927v1 Announce Type: new Abstract: The complex imbalanced label distribution poses a crucial challenge to multi-label classification, as most classifiers are biased towards the majority class and high-frequency labels. Oversampling is an efficient and flexible solution that augments inst
arXiv:2606.04381v1 Announce Type: cross Abstract: Recent large language models (LLMs) often appear to exhibit spatial reasoning ability; however, this capability is largely symbolic, arising from pattern matching over spatial language rather than true geometric reasoning over space. Be
arXiv:2606.04438v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) and looped architectures scale models along two orthogonal axes, namely parameter capacity and effective depth. However, mainstream looped architectures rely on dense backbones that couple parameter count with per-token FLOPs
arXiv:2606.05242v1 Announce Type: cross Abstract: Stochastic-gradient Langevin algorithms often use tamed denominators to stabilize non-globally Lipschitz drifts. This paper shows that when the denominator depends on the same stochastic-gradient realization as the numerator, the taming step changes
arXiv:2606.04516v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) significantly advances LLM reasoning, yet it faces a dilemma: standard supervised scaling is throttled by high annotation costs, while unsupervised alternatives suffer from severe model collapse.
arXiv:2606.06314v1 Announce Type: cross Abstract: Time-dependent high-dimensional partial differential equations (PDEs) with spatially localised and dynamically evolving solutions pose a fundamental challenge for physics-informed neural networks (PINNs), as uniform collocation sampling becomes incre
arXiv:2605.28829v2 Announce Type: replace-cross Abstract: Competitive STEM examinations such as JEE and NEET require multi-step symbolic reasoning, precise numerical computation, and deep conceptual understanding across physics, chemistry, and mathematics. Recent large language models perform strong
arXiv:2602.01196v2 Announce Type: replace Abstract: Recurrent neural policies are widely used in partially observable control and meta-RL tasks. Their abilities to maintain internal memory and adapt quickly to unseen scenarios have offered them unparalleled performance when compared to non-recurrent
arXiv:2605.20119v2 Announce Type: replace Abstract: We show that time series foundation models scale: a single training recipe produces reliable forecast-quality improvements from 4M to 2.5B parameters. We release Toto 2.0, a family of five open-weights forecasting models trained under this recipe.
arXiv:2606.04177v1 Announce Type: cross Abstract: Interpretable linguistic features offer a promising approach for explaining why a given text appears machine-generated, particularly for non-expert users. However, existing findings on which features reliably indicate LLM-generated text remain fragme
arXiv:2606.04382v1 Announce Type: cross Abstract: Automated subject cataloging assigns controlled-vocabulary headings to bibliographic records, but LCSH has no standard public benchmark. We introduce LCSHBench: 22,346 books in 15 languages from the openly licensed Harvard, Columbia, and Princeton cat
arXiv:2606.04620v1 Announce Type: cross Abstract: LLMs have become the state-of-the-art algorithms for solving NLP tasks. However, they typically come at huge computational and memory costs, thus making them difficult to deploy on embedded systems. Toward this, state-of-the-art methods typically emp
arXiv:2510.26219v3 Announce Type: replace-cross Abstract: Test-time alignment of large language models (LLMs) attracts attention because fine-tuning of LLMs requires high computational costs. In this paper, we propose a new test-time reward-guided alignment method called adaptive importance sampling
arXiv:2605.11632v2 Announce Type: replace Abstract: Self-generated counterfactual explanations (SCEs) are minimally modified inputs (minimality) generated by large language models (LLMs) that flip their own predictions (validity), offering a causally grounded approach to unraveling black-box LLM beh
arXiv:2602.17907v2 Announce Type: replace-cross Abstract: Traditional neural topic models are typically optimized by reconstructing the document's Bag-of-Words (BoW) representations, overlooking contextual information and struggling with data sparsity. In this work, we introduce a novel topic model
arXiv:2603.09391v2 Announce Type: replace-cross Abstract: Engine sounds originate from sequential exhaust pressure pulses rather than sustained harmonic oscillations. While neural synthesis methods typically aim to approximate the resulting spectral characteristics, we propose directly modeling the
arXiv:2606.05168v1 Announce Type: new Abstract: Training on synthetic data causes model collapse, but existing analyses treat this as single-chain degradation. In reality, the AI ecosystem involves cross-contamination: models ingest synthetic data from other models, produce new synthetic text, and c
arXiv:2606.05846v1 Announce Type: new Abstract: Automatic Speech Recognition (ASR) has become a key technology for human-AI interaction. However, code-switching ASR (CS-ASR) remains particularly challenging due to the severe scarcity of multilingual CS speech resources across diverse language pairs
arXiv:2606.06038v1 Announce Type: new Abstract: We study English-to-Prakrit machine translation in a low-resource setting where the target language is unsupported by IndicTrans2. We adapt the multilingual model by mapping Prakrit to the Hindi language tag (hin_Deva) without modifying the tokenizer,
arXiv:2606.06047v1 Announce Type: new Abstract: Errors in speech translations reduce trustworthiness of Speech Translation (ST) systems and can have serious consequences. Yet currently there is no established methodology for evaluating confidence and quality estimation of speech translations. To ini
arXiv:2606.05413v1 Announce Type: new Abstract: As urban environments continue to evolve rapidly, accurately modeling the dynamic behaviour of Points of Interest is essential for supporting data-driven urban planning and commercial decision-making. While recent advancements in spatio-temporal graph
arXiv:2605.21557v2 Announce Type: replace-cross Abstract: Conventional wisdom holds that large-batch training is fundamentally incompatible with Reinforcement Learning (RL) - beyond a modest threshold, increasing batch sizes typically yields diminishing returns or performance degradation due to the
arXiv:2606.04150v1 Announce Type: new Abstract: Public discourse and emerging policy typically assume that AI emotional support is a deliberate act: a lonely user consciously seeking comfort from a dedicated companion chatbot. In this paper, we draw on emerging empirical evidence and argue that this
arXiv:2606.05695v1 Announce Type: new Abstract: Exemplar-free class-incremental learning (EFCIL) aims to acquire new classes over time without storing raw data. Historically, prototype rehearsal, which samples around stored class prototypes and mixes them with current-task data, has been a popular s
arXiv:2606.06238v1 Announce Type: new Abstract: We propose a statistical-field framework for text generated by large language models (LLMs), treating token embeddings as continuous spin variables on a one-dimensional chain. Defining a susceptibility from the connected two-point correlator and an ord
arXiv:2606.04058v1 Announce Type: cross Abstract: Orthonormalized update rules have rapidly become a leading choice of optimizer for training large language models, with recent open-source state-of-the-art models adopting Muon. To keep these updates tractable, Muon performs the orthonormalization wi
arXiv:2606.05836v1 Announce Type: new Abstract: Large language models have substantially advanced Text-to-SQL systems, yet applying them to enterprise-scale databases remains challenging. Real-world databases often contain large and heterogeneous schemas, incomplete metadata, dialect-specific SQL sy
arXiv:2606.05828v1 Announce Type: cross Abstract: As Large Language Model (LLM) capabilities advance, locally deployed personal agents relying on API-based remote models and external skills have emerged as a novel paradigm. With the rapid expansion of available skills, enabling personal agents to le
arXiv:2606.04236v1 Announce Type: cross Abstract: Discrete diffusion language models can generate text efficiently by updating multiple masked positions in parallel, but this parallelism introduces a quality-latency trade-off. Aggressive decoding may commit mutually dependent tokens too early, while
arXiv:2606.04930v1 Announce Type: cross Abstract: Real-time data analysis requires the ability to accurately and adaptively address nonlinear dynamics in a nonstationary data stream while preserving computational efficiency. However, nonlinear dynamics are so complex that capturing dynamically chang
arXiv:2506.05233v2 Announce Type: replace-cross Abstract: Sequence modeling is currently dominated by causal transformer architectures that use softmax self-attention. Although widely adopted, transformers require scaling memory and compute linearly during inference. A recent stream of work lineariz
arXiv:2606.05376v1 Announce Type: new Abstract: Many human-centered tasks, including natural language inference (NLI) and emotion recognition (ER), have multiple plausible interpretations, leading to label ambiguity and challenging disagreements across human annotators. As LLMs are increasingly depl
arXiv:2606.05689v1 Announce Type: new Abstract: Understanding potential selection in data is crucial for causal discovery; we argue that "selection" in common narratives takes two forms, which we term static and evolutionary selection, respectively. Static selection refers to a one-shot filtering pr
arXiv:2606.05444v1 Announce Type: new Abstract: Coreference resolution is a core NLP task, having a broad range of downstream applications, e.g., machine translation, question answering, document summarization, etc. While the task is well-studied in English, comparatively less attention is dedicated
arXiv:2606.05257v1 Announce Type: new Abstract: Foundation models are increasingly trained on sequences of user actions in recommendation, payments, fraud, and commerce, but these models still lack the kind of compute calibration that scaling laws provide for language models. We study a common two-p
arXiv:2606.04536v1 Announce Type: new Abstract: Existing memory-augmented LLM agents store past experience exclusively in prompt space, as textual summaries or retrieved passages, while keeping model parameters frozen throughout a rollout. Such agents can look up what they have seen but canno