Ling-2.6-1T news

50 articles mentioning Ling-2.6-1T

arxiv19h ago

Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models

arXiv:2606.04535v1 Announce Type: cross Abstract: Diffusion large language models (dLLMs) offer bidirectional attention and parallel generation, enabling them to exploit global context and naturally support format-constrained tasks like parseable JSON or reasoning templates. While straightforward fi

arxiv19h ago

Scaling Novel Graph Generation via Lightweight Structure-Guided Autoregressive Models

arXiv:2606.04287v1 Announce Type: cross Abstract: Generating realistic and diverse graphs is a key problem in machine learning, with applications in molecular discovery, circuit design, cybersecurity, and beyond. However, current graph generative models remain limited by scalability and novelty. Dif

arxiv19h ago

Selective Coupling of Decoupled Informative Regions: Masked Attention Alignment for Data-Free Quantization of Vision Transformers

arXiv:2606.04373v1 Announce Type: cross Abstract: Data-Free Quantization (DFQ) addresses data security concerns by synthesizing samples, without accessing real data. It has garnered increasing attention in the context of Vision Transformers (ViTs), owing to the superiority of the self-attention mech

arxiv19h ago

Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems

arXiv:2606.04816v1 Announce Type: new Abstract: Large language models (LLMs) increasingly translate natural-language optimization problems into executable solver code. Yet for constraint-dense operations research (OR) problems, existing data-filtering and training pipelines largely rely on objective

arxiv19h ago

CLFEC: A New Task for Unified Linguistic and Factual Error Correction in paragraph-level Chinese Professional Writing

arXiv:2602.23845v2 Announce Type: replace Abstract: Chinese text correction has traditionally focused on spelling and grammar, while factual error correction is usually treated separately. However, in paragraph-level Chinese professional writing, linguistic (word/grammar/punctuation) and factual err

arxiv19h ago

Anatomy-Anchored Self-Supervision: Distilling Vision Foundation Models for Invariant Ultrasound Representation

arXiv:2605.25402v3 Announce Type: replace-cross Abstract: Self-supervised pre-training paradigm has gained increasing prominence for learning transferable representations in medical imaging, yet existing methods for ultrasound (US) images operate at the image or frame level, overlooking the anatomic

arxiv19h ago

Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling

arXiv:2606.04284v1 Announce Type: cross Abstract: Preference modeling plays a central role in reinforcement learning from human feedback (RLHF), enabling large language models (LLMs) to align with human values. However, most existing approaches assume a universal reward function, neglecting the dive

arxiv19h ago

Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling

arXiv:2604.25860v2 Announce Type: replace-cross Abstract: Machine-generated text (MGT) detection requires identifying structurally invariant signals across generation models, rather than relying on model-specific fingerprints. In this respect, we hypothesize that while large language models excel at

arxiv19h ago

Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models

arXiv:2606.05531v1 Announce Type: cross Abstract: Despite the rapid progress of Vision-Language Models (VLMs), the field lacks benchmarks that rigorously diagnose their true reasoning abilities and chart meaningful progress toward human-like multimodal intelligence. Most existing evaluations focus o

arxiv19h ago

SpanNorm: Reconciling Training Stability and Performance in Deep Transformers

arXiv:2601.22580v2 Announce Type: replace Abstract: The success of Large Language Models (LLMs) hinges on the stable training of deep Transformer architectures. A critical design choice is the placement of normalization layers, leading to a fundamental trade-off: the "PreNorm" architecture ensures

arxiv19h ago

Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs

arXiv:2602.09574v2 Announce Type: replace Abstract: Tree-search decoding is an effective form of test-time scaling for large language models (LLMs), but real-world deployment often imposes a fixed per-query token budget that varies across settings. Existing tree-search policies are largely budget-ag

arxiv19h ago

The Loss Is Not Enough: Sampling Conditions and Inductive Bias in Contrastive Representation Learning

arXiv:2606.04280v1 Announce Type: cross Abstract: Contrastive learning has become a leading paradigm for self-supervised representation learning, yet the conditions under which it recovers meaningful latent geometry remain incompletely understood. We develop a measure-theoretic framework formalizing

#self-supervised #representation #learning

arxiv19h ago

DSIRM: Learning Query-Bridged Discrete Semantic Identifiers for E-commerce Relevance Modeling

arXiv:2606.04374v1 Announce Type: cross Abstract: Despite rapid progress of continuous embeddings for e-commerce search relevance, a long-standing open problem is the difficulty in capturing fine-grained attribute distinctions. While discrete Semantic Identifiers (SIDs) have been widely adopted as a

arxiv19h ago

Scaling few-shot spoken word classification with generative meta-continual learning

arXiv:2605.13075v3 Announce Type: replace Abstract: Few-shot spoken word classification has largely been developed for applications where a small number of classes is considered, and so the potential of larger-scale few-shot spoken word classification remains untapped. This paper investigates the po

arxiv19h ago

Addressing Imbalance in Multi-Label Data via Label-Specific Distance-based Oversampling

arXiv:2606.05927v1 Announce Type: new Abstract: The complex imbalanced label distribution poses a crucial challenge to multi-label classification, as most classifiers are biased towards the majority class and high-frequent labels. Oversampling is an efficient and flexible solution that augments inst

arxiv19h ago

From Symbolic to Geometric: Enabling Spatial Reasoning in Large Language Models

arXiv:2606.04381v1 Announce Type: cross Abstract: Recent large language models (LLMs) often appear to exhibit spatial reasoning ability; however, this capability is largely symbolic, arising from pattern matching over spatial language rather than true geometric reasoning over space. Be

arxiv19h ago

LoopMoE: Unifying Iterative Computation with Mixture-of-Experts for Language Modeling

arXiv:2606.04438v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) and looped architectures scale models along two orthogonal axes, namely parameter capacity and effective depth. However, mainstream looped architectures rely on dense backbones that couple parameter count with per-token FLOPs

arxiv19h ago

Deterministic Envelopes for Tamed SGLD: Decoupling Stochastic-Gradient Noise and Localizing Taming

arXiv:2606.05242v1 Announce Type: cross Abstract: Stochastic-gradient Langevin algorithms often use tamed denominators to stabilize non-globally Lipschitz drifts. This paper shows that when the denominator depends on the same stochastic-gradient realization as the numerator, the taming step changes

arxiv19h ago

GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling

arXiv:2606.04516v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) significantly advances LLM reasoning, yet it faces a dilemma: standard supervised scaling is throttled by high annotation costs, while unsupervised alternatives suffer from severe model collapse.

arxiv19h ago

DAS-PINNs for high-dimensional partial differential equations: extending deep adaptive sampling to spacetime domains

arXiv:2606.06314v1 Announce Type: cross Abstract: Time-dependent high-dimensional partial differential equations (PDEs) with spatially localised and dynamically evolving solutions pose a fundamental challenge for physics-informed neural networks (PINNs), as uniform collocation sampling becomes incre

arxiv19h ago

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning

arXiv:2605.28829v2 Announce Type: replace-cross Abstract: Competitive STEM examinations such as JEE and NEET require multi-step symbolic reasoning, precise numerical computation, and deep conceptual understanding across physics, chemistry, and mathematics. Recent large language models perform strong

arxiv19h ago

Unraveling the Hidden Dynamical Structure in Recurrent Neural Policies

arXiv:2602.01196v2 Announce Type: replace Abstract: Recurrent neural policies are widely used in partially observable control and meta-RL tasks. Their abilities to maintain internal memory and adapt quickly to unseen scenarios have offered them unparalleled performance when compared to non-recurrent

arxiv19h ago

Toto 2.0: Time Series Forecasting Enters the Scaling Era

arXiv:2605.20119v2 Announce Type: replace Abstract: We show that time series foundation models scale: a single training recipe produces reliable forecast-quality improvements from 4M to 2.5B parameters. We release Toto 2.0, a family of five open-weights forecasting models trained under this recipe.

arxiv19h ago

A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models

arXiv:2606.04177v1 Announce Type: cross Abstract: Interpretable linguistic features offer a promising approach for explaining why a given text appears machine-generated, particularly for non-expert users. However, existing findings on which features reliably indicate LLM-generated text remain fragme

arxiv19h ago

LCSHBench: A Multilingual, Consensus-Grounded Benchmark for Library of Congress Subject Heading Assignment

arXiv:2606.04382v1 Announce Type: cross Abstract: Automated subject cataloging assigns controlled-vocabulary headings to bibliographic records, but LCSH has no standard public benchmark. We introduce LCSHBench: 22,346 books in 15 languages from the openly licensed Harvard, Columbia, and Princeton cat

arxiv19h ago

QuBLAST: A Framework for Quantizing Large Language Models with Block-Level Compression Approach and Activation Scaling Strategy

arXiv:2606.04620v1 Announce Type: cross Abstract: LLMs have become the state-of-the-art algorithms for solving NLP tasks. However, they typically come at huge computational and memory costs, thus making them difficult to deploy on embedded systems. Toward this, state-of-the-art methods typically emp

arxiv19h ago

Test-time reward-guided alignment of language models by importance sampling on pre-logit space

arXiv:2510.26219v3 Announce Type: replace-cross Abstract: Test-time alignment of large language models (LLMs) attracts attention because fine-tuning of LLMs requires high computational costs. In this paper, we propose a new test-time reward-guided alignment method called adaptive importance sampling

arxiv19h ago

Macro: Enhancing Multilingual Counterfactual Explanations through Alignment-as-Preference Optimization

arXiv:2605.11632v2 Announce Type: replace Abstract: Self-generated counterfactual explanations (SCEs) are minimally modified inputs (minimality) generated by large language models (LLMs) that flip their own predictions (validity), offering a causally grounded approach to unraveling black-box LLM beh

arxiv19h ago

DSL-Topic: Improving Topic Modeling by Distilling Soft Labels from Language Models

arXiv:2602.17907v2 Announce Type: replace-cross Abstract: Traditional neural topic models are typically optimized by reconstructing the document's Bag-of-Words (BoW) representations, overlooking contextual information and struggling with data sparsity. In this work, we introduce a novel topic model

arxiv19h ago

Physics-Informed Neural Engine Sound Modeling with Differentiable Pulse-Train Synthesis

arXiv:2603.09391v2 Announce Type: replace-cross Abstract: Engine sounds originate from sequential exhaust pressure pulses rather than sustained harmonic oscillations. While neural synthesis methods typically aim to approximate the resulting spectral characteristics, we propose directly modeling the

arxiv19h ago

Epidemiology of Model Collapse: Modeling Synthetic Data Contamination via Bilayer SIR Dynamics

arXiv:2606.05168v1 Announce Type: new Abstract: Training on synthetic data causes model collapse, but existing analyses treat this as single-chain degradation. In reality, the AI ecosystem involves cross-contamination: models ingest synthetic data from other models, produce new synthetic text, and c

arxiv19h ago

Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs

arXiv:2606.05846v1 Announce Type: new Abstract: Automatic Speech Recognition (ASR) has become a key technology for human-AI interaction. However, code-switching ASR (CS-ASR) remains particularly challenging due to the severe scarcity of multilingual CS speech resources across diverse language pairs

arxiv19h ago

English-to-Prakrit Machine Translation via Multilingual Transfer Learning

arXiv:2606.06038v1 Announce Type: new Abstract: We study English-to-Prakrit machine translation in a low-resource setting where the target language is unsupported by IndicTrans2. We adapt the multilingual model by mapping Prakrit to the Hindi language tag (hin_Deva) without modifying the tokenizer,

arxiv19h ago

Automatic Labelling of Speech Translation Errors

arXiv:2606.06047v1 Announce Type: new Abstract: Errors in speech translations reduce trustworthiness of Speech Translation (ST) systems and can have serious consequences. Yet currently there is no established methodology for evaluating confidence and quality estimation of speech translations. To ini

arxiv19h ago

CausalPOI: Spatio-Temporal Graph-Based Causal Modeling for Cold-Start POI Check-in Forecasting

arXiv:2606.05413v1 Announce Type: new Abstract: As urban environments continue to evolve rapidly, accurately modeling the dynamic behaviour of Points of Interest is essential for supporting data-driven urban planning and commercial decision-making. While recent advancements in spatio-temporal graph

arxiv19h ago

Scalable Reinforcement Learning via Adaptive Batch Scaling

arXiv:2605.21557v2 Announce Type: replace-cross Abstract: Conventional wisdom holds that large-batch training is fundamentally incompatible with Reinforcement Learning (RL) - beyond a modest threshold, increasing batch sizes typically yields diminishing returns or performance degradation due to the

arxiv19h ago

Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection

arXiv:2606.04150v1 Announce Type: new Abstract: Public discourse and emerging policy typically assume that AI emotional support is a deliberate act: a lonely user consciously seeking comfort from a dedicated companion chatbot. In this paper, we draw on emerging empirical evidence and argue that this

arxiv19h ago

Revisiting Prototype Rehearsal for Exemplar-Free Continual Learning: Manifold-Aware Boundary Sampling with Adaptive Class-Balanced Loss

arXiv:2606.05695v1 Announce Type: new Abstract: Exemplar-free class-incremental learning (EFCIL) aims to acquire new classes over time without storing raw data. Historically, prototype rehearsal, which samples around stored class prototypes and mixes them with current-task data, has been a popular s

arxiv19h ago

Generative Criticality in Large Language Model Temperature Scaling

arXiv:2606.06238v1 Announce Type: new Abstract: We propose a statistical-field framework for text generated by large language models (LLMs), treating token embeddings as continuous spin variables on a one-dimensional chain. Defining a susceptibility from the connected two-point correlator and an ord

arxiv19h ago

Spectral Scaling Laws of Muon

arXiv:2606.04058v1 Announce Type: cross Abstract: Orthonormalized update rules have rapidly become a leading choice of optimizer for training large language models, with recent open-source state-of-the-art models adopting Muon. To keep these updates tractable, Muon performs the orthonormalization wi

arxiv19h ago

ProSPy: A Profiling-Driven SQL-Python Agentic Framework for Enterprise Text-to-SQL

arXiv:2606.05836v1 Announce Type: new Abstract: Large language models have substantially advanced Text-to-SQL systems, yet applying them to enterprise-scale databases remains challenging. Real-world databases often contain large and heterogeneous schemas, incomplete metadata, dialect-specific SQL sy

arxiv19h ago

Statistical Priors for Implicit Preferences: Decoupling Skill Selection as a Local Harness in Personal Agents

arXiv:2606.05828v1 Announce Type: cross Abstract: As Large Language Model (LLM) capabilities advance, locally deployed personal agents relying on API-based remote models and external skills have emerged as a novel paradigm. With the rapid expansion of available skills, enabling personal agents to le

arxiv19h ago

Supportive Token Revealing for Fast Diffusion Language Model Decoding

arXiv:2606.04236v1 Announce Type: cross Abstract: Discrete diffusion language models can generate text efficiently by updating multiple masked positions in parallel, but this parallelism introduces a quality-latency trade-off. Aggressive decoding may commit mutually dependent tokens too early, while

arxiv19h ago

AdaKoop: Efficient Modeling of Nonlinear Dynamics from Nonstationary Data Streams with Koopman Operator Regression

arXiv:2606.04930v1 Announce Type: cross Abstract: Real-time data analysis requires the ability to accurately and adaptively address nonlinear dynamics in a nonstationary data stream while preserving computational efficiency. However, nonlinear dynamics are so complex that capturing dynamically chang

arxiv19h ago

MesaNet: Sequence Modeling by Locally Optimal Test-Time Training

arXiv:2506.05233v2 Announce Type: replace-cross Abstract: Sequence modeling is currently dominated by causal transformer architectures that use softmax self-attention. Although widely adopted, transformers require scaling memory and compute linearly during inference. A recent stream of work lineariz

arxiv19h ago

SHALA-LLM: Smartly Handling Ambiguous Labels in Aligning LLMs

arXiv:2606.05376v1 Announce Type: new Abstract: Many human-centered tasks, including natural language inference (NLI) and emotion recognition (ER), have multiple plausible interpretations, leading to label ambiguity and challenging disagreements across human annotators. As LLMs are increasingly depl

arxiv19h ago

Causal Modeling of Selection in Evolution

arXiv:2606.05689v1 Announce Type: new Abstract: Understanding potential selection in data is crucial for causal discovery; we argue that "selection" in common narratives takes two forms, which we term static and evolutionary selection, respectively. Static selection refers to a one-shot filtering pr

arxiv19h ago

Multilingual Coreference Resolution via Cycle-Consistent Machine Translation

arXiv:2606.05444v1 Announce Type: new Abstract: Coreference resolution is a core NLP task, having a broad range of downstream applications, e.g. machine translation, question answering, document summarization, etc. While the task is well-studied in English, comparatively less attention is dedicated

arxiv19h ago

Scaling Laws for Behavioral Foundation Models over User Event Sequences

arXiv:2606.05257v1 Announce Type: new Abstract: Foundation models are increasingly trained on sequences of user actions in recommendation, payments, fraud, and commerce, but these models still lack the kind of compute calibration that scaling laws provide for language models. We study a common two-p

arxiv19h ago

Scaling Self-Evolving Agents via Parametric Memory

arXiv:2606.04536v1 Announce Type: new Abstract: Existing memory-augmented LLM agents store past experience exclusively in prompt space, as textual summaries or retrieved passages, while keeping model parameters frozen throughout a rollout. Such agents can look up what they have seen but canno
