arxiv · 1d ago · bullish

What Matters When Building Universal Multilingual Named Entity Recognition Models?

arXiv:2601.06347v2 Announce Type: replace Abstract: Recent progress in universal multilingual named entity recognition (NER) has been driven by multilingual transformer models, task-specific architectures, custom loss functions, and large-scale training datasets. However, despite substantial prior w

1 model · #multilingual #ner #transformer

arxiv · Jun 27 · bullish

TOPS: First-Principles Visual Token Pruning via Constructing Token Optimal Preservation Sets for Efficient MLLM Inference

arXiv:2606.27161v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have achieved strong multimodal reasoning capabilities, but their efficiency is limited by the large number of visual tokens, which introduces substantial computational overhead. Visual token pruning offers a na

1 model · #multimodal #pruning #efficiency

arxiv · Jun 27 · bullish

Context Recycling for Long-Horizon LLM Inference

arXiv:2606.26105v1 Announce Type: cross Abstract: Large language models (LLMs) exhibit strong capabilities in short-context reasoning but degrade in performance over long conversational horizons due to context window limitations and inefficient token usage. We introduce ContextForge, a system for co

#language-models #conversational-ai #efficiency

arxiv · Jun 18 · bullish

ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients"

arXiv:2508.04086v3 Announce Type: replace Abstract: Prior work synthesizes tool-use LLM datasets by first generating a user query, followed by complex tool-use annotations like depth-first search (DFS). This leads to inevitable annotation failures and low efficiency in data generation. We introduce

1 model · #llm #dataset #open-source

arxiv · Jun 17

Implicit vs. Explicit Prompting Strategies for LVLMs in Referential Communication

arXiv:2606.17372v1 Announce Type: cross Abstract: Two recent studies (Jones et al. (2026); Zeng et al. (2026)) reach apparently contradictory conclusions about whether LVLMs can coordinate on efficient referring expressions. We control for task differences between the studies while directly comparin

1 model · #language-models #communication #efficiency

arxiv · Jun 15 · bullish

Exact Linear Attention

arXiv:2605.18848v4 Announce Type: replace-cross Abstract: This paper introduces Exact Linear Attention (ELA), a mechanism that achieves linear computational complexity for Transformer attention by exploiting the exact decomposition property of kernel functions, thereby eliminating approximation erro

2 models · #machine-learning #transformer #attention-mechanisms
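The abstract above attributes linear complexity to a decomposition of the kernel function. As a minimal sketch of that general idea (generic kernelized linear attention, not the ELA mechanism itself), the code below compares an explicit n-by-n computation with the reordered form that first accumulates key/value summaries. The feature map `phi` (elu(x) + 1) and both function names are assumptions chosen for illustration.

```python
import math

def phi(x):
    # Hypothetical positive feature map, elu(x) + 1; an assumption, not from the paper.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def quadratic_attention(Q, K, V):
    # Reference form: every query scores every key, so cost grows as n * n.
    out = []
    for q in Q:
        fq = phi(q)
        scores = [sum(a * b for a, b in zip(fq, phi(k))) for k in K]
        z = sum(scores)
        out.append([sum(s * v[j] for s, v in zip(scores, V)) / z
                    for j in range(len(V[0]))])
    return out

def linear_attention(Q, K, V):
    # Reordered form: accumulate S = sum_k phi(k) v^T and z = sum_k phi(k) once,
    # then each query costs O(d * dv) regardless of sequence length.
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for k, v in zip(K, V):
        fk = phi(k)
        for i in range(d):
            z[i] += fk[i]
            for j in range(dv):
                S[i][j] += fk[i] * v[j]
    out = []
    for q in Q:
        fq = phi(q)
        denom = sum(a * b for a, b in zip(fq, z))
        out.append([sum(fq[i] * S[i][j] for i in range(d)) / denom
                    for j in range(dv)])
    return out
```

Both functions return the same values up to floating-point rounding because the kernel score factorizes as an inner product of feature vectors, which lets the sums over keys be computed before the queries are applied.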

arxiv · Jun 10 · bullish

From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

arXiv:2605.18271v2 Announce Type: replace-cross Abstract: With the rapid emergence of personal AI agents based on Large Language Models (LLMs), implementing them on-device has become essential for privacy and responsiveness. To handle the inherently personal and context-dependent nature of real-worl

1 model · #on-device #privacy #efficiency

arxiv · Jun 10 · bullish

Operator Fusion for LLM Inference on the Tensix Architecture

arXiv:2606.09879v1 Announce Type: new Abstract: This study addresses on-device inference bottlenecks of Transformer models on Tenstorrent's Tensix architecture and proposes an operator fusion strategy that enhances data locality. RMSNorm is fused with matrix multiplication in self-attention and in t

4 models · +1 · #machine-learning #optimization #parallelism
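The abstract above mentions fusing RMSNorm with matrix multiplication. One common way to do this, shown here as a sketch and not necessarily the strategy the paper uses, is to fold the RMSNorm gain into the weight matrix ahead of time and then compute the sum of squares and the matrix product in a single pass over the input. All function names below are hypothetical.

```python
import math

def rmsnorm(x, g, eps=1e-6):
    # Standard RMSNorm: scale x by 1 / rms(x), then by the learned gain g.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * gi for v, gi in zip(x, g)]

def matvec(x, W):
    # x has length d, W is d x m (list of rows); returns x @ W.
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def unfused(x, g, W, eps=1e-6):
    # Two separate operators: x is read once for the norm and again for the matmul.
    return matvec(rmsnorm(x, g, eps), W)

def fold_gain(g, W):
    # Offline step: scale row i of W by g[i], i.e. diag(g) @ W.
    return [[gi * w for w in row] for gi, row in zip(g, W)]

def fused(x, Wg, eps=1e-6):
    # Single pass over x: accumulate the sum of squares and the matmul together,
    # then apply the 1 / rms scaling to the output instead of the input.
    acc = [0.0] * len(Wg[0])
    sq = 0.0
    for xi, row in zip(x, Wg):
        sq += xi * xi
        for j, w in enumerate(row):
            acc[j] += xi * w
    inv_rms = 1.0 / math.sqrt(sq / len(x) + eps)
    return [a * inv_rms for a in acc]
```

The two paths agree because the 1 / rms factor is a scalar per input vector and therefore commutes with the matrix product.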

arxiv · Jun 2 · bullish

AdaCodec: A Predictive Visual Code for Video MLLMs

arXiv:2606.02569v1 Announce Type: cross Abstract: Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existing video multimodal large language models (video MLLMs) usually encode each sampled frame as an independent RGB image, causing visual tokens

2 models · #video #multimodal #compression

arxiv · May 29 · bullish

Moment Matching Q-Learning

arXiv:2605.29033v1 Announce Type: new Abstract: Score-based and flow-based generative models exhibit remarkable expressive capacity in capturing complex distributions, and have been extensively deployed in tasks ranging from image generation to reinforcement learning. Nevertheless, these models suff

#reinforcement-learning #generative-models #efficiency

arxiv · May 29 · bullish

Rethinking Stepwise Model Routing: A Cost-Efficient Table Reasoning Perspective

arXiv:2605.29319v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) achieve strong performance on table reasoning tasks but incur substantial inference cost due to long reasoning traces. Stepwise model routing mitigates this issue by dynamically assigning reasoning steps to smaller or larg

#table-reasoning #efficiency #routing

arxiv · May 26 · bullish

Kolmogorov-Arnold Fourier Networks

arXiv:2502.06018v3 Announce Type: replace-cross Abstract: Although Kolmogorov-Arnold-based interpretable networks (KANs) possess strong theoretical expressiveness, they suffer from severe parameter explosion and limited ability to capture high-frequency features in high-dimensional tasks. To address

1 model · #machine-learning #neural-networks #spectral-reparameterization

arxiv · May 21 · bullish

Dynamic Shapley Computation

arXiv:2605.20620v1 Announce Type: new Abstract: Shapley-based data valuation provides a principled way to quantify the contribution of training data, but its high computational cost makes it impractical in dynamic settings where tasks and training players evolve. Existing methods treat Shapley compu

#machine-learning #valuation #efficiency
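To make the computational cost mentioned in the abstract above concrete, here is a minimal sketch of exact Shapley valuation by subset enumeration. It is the textbook definition, not the paper's dynamic method; the `coverage` table and `utility` function are a hypothetical toy game invented for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, utility):
    # Exact Shapley value: weighted average of each player's marginal
    # contribution over all subsets of the other players (2^(n-1) per player).
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values

# Hypothetical toy game: each training point "covers" some labels, and the
# utility of a subset is the number of distinct labels it covers.
coverage = {"a": {1, 2}, "b": {2, 3}, "c": {3}}

def utility(subset):
    return len(set().union(*(coverage[p] for p in subset)))

values = shapley_values(list(coverage), utility)
```

Every player requires a pass over all subsets of the remaining players, so the number of utility evaluations grows exponentially with the number of training points, and any change to the player set invalidates the result under this naive approach.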

arxiv · May 19 · bullish

Trajectory-Aware Adaptive Inference in Object Detection Models

arXiv:2605.16397v1 Announce Type: cross Abstract: The increasing integration of sensors in autonomous maritime navigation has led to large-scale multimodal datasets, raising challenges in achieving efficient real-time perception. In such systems, object detection and trajectory perception of nearby

1 model · #computer-vision #real-time #efficiency

arxiv · May 16 · bullish

Beyond What to Select: A Plug-and-play Oscillatory Data-Volume Scheduling for Efficient Model Training

arXiv:2605.14773v1 Announce Type: cross Abstract: Data selection accelerates training by identifying representative training data while preserving model performance. However, existing methods mainly focus on designing sample-importance criteria, i.e., deciding what to select, while typically fixing

#optimization #machine-learning #efficiency

arxiv · May 16 · bullish

Krause Synchronization Transformers

arXiv:2602.11534v3 Announce Type: replace-cross Abstract: Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. When composed across depth, this interaction pattern induces strong synchronization dynamics that favor

3 models · #transformers #attention #efficiency

arxiv · May 15 · bullish

Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture

arXiv:2605.14448v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) have emerged as a powerful backbone for multimodal embeddings. Recent methods introduce chain-of-thought (CoT) reasoning into the embedding pipeline to improve retrieval quality, but remain costly in both mode

1 model · #multimodal-embeddings #chain-of-thought #efficiency

arxiv · May 13 · bullish

C2L-Net: A Data-Driven Model for State-of-Charge Estimation of Lithium-Ion Batteries During Discharge

arXiv:2605.08653v1 Announce Type: new Abstract: Accurate state-of-charge (SOC) estimation is critical for the safe and efficient operation of lithium-ion batteries in battery management systems (BMS). Although data-driven approaches can effectively capture nonlinear battery dynamics, many existing m

1 model · #battery-management #state-of-charge #efficiency

arxiv · May 8 · bullish

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

arXiv:2605.05225v1 Announce Type: cross Abstract: Mixture-of-Experts Multimodal Large Language Models (MoE MLLMs) suffer from a significant efficiency bottleneck during Expert Parallelism (EP) inference due to the straggler effect. This issue is worsened in the multimodal context, as existing token-

2 models · #multimodal #efficiency #inference

arxiv · May 5

Compute Optimal Tokenization

arXiv:2605.01188v1 Announce Type: new Abstract: Scaling laws enable the optimal selection of data amount and language model size, yet the impact of the data unit, the token, on this relationship remains underexplored. In this work, we systematically investigate how the information granularity of tok

1 model · #tokenization #language-models #scaling-laws

arxiv · Apr 30

A Multi-Dataset Benchmark of Multiple Instance Learning for 3D Neuroimage Classification

arXiv:2604.26807v1 Announce Type: new Abstract: Despite being resource-intensive to train, 3D convolutional neural networks (CNNs) have been the standard approach to classify CT and MRI scans. Recent work suggests that deep multiple instance learning (MIL) may be a more efficient alternative for 3D

3 models · #medical-imaging #neural-networks #efficiency

arxiv · Apr 27 · bullish

HGQ-LUT: Fast LUT-Aware Training and Efficient Architectures for DNN Inference

arXiv:2604.22293v1 Announce Type: cross Abstract: Lookup-table (LUT) based neural networks can deliver ultra-low latency and excellent hardware efficiency on FPGAs by mapping arithmetic operations directly onto the logic primitives. However, state-of-the-art LUT-aware training (LAT) approaches remai

1 model · #hardware #efficiency #neural-networks

arxiv · Apr 24 · bullish

Energy-Based Open-Set Active Learning for Object Classification

arXiv:2604.20083v1 Announce Type: new Abstract: Active learning (AL) has emerged as a crucial methodology for minimizing labeling costs in deep learning by selecting the most valuable samples from a pool of unlabeled data for annotation. Traditional AL operates under a closed-set assumption, where a

2 models · #active-learning #open-set #classification

arxiv · Apr 23 · bullish

LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning

arXiv:2308.03303v2 Announce Type: replace Abstract: Fine-tuning large language models (LLMs) is crucial for improving their performance on downstream tasks, but full-parameter fine-tuning (Full-FT) is computationally expensive and memory-intensive. Parameter-efficient fine-tuning (PEFT) methods, suc

2 models · #fine-tuning #language-models #optimization

arxiv · Apr 17

Chinese Language Is Not More Efficient Than English in Vibe Coding: A Preliminary Study on Token Cost and Problem-Solving Rate

arXiv:2604.14210v1 Announce Type: new Abstract: A claim has been circulating on social media and practitioner forums that Chinese prompts are more token-efficient than English for LLM coding tasks, potentially reducing costs by up to 40%. This claim has influenced developers to consider switching t

2 models · #language-models #efficiency #benchmark

arxiv · Apr 13 · bullish

Boosting Brain-inspired Path Integration Efficiency via Learning-based Replication of Continuous Attractor Neurodynamics

arXiv:2511.17687v2 Announce Type: replace Abstract: The brain's Path Integration (PI) mechanism offers substantial guidance and inspiration for Brain-Inspired Navigation (BIN). However, the PI capability constructed by the Continuous Attractor Neural Networks (CANNs) in most existing BIN studies exh

2 models · #machine-learning #neural-networks #navigation

arxiv · Apr 3 · bullish

Prompt-Guided Prefiltering for VLM Image Compression

arXiv:2604.00314v1 Announce Type: cross Abstract: The rapid progress of large Vision-Language Models (VLMs) has enabled a wide range of applications, such as image understanding and Visual Question Answering (VQA). Query images are often uploaded to the cloud, where VLMs are typically hosted, hence

#image-compression #vision-language #efficiency