DataBubble
Tag

#large-language-models

31 articles tagged #large-language-models

arxiv · 6d ago

Inference-Time Vulnerability Beyond Shallow Safety: Alignment Along Generation Trajectories

arXiv:2606.04778v1 Announce Type: new Abstract: Safety-aligned Large Language Models (LLMs) remain vulnerable to interventions during inference that redirect generation toward harmful outputs. Recent work attributes this to shallow safety, where alignment concentrates in the first few output tokens.

#safety #large-language-models #vulnerability

arxiv · Jun 2 · bullish

KACE: Knowledge-Adaptive Context Engineering for Mathematical Reasoning

arXiv:2606.00532v1 Announce Type: new Abstract: Context engineering can improve large language models without updating their weights, but mathematical reasoning exposes a key limitation: feedback accumulated in one growing prompt causes context bloat and limits the amount of learned guidance that ca

KA · 1 model · #context-engineering #large-language-models #mathematical-reasoning

arxiv · May 29

Benchmarking LLM-Assisted Blue Teaming via Standardized Threat Hunting

arXiv:2509.23571v3 Announce Type: replace-cross Abstract: As cyber threats continue to grow in scale and sophistication, blue team defenders increasingly require advanced tools to proactively detect and mitigate risks. Large Language Models (LLMs) offer promising capabilities for enhancing threat an

#cybersecurity #threat-hunting #benchmark

arxiv · May 28

Less is More: Geometric Unlearning for LLMs with Minimal Data Disclosure

arXiv:2605.01735v2 Announce Type: replace Abstract: As large language models (LLMs) are increasingly deployed in real-world systems, they must support post-hoc removal of specific content to meet privacy and governance requirements. This motivates selective unlearning, which suppresses information a

#unlearning #large-language-models #privacy

arxiv · May 25 · bullish

Graph Alignment Topology as an Inductive Bias for Grounding Detection

arXiv:2605.22963v1 Announce Type: cross Abstract: Large Language Models (LLMs) are optimized to produce distributionally plausible continuations rather than to explicitly verify whether generated propositions are entailed by source documents. This inductive bias enables generalization, but it does n

GP · 1 model · #large-language-models #factuality #graph-neural-networks

arxiv · May 25 · bullish

Scaling-Aware Adapter for Structure-Grounded LLM Reasoning

arXiv:2602.02780v3 Announce Type: replace Abstract: Large language models (LLMs) are enabling reasoning over 2D and 3D structures, yet existing methods remain modality-specific and typically compress structural inputs through sequence-based tokenization or fixed-length query connectors. Such archite

CU · 1 model · #large-language-models #multimodal #reasoning

arxiv · May 22 · bullish

Retrospective Sparse Attention for Efficient Long-Context Generation

arXiv:2508.09001v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are increasingly deployed in long-context tasks such as reasoning, code generation, and multi-turn dialogue. However, inference over extended contexts is bottlenecked by the Key-Value (KV) cache, whose memory foot

#large-language-models #optimization #attention-mechanisms

arxiv · May 22 · bullish

Spectra as Language: Large Language Models for Scalable Stellar Parameter and Abundance Inference

arXiv:2605.22162v1 Announce Type: cross Abstract: Stellar spectra encode key information on the physical properties and chemical compositions of stars. Accurate stellar parameter determination is essential for addressing major questions such as galaxy and stellar evolution. Large-scale spectroscopic

#astronomy #machine-learning #spectroscopy

arxiv · May 14 · bullish

HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-bench

arXiv:2601.20255v2 Announce Type: replace-cross Abstract: SWE-bench has emerged as the premier benchmark for evaluating Large Language Models on complex software engineering tasks. While these capabilities are fundamentally acquired during the mid-training phase and subsequently elicited during Supe

#benchmark #software-engineering #large-language-models

arxiv · May 11 · bullish

TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples

arXiv:2605.07935v1 Announce Type: new Abstract: We present TraceFix, a verification-first pipeline for Large Language Model (LLM) multi-agent coordination. An agent synthesizes a protocol topology as a structured intermediate representation (IR) from a task description, generates PlusCal coordinatio

#verification #multiagent #coordination

arxiv · May 11 · bullish

End-to-end PDDL Planning with Hardcoded and Dynamic Agents

arXiv:2512.09629v2 Announce Type: replace Abstract: We present an end-to-end framework for planning supported by verifiers. An orchestrator receives a human specification written in natural language and converts it into a PDDL (Planning Domain Definition Language) model, where the domain and problem

OP GP GP · 5 models · +2 · #planning #natural-language-processing #large-language-models

arxiv · May 8 · bullish

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

arXiv:2605.05225v1 Announce Type: cross Abstract: Mixture-of-Experts Multimodal Large Language Models (MoE MLLMs) suffer from a significant efficiency bottleneck during Expert Parallelism (EP) inference due to the straggler effect. This issue is worsened in the multimodal context, as existing token-

MI MA · 2 models · #multimodal #efficiency #inference

arxiv · May 8 · bullish

LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning

arXiv:2601.20375v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) can be fine-tuned on domain-specific data to enhance their performance in specialized fields. However, such data often contains numerous low-quality samples, necessitating effective data processing (DP). In practi

LL · 1 model · #automl #data-processing #large-language-models

arxiv · May 6

Using LLMs in Software Design: An Empirical Study of GitHub and A Practitioner Survey

arXiv:2605.01392v1 Announce Type: cross Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated significant potential across a wide range of software engineering tasks, including software design, an area traditionally regarded as highly dependent on human expertise and judgme

CH · 1 model · #software-engineering #large-language-models #design

arxiv · May 5 · bullish

AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning

arXiv:2605.00425v1 Announce Type: new Abstract: Reinforcement learning (RL) has significantly advanced the ability of large language model (LLM) agents to interact with environments and solve multi-turn tasks. Yet effective training remains challenging, as sparse, outcome-only rewards make it diffic

#reinforcement-learning #large-language-models #exploration-exploitation

arxiv · May 1 · bullish

ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

arXiv:2604.27467v1 Announce Type: cross Abstract: Code sandboxes have emerged as a critical infrastructure for advancing the coding capabilities of large language models, providing verifiable feedback for both RL training and evaluation. However, existing systems fail to provide accurate verificatio

#research #large-language-models #code-training

arxiv · May 1

When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems

arXiv:2604.27082v1 Announce Type: new Abstract: We present a framework for migrating production Large Language Model (LLM) based systems when the underlying model reaches end-of-life or requires replacement. The key contribution is a Bayesian statistical approach that calibrates automated evaluation

#migration #evaluation #large-language-models

arxiv · May 1

Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study

arXiv:2602.10140v2 Announce Type: replace-cross Abstract: Large language models (LLMs) can now synthesize non-trivial executable code from textual descriptions, raising an important question: can LLMs reliably implement agent-based models from standardized specifications in a way that supports repli

GP CL · 2 models · #large-language-models #code-generation #agent-based-models

arxiv · Apr 27 · bullish

An LLM-Driven Closed-Loop Autonomous Learning Framework for Robots Facing Uncovered Tasks in Open Environments

arXiv:2604.22199v1 Announce Type: cross Abstract: Autonomous robots operating in open environments need the ability to continuously handle tasks that are not covered by predefined local methods. However, existing approaches often rely on repeated large-language-model (LLM) interaction for uncovered

LL · 1 model · #autonomous-robots #open-environments #large-language-models

arxiv · Apr 24 · bullish

Strategic Scaling of Test-Time Compute: A Bandit Learning Approach

arXiv:2506.12721v2 Announce Type: replace Abstract: Scaling test-time compute has emerged as an effective strategy for improving the performance of large language models. However, existing methods typically allocate compute uniformly across all queries, overlooking variation in query difficulty. To

#large-language-models #compute-optimization #bandit-learning

arxiv · Apr 23

LayerTracer: A Joint Task-Particle and Vulnerable-Layer Analysis framework for Arbitrary Large Language Model Architectures

arXiv:2604.20556v1 Announce Type: cross Abstract: Currently, Large Language Models (LLMs) feature a diversified architectural landscape, including traditional Transformer, GateDeltaNet, and Mamba. However, the evolutionary laws of hierarchical representations, task knowledge formation positions, and

TR GA MA · 3 models · #large-language-models #architecture #interpretability

arxiv · Apr 21

Using large language models for embodied planning introduces systematic safety risks

arXiv:2604.18463v1 Announce Type: cross Abstract: Large language models are increasingly used as planners for robotic systems, yet how safely they plan remains an open question. To evaluate safe planning systematically, we introduce DESPITE, a benchmark of 12,279 tasks spanning physical and normativ

#safety #benchmark #robotics

arxiv · Apr 21

To LLM, or Not to LLM: How Designers and Developers Navigate LLMs as Tools or Teammates

arXiv:2604.15344v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly integrated into design and development workflows, yet decisions about their use are rarely binary or purely technical. We report findings from a constructivist grounded theory study based on interviews wi

#human-computer-interaction #large-language-models #sociotechnical

arxiv · Apr 21

Self-Distillation as a Performance Recovery Mechanism for LLMs: Counteracting Compression and Catastrophic Forgetting

arXiv:2604.15794v1 Announce Type: cross Abstract: Large Language Models (LLMs) have achieved remarkable success, underpinning diverse AI applications. However, they often suffer from performance degradation due to factors such as catastrophic forgetting during Supervised Fine-Tuning (SFT), quantizat

#self-distillation #fine-tuning #large-language-models

arxiv · Apr 20 · bullish

C-Mining: Unsupervised Discovery of Seeds for Cultural Data Synthesis via Geometric Misalignment

arXiv:2604.15675v1 Announce Type: new Abstract: Achieving cultural alignment in Large Language Models (LLMs) increasingly depends on synthetic data generation. For such synthesis, the most vital initial step is seed curation; however, current methods lack quantifiable standards for selecting these s

#cultural-alignment #large-language-models #data-synthesis

arxiv · Apr 17

Your LLM Agents are Temporally Blind: The Misalignment Between Tool Use Decisions and Human Time Perception

arXiv:2510.23853v3 Announce Type: replace Abstract: Large language model (LLM) agents are increasingly used to interact with and execute tasks in dynamic environments. However, a critical yet overlooked limitation of these agents is that they, by default, assume a stationary context, failing to acco

#temporal-awareness #large-language-models #human-aligned

arxiv · Apr 17

Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models

arXiv:2604.13206v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly integrated into agentic workflows, their unpredictability stemming from numerical instability has emerged as a critical reliability issue. While recent studies have demonstrated the significant downstrea

#numerical-stability #large-language-models #transformer-architectures

arxiv · Apr 10 · bullish

SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization

arXiv:2604.07663v1 Announce Type: new Abstract: The AdamW optimizer, while standard for LLM pretraining, is a critical memory bottleneck, consuming optimizer states equivalent to twice the model's size. Although light-state optimizers like SinkGD attempt to address this issue, we identify the embedd

ME · 1 model · #optimization #memory-efficiency #large-language-models

arxiv · Apr 9 · bullish

An Automated Survey of Generative Artificial Intelligence: Large Language Models, Architectures, Protocols, and Applications

arXiv:2306.02781v3 Announce Type: replace-cross Abstract: Generative artificial intelligence, and large language models in particular, have emerged as one of the most transformative paradigms in modern computer science. This automated survey provides an accessible treatment of the field as of early

DE DE DE · 17 models · +14 · #large-language-models #generative-ai #machine-learning

arxiv · Apr 8 · bullish

Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents

arXiv:2604.05808v1 Announce Type: new Abstract: Large language model (LLM) agents have demonstrated strong capabilities in complex interactive decision-making tasks. However, existing LLM agents typically rely on increasingly long interaction histories, resulting in high computational cost and limit

ST · 1 model · #reinforcement-learning #hierarchical-learning #large-language-models

arxiv · Apr 7

Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation

arXiv:2604.02368v2 Announce Type: replace Abstract: As Large Language Models (LLMs) exhibit plateauing performance on conventional benchmarks, a pivotal challenge persists: evaluating their proficiency in complex, open-ended tasks characterizing genuine expert-level cognition. Existing frameworks su

#benchmark #evaluation #expert-level