·
DataBubble
  • Home
  • Models
  • News
  • Compare
  • Boards
  • Pricing
  • About
  • Newsletter
  • Methodology
  • Contact
Latest
Theker just raised $85M to build the factory robot that doesn’t specialize in anything1h◆Jeff Bezos’s Prometheus raises $12B to build an ‘artificial general engineer’ for the physical world1h◆SpaceX officially prices shares at $135 in the largest IPO ever6h◆Our new community investments in Virginia support local jobs and expand energy affordability.6h◆SpaceX SPV investors won’t know their true holdings until post-IPO lock-ups lift6h◆Amazon’s data centers used 2.5 billion gallons of water last year9h◆Deezer’s new tool can identify AI music from Spotify, Apple Music, and others10h◆Pool’s new app turns your screenshots into something useful11h◆DoorDash’s new AI chatbot lets you order with prompts and photos12h◆Anthropic apologizes for invisible Claude Fable guardrails15h◆Google DeepMind is worried about what happens when millions of agents start to interact15h◆Deezer launches an AI music detector for other streaming services18h◆Opendoor’s India exit is fueling a bigger conversation about AI and outsourcing22h◆MODF-SIR: A Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning22h◆Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!22h◆ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation22h◆Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Solutions22h◆The Impossibility of Eliciting Latent Knowledge22h◆Mapping Scientific Literature with Large Language Models and Topic Modeling22h◆Grounding Computer Use Agents on Human Demonstrations22h◆Theker just raised $85M to build the factory robot that doesn’t specialize in anything1h◆Jeff Bezos’s Prometheus raises $12B to build an ‘artificial general engineer’ for the physical world1h◆SpaceX officially prices shares at $135 in the largest IPO ever6h◆Our new community investments in Virginia support local jobs and expand energy affordability.6h◆SpaceX SPV investors won’t know their true holdings until post-IPO lock-ups lift6h◆Amazon’s data centers used 2.5 billion gallons of water last year9h◆Deezer’s new tool can identify AI music from Spotify, Apple Music, and others10h◆Pool’s new app turns your screenshots into something useful11h◆DoorDash’s new AI chatbot lets you order with prompts and photos12h◆Anthropic apologizes for invisible Claude Fable guardrails15h◆Google DeepMind is worried about what happens when millions of agents start to interact15h◆Deezer launches an AI music detector for other streaming services18h◆Opendoor’s India exit is fueling a bigger conversation about AI and outsourcing22h◆MODF-SIR: A Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning22h◆Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!22h◆ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation22h◆Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Solutions22h◆The Impossibility of Eliciting Latent Knowledge22h◆Mapping Scientific Literature with Large Language Models and Topic Modeling22h◆Grounding Computer Use Agents on Human Demonstrations22h◆
Tag

#language models

12 articles tagged #language models

arxivMay 28bullish

Regression Language Models for Code

arXiv:2509.26476v2 Announce Type: replace-cross Abstract: We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy and domain-specific feature engineering,

RET52 models#code-to-metric regression#language models#performance predictionRead on arxiv →
arxivMay 22bullish

DrugRAG: Enhancing Pharmacy LLM Performance Through A Novel Retrieval-Augmented Generation Pipeline

arXiv:2512.14896v2 Announce Type: replace-cross Abstract: In our study, we evaluated large language model (LLM) performance on pharmacy licensure-style question-answering tasks and developed an external knowledge integration method to improve accuracy. We benchmarked ten LLMs with varying parameter

GPO3GE4 models · +1#pharmacy#language models#question answeringRead on arxiv →
arxivMay 21

Refining and Reusing Annotation Guidelines for LLM Annotation

arXiv:2605.20809v1 Announce Type: new Abstract: While Large Language Models (LLMs) demonstrate remarkable performance on zero-shot annotation tasks, they often struggle with the specialized conventions of gold-standard benchmarks. We propose the systematic reuse and refinement of annotation guidelin

GPGEDE3 models#research#language models#benchmarkRead on arxiv →
arxivMay 21

Features have life history. And we should care

arXiv:2605.18789v1 Announce Type: cross Abstract: Features in language models have life history: they emerge, persist, and die during training, yet the importance of that history remains largely unexplored. We find evidence of a persistent representational backbone, which we identify in Pythia-160M

PYPY2 models#language models#training dynamics#neural networksRead on arxiv →
arxivMay 8

Measuring Evaluation-Context Divergence in Open-Weight LLMs: A Paired-Prompt Protocol with Pilot Evidence of Alignment-Pipeline-Specific Heterogeneity

arXiv:2605.06327v1 Announce Type: cross Abstract: Safety benchmarks are routinely treated as evidence about how a language model will behave once deployed, but this inference is fragile if behavior depends on whether a prompt looks like an evaluation. We define evaluation-context divergence as an ob

OLOLMI7 models · +4#safety#benchmark#evaluationRead on arxiv →
arxivApr 24

Convergent Evolution: How Different Language Models Learn Similar Number Representations

arXiv:2604.20817v1 Announce Type: cross Abstract: Language models trained on natural text learn to represent numbers using periodic features with dominant periods at $T=2, 5, 10$. In this paper, we identify a two-tiered hierarchy of these features: while Transformers, Linear RNNs, LSTMs, and classic

TRLILS4 models · +1#language models#feature learning#convergent evolutionRead on arxiv →
arxivApr 24

Modulating Cross-Modal Convergence with Single-Stimulus, Intra-Modal Dispersion

arXiv:2604.21836v1 Announce Type: cross Abstract: Neural networks exhibit a remarkable degree of representational convergence across diverse architectures, training objectives, and even data modalities. This convergence is predictive of alignment with brain representation. A recent hypothesis sugges

DI1 model#neural networks#representational convergence#cross-modal alignmentRead on arxiv →
arxivApr 18

Correcting Suppressed Log-Probabilities in Language Models with Post-Transformer Adapters

arXiv:2604.14174v1 Announce Type: cross Abstract: Alignment-tuned language models frequently suppress factual log-probabilities on politically sensitive topics despite retaining the knowledge in their hidden representations. We show that a 786K-parameter (approximately 0.02% of the base model) post-

QWIN2 models#language models#adapter#censorshipRead on arxiv →
arxivApr 10bullish

Dynamic Context Evolution for Scalable Synthetic Data Generation

arXiv:2604.07147v1 Announce Type: cross Abstract: Large language models produce repetitive output when prompted independently across many batches, a phenomenon we term cross-batch mode collapse: the progressive loss of output diversity when a language model is prompted repeatedly without access to i

GPCLAL3 models#language models#mode collapse#deduplicationRead on arxiv →
arxivApr 6bearish

AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents

arXiv:2604.02947v1 Announce Type: new Abstract: Computer-use agents extend language models from text generation to persistent action over tools, files, and execution environments. Unlike chat systems, they maintain state across interactions and translate intermediate outputs into concrete actions. T

CLOPIF7 models · +4#safety#benchmark#autonomous agentsRead on arxiv →
arxivApr 6

Learning the Signature of Memorization in Autoregressive Language Models

arXiv:2604.03199v1 Announce Type: cross Abstract: All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K\%, reference calibration), each bounded by the designer's intuition. We introduce the first transferable learned attack,

MARWRE3 models#membership inference#language models#transfer learningRead on arxiv →
arxivApr 1

The Last Fingerprint: How Markdown Training Shapes LLM Prose

arXiv:2603.27006v1 Announce Type: cross Abstract: Large language models produce em dashes at varying rates, and the observation that some models "overuse" them has become one of the most widely discussed markers of AI-generated text. Yet no mechanistic account of this pattern exists, and the paralle

MEOP2 models#language models#training data#fine-tuningRead on arxiv →
HomeModelsNews