·
DataBubble
  • Home
  • Models
  • News
  • Compare
  • Boards
  • Pricing
  • About
  • Newsletter
  • Methodology
  • Contact
Latest
Theker just raised $85M to build the factory robot that doesn’t specialize in anything1h◆Jeff Bezos’s Prometheus raises $12B to build an ‘artificial general engineer’ for the physical world1h◆SpaceX officially prices shares at $135 in the largest IPO ever6h◆Our new community investments in Virginia support local jobs and expand energy affordability.6h◆SpaceX SPV investors won’t know their true holdings until post-IPO lock-ups lift6h◆Amazon’s data centers used 2.5 billion gallons of water last year9h◆Deezer’s new tool can identify AI music from Spotify, Apple Music, and others10h◆Pool’s new app turns your screenshots into something useful11h◆DoorDash’s new AI chatbot lets you order with prompts and photos12h◆Anthropic apologizes for invisible Claude Fable guardrails15h◆Google DeepMind is worried about what happens when millions of agents start to interact15h◆Deezer launches an AI music detector for other streaming services18h◆Opendoor’s India exit is fueling a bigger conversation about AI and outsourcing22h◆MODF-SIR: A Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning22h◆Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!22h◆ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation22h◆Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Solutions22h◆The Impossibility of Eliciting Latent Knowledge22h◆Mapping Scientific Literature with Large Language Models and Topic Modeling22h◆Grounding Computer Use Agents on Human Demonstrations22h◆Theker just raised $85M to build the factory robot that doesn’t specialize in anything1h◆Jeff Bezos’s Prometheus raises $12B to build an ‘artificial general engineer’ for the physical world1h◆SpaceX officially prices shares at $135 in the largest IPO ever6h◆Our new community investments in Virginia support local jobs and expand energy affordability.6h◆SpaceX SPV investors won’t know their true holdings until post-IPO lock-ups lift6h◆Amazon’s data centers used 2.5 billion gallons of water last year9h◆Deezer’s new tool can identify AI music from Spotify, Apple Music, and others10h◆Pool’s new app turns your screenshots into something useful11h◆DoorDash’s new AI chatbot lets you order with prompts and photos12h◆Anthropic apologizes for invisible Claude Fable guardrails15h◆Google DeepMind is worried about what happens when millions of agents start to interact15h◆Deezer launches an AI music detector for other streaming services18h◆Opendoor’s India exit is fueling a bigger conversation about AI and outsourcing22h◆MODF-SIR: A Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning22h◆Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!22h◆ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation22h◆Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Solutions22h◆The Impossibility of Eliciting Latent Knowledge22h◆Mapping Scientific Literature with Large Language Models and Topic Modeling22h◆Grounding Computer Use Agents on Human Demonstrations22h◆
Tag

#transformers

3 articles tagged #transformers

arxivMay 16bullish

Krause Synchronization Transformers

arXiv:2602.11534v3 Announce Type: replace-cross Abstract: Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. When composed across depth, this interaction pattern induces strong synchronization dynamics that favor

MEQWVI3 models#transformers#attention#efficiencyRead on arxiv →
arxivApr 30bullish

Exploring the Potential of Probabilistic Transformer for Time Series Modeling: A Report on the ST-PT Framework

arXiv:2604.26762v1 Announce Type: cross Abstract: The Probabilistic Transformer (PT) establishes that the Transformer's self-attention plus its feed-forward block is mathematically equivalent to Mean-Field Variational Inference (MFVI) on a Conditional Random Field (CRF). Under this equivalence the T

PRSP2 models#time-series#probabilistic-models#transformersRead on arxiv →
arxivApr 21bullish

Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit

arXiv:2604.15356v1 Announce Type: cross Abstract: Recent work on KV cache quantization, culminating in TurboQuant, has approached the Shannon entropy limit for per-vector compression of transformer key-value caches. We observe that this limit applies to a strictly weaker problem than the one that ac

TU1 model#compression#quantization#transformersRead on arxiv →
HomeModelsNews