DataBubble

About

Mission

An interactive map of the AI model landscape.

Databubble is an AI model tracker and news aggregator. It exists for one reason: anyone trying to pick a model today has to pull numbers from five different leaderboards, three pricing pages, and a Twitter feed that moves faster than any of them. We bring that data into one place, plot it on an interactive bubble chart, and run a live news feed alongside it that explains why a model is moving.

Who it's for: developers comparing inference cost and quality before shipping a feature, researchers tracking which open-weights releases are picking up real adoption, and AI investors or operators trying to keep a coherent picture of a market that ships a frontier model every few weeks. The product is opinionated about visual density (Bloomberg Terminal aesthetic, monospace, dark mode) and unopinionated about the numbers themselves — we surface what the upstream sources publish and link back to them.

We don't train models, sell hosted inference, or take placement money from labs. The chart ranking is whatever the metric you picked says it is. That independence is the whole point of building this.

What we track

The product covers two surfaces: a model database and a news feed.

  • ~1.2K models tracked
  • 15K+ news articles
  • 8 news sources
  • 7 benchmarks

Models. Approximately 1,200 entries across LLMs, image generators, audio models, code-specific models, embedding models, and multimodal systems. We pull both open-weights releases from Hugging Face and proprietary models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, Qwen, and others. Each model carries roughly thirty fields: downloads, likes, parameter count, context window, modality, and license, plus benchmark scores and pricing where the upstream source publishes them.

Trends. We snapshot every model nightly and keep thirty days of history. That's what powers the day / week / month change deltas, the “trending” bubble size, and the per-model ranking history.
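The change deltas can be sketched as a lookup against earlier snapshots. This is a minimal illustration, not Databubble's code: it assumes each nightly snapshot is a single number (here a hypothetical download count) keyed by date, that a delta is a plain percentage change, and that "thirty days of history" means snapshots older than 30 days are dropped. `prune` and `change_deltas` are hypothetical names.

```python
from __future__ import annotations

from datetime import date, timedelta

# Assumed retention and comparison windows, taken from the prose above.
RETENTION_DAYS = 30
WINDOWS = {"day": 1, "week": 7, "month": 30}


def prune(snapshots: dict[date, int], today: date) -> dict[date, int]:
    """Drop snapshots older than the retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return {day: value for day, value in snapshots.items() if day >= cutoff}


def change_deltas(snapshots: dict[date, int], today: date) -> dict[str, float | None]:
    """Percentage change between today's snapshot and the one N days earlier."""
    current = snapshots.get(today)
    deltas: dict[str, float | None] = {}
    for name, days in WINDOWS.items():
        past = snapshots.get(today - timedelta(days=days))
        if current is None or past is None or past == 0:
            deltas[name] = None  # missing history, or no baseline to compare against
        else:
            deltas[name] = (current - past) * 100 / past
    return deltas


# Usage: a few nightly snapshots for one hypothetical model.
today = date(2025, 1, 31)
history = {
    today: 1200,
    today - timedelta(days=1): 1100,
    today - timedelta(days=7): 1000,
    today - timedelta(days=45): 500,  # falls outside the 30-day window
}
print(change_deltas(prune(history, today), today))
```

With this history the week delta is 20.0 and the month delta is `None`, because the only older snapshot (45 days back) was pruned. Returning `None` rather than 0 keeps "no data" distinguishable from "no change" on the chart.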

News. A signal layer on top of eight publishers' RSS feeds. Each article is matched against the model database so you can read every story that mentions Llama-3-70B, every story tagged cs.LG, or every story from arXiv in one place.
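Reading "every story that mentions a model, carries a tag, or comes from a source" amounts to an inverted index from facets to articles. The sketch below keeps that index in memory for clarity; it assumes each article already carries its publisher, tags, and canonical model ids, and the `Article` and `build_news_index` names are hypothetical. A production version on a Postgres backend would more plausibly be a filtered SQL query than a Python dict.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Article:
    url: str
    source: str                                      # publisher, e.g. "arXiv"
    tags: list[str] = field(default_factory=list)    # e.g. ["cs.LG"]
    models: list[str] = field(default_factory=list)  # canonical model ids mentioned


def build_news_index(articles: list[Article]) -> dict[tuple[str, str], list[str]]:
    """Map (facet, value) keys such as ("model", id) to the URLs of matching articles."""
    index: defaultdict[tuple[str, str], list[str]] = defaultdict(list)
    for article in articles:
        index[("source", article.source)].append(article.url)
        for tag in article.tags:
            index[("tag", tag)].append(article.url)
        for model_id in article.models:
            index[("model", model_id)].append(article.url)
    return dict(index)


articles = [
    Article("https://example.com/a", "arXiv", ["cs.LG"], ["meta-llama/Meta-Llama-3-70B"]),
    Article("https://example.com/b", "TechCrunch", [], ["meta-llama/Meta-Llama-3-70B"]),
]
index = build_news_index(articles)
print(index[("model", "meta-llama/Meta-Llama-3-70B")])
# ['https://example.com/a', 'https://example.com/b']
```

Keying on `(facet, value)` pairs lets one structure answer all three kinds of query without separate per-facet tables.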

How we collect data

Every number on Databubble comes from a public API or RSS feed. The ingestion pipeline runs on a schedule and writes to a Supabase Postgres database. There is no manual editorial layer between source and chart.

Model data sources
  • 01 Hugging Face: trending models, downloads, likes, metadata (hourly)
  • 02 Hugging Face Open LLM Leaderboard: MMLU-Pro, BBH, GPQA, IFEval, MATH, MuSR (daily)
  • 03 ArtificialAnalysis: intelligence / coding / math composite scores, pricing (daily)
  • 04 OpenRouter, Together: hosted-API price per token, used as fallbacks (daily)
  • 05 LMSYS Chatbot Arena: Arena ELO, vote count (daily)
  • 06 SWE-bench: Verified coding-task resolve rate (weekly)
  • 07 Semantic Scholar: paper citation counts, looked up via arxiv_id (weekly)
  • 08 GitHub: repository star counts (twice weekly)

News pipeline. Eight RSS feeds polled hourly: TechCrunch, Hugging Face, arXiv (cs.AI / cs.LG / cs.CL), The Verge, VentureBeat, OpenAI Blog, Google AI Blog, MIT Technology Review. For every article we store the publisher's verbatim RSS summary — we never paraphrase, rewrite, or re-host the source content. On top of that summary we run a small enrichment pass with Llama-3.3-70B (via Groq) that extracts sentiment, topic tags, mentioned models (matched against the canonical Hugging Face id), and mentioned companies. Those extracted fields power the listing page filters and the per-model news index.
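Resolving extracted mentions to canonical ids is the fiddly part of that enrichment pass, because news copy rarely uses the exact Hugging Face repo name. A minimal sketch, assuming the LLM has already returned a list of raw mention strings and that a hand-maintained alias table exists; the alias entries and function names are illustrative, and this page does not say how Databubble actually resolves mentions.

```python
from __future__ import annotations

import re

# Hypothetical alias table: canonical Hugging Face id -> spellings seen in news copy.
ALIASES = {
    "meta-llama/Meta-Llama-3-70B": ["Llama-3-70B", "Llama 3 70B"],
    "mistralai/Mistral-7B-v0.1": ["Mistral 7B"],
}


def normalize(name: str) -> str:
    """Lowercase and keep only letters and digits, so 'Llama 3 70B' == 'llama-3-70b'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def build_lookup(aliases: dict[str, list[str]]) -> dict[str, str]:
    """Index every known spelling (full id, repo name, aliases) by its normalized form."""
    lookup: dict[str, str] = {}
    for model_id, extra in aliases.items():
        repo_name = model_id.split("/", 1)[-1]
        for spelling in (model_id, repo_name, *extra):
            lookup[normalize(spelling)] = model_id
    return lookup


def match_mentions(mentions: list[str], lookup: dict[str, str]) -> list[str]:
    """Resolve raw mention strings to canonical ids, skipping unknowns and duplicates."""
    matched: list[str] = []
    for mention in mentions:
        model_id = lookup.get(normalize(mention))
        if model_id is not None and model_id not in matched:
            matched.append(model_id)
    return matched


lookup = build_lookup(ALIASES)
print(match_mentions(["Llama 3 70B", "llama-3-70b", "GPT-9"], lookup))
# ['meta-llama/Meta-Llama-3-70B']
```

The aggressive normalization is a trade-off: it catches "Llama 3 70B" versus "llama-3-70b", but it would also collapse distinct names such as "Llama-3-7B" and "Llama-37B" into the same key, so a real alias table needs a collision check.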

When sources disagree. ArtificialAnalysis, OpenRouter, and Together frequently publish slightly different prices for the same model. We surface ArtificialAnalysis by default, fall back to OpenRouter, then Together, and label which source produced the number on the model detail page.
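That precedence rule is small enough to show directly. In this sketch the source order comes from the paragraph above, while the `Price` type, its per-million-token unit, and the shape of `quotes` are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Precedence order stated on this page: ArtificialAnalysis, then OpenRouter, then Together.
PRICE_SOURCES = ("ArtificialAnalysis", "OpenRouter", "Together")


@dataclass(frozen=True)
class Price:
    usd_per_million_tokens: float  # hypothetical unit; the page only says price-per-token
    source: str                    # the label shown next to the number


def pick_price(quotes: dict[str, float | None]) -> Price | None:
    """Return the highest-precedence quote that exists, tagged with where it came from."""
    for source in PRICE_SOURCES:
        value = quotes.get(source)
        if value is not None:  # 0.0 is a valid (free) price, so test for None explicitly
            return Price(value, source)
    return None
```

For example, `pick_price({"ArtificialAnalysis": None, "OpenRouter": 0.6, "Together": 0.55})` returns the OpenRouter quote with `source="OpenRouter"`, which is the label a detail page would display. Checking `is not None` instead of truthiness matters: a free model priced at 0.0 should not silently fall through to the next source.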

Free vs Pro

The chart, the news feed, and the model database are free and don't require an account. Most visitors never need anything else. Pro exists for people who want to build on top of the data.

Free
  • Full bubble chart, all metrics
  • Model detail pages and 30-day history
  • News feed, search, filters, per-model news
  • Compare page (head-to-head)
  • Leaderboards and watchlist
Pro
  • CSV / JSON data export
  • Public REST API (models + news)
  • Extended snapshot history
  • Advanced compare and chart configurations

Pro is a small monthly subscription. We don't sell user data. We don't run lead-gen funnels. The only ads on the site are the ones served on article and listing pages, and they're what keeps the free tier sustainable while the project is small.

Why we built this

In late 2024 the visible AI model count crossed a threshold where keeping a mental model of the field stopped being possible. Hugging Face alone hosts hundreds of thousands of repositories. The set that actually matters — models with non-trivial adoption, a published benchmark score, or a known price — is much smaller, but it lives across a dozen different sites.

Every benchmark site shows different numbers because every benchmark measures something different. Arena ELO ranks models by human pairwise preference. MMLU-Pro grades knowledge breadth. SWE-bench grades real-world coding. ArtificialAnalysis combines several into a single intelligence score. None of those are wrong, but treating any single one as “the” ranking is misleading. A model that's strong at coding can be mid-tier on Arena. A model with a great Arena score can be slow and expensive through the API.

The thesis behind Databubble is that the right answer is not picking a winner — it's showing all the dimensions on one screen, letting you switch between them, and being explicit about which source produced each number. Transparency over editorial. Data over takes. The bubble chart is the surface that makes that comparison feel cheap to do; the news feed is the surface that gives you the why.

We're also building this in public. The codebase is closed for now, but the data sources, ingestion cadence, and product roadmap are documented on this page and on the founder's X account. If a number on Databubble looks wrong, the source is one click away on the model page — check it, and if it's our import that's wrong, send a note.

Dig deeper
Methodology

The full breakdown of how we compute the composite databubble_score: six weighted signals, normalisation, missing-data redistribution, a worked example, and the update cadence per source.

Read methodology →
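As a rough illustration of the three steps named above, here is a weighted composite with min-max normalization and proportional redistribution of missing weight. Everything specific here is an assumption: the signal names and weights are placeholders, and min-max scaling is one plausible reading of "normalisation". The methodology page remains the authoritative description of `databubble_score`.

```python
from __future__ import annotations

# Placeholder signal names and weights (they sum to 1.0). The real six signals and
# their weights are documented on the methodology page, not reproduced here.
WEIGHTS = {
    "signal_a": 0.25,
    "signal_b": 0.20,
    "signal_c": 0.20,
    "signal_d": 0.15,
    "signal_e": 0.10,
    "signal_f": 0.10,
}


def min_max(value: float, lo: float, hi: float) -> float:
    """Scale a raw value into [0, 1] using the range observed across all models."""
    if hi == lo:
        return 0.0  # every model has the same value, so the signal carries no information
    return (value - lo) / (hi - lo)


def composite_score(
    raw: dict[str, float | None], ranges: dict[str, tuple[float, float]]
) -> float | None:
    """Weighted mean of normalized signals; missing signals' weight is spread over the rest."""
    present = {name: v for name, v in raw.items() if name in WEIGHTS and v is not None}
    total_weight = sum(WEIGHTS[name] for name in present)
    if total_weight == 0:
        return None  # no usable signals for this model
    score = 0.0
    for name, value in present.items():
        lo, hi = ranges[name]
        # Dividing by total_weight rescales the surviving weights so they sum to 1.
        score += WEIGHTS[name] / total_weight * min_max(value, lo, hi)
    return score


# Usage: a model with only two of the six signals available.
ranges = {name: (0.0, 100.0) for name in WEIGHTS}
print(composite_score({"signal_a": 80.0, "signal_b": 60.0}, ranges))
```

With only two signals present, their weights (0.25 and 0.20) are rescaled by their sum, 0.45, so the score is (0.25 × 0.8 + 0.20 × 0.6) / 0.45 ≈ 0.711 instead of being dragged down to 0.32 by treating the four missing signals as zero.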
Glossary

35 plain-English definitions of the terms you'll bump into on model pages and benchmarks — MMLU-Pro, MoE, context window, Arena ELO, SWE-bench, RAG, quantization, and the rest.

Browse glossary →
Who builds this

Marouane Gazouzi. Independent developer based in Morocco. Spent the previous decade building data-visualisation and developer tools; Databubble started as a weekend project to scratch the “which model do I actually use” itch and grew once it became clear other people had it too.

No VC funding, no team, no growth-hacking playbook. The product moves at the pace of one person who cares about getting it right. That has trade-offs — we cover fewer benchmark surfaces than a bigger team would, and feature requests can take a while — but it also means there's no editorial agenda or paid placement to push a particular vendor.

The fastest way to reach us is the contact page or X. Bug reports, data corrections, and feature ideas are all welcome.

Contact · @marouanegazouzi · @DataBubbleCo · gazouzi.com