Databubble

About

Mission

01 / 07

An interactive map of the AI model landscape.

Databubble is an AI model tracker and news aggregator. It exists for one reason: anyone trying to pick a model today has to pull numbers from five different leaderboards, three pricing pages, and a Twitter feed that moves faster than any of them. We bring that data into one place and show it on an interactive bubble chart, alongside a live news feed that explains why a model is moving.

Who it's for: developers comparing inference cost and quality before shipping a feature, researchers tracking which open-weights releases are picking up real adoption, and AI investors or operators trying to keep a coherent picture of a market that ships a frontier model every few weeks. The product is opinionated about visual density (Bloomberg Terminal aesthetic, monospace, dark mode) and unopinionated about the numbers themselves — we surface what the upstream sources publish and link back to them.

We don't train models, sell hosted inference, or take placement money from labs. The chart ranking is whatever the metric you picked says it is. That independence is the whole point of building this.

What we track

02 / 07

The product covers two surfaces: a model database and a news feed.

Models tracked: ~1.2K
News articles: 15K+
News sources: 8
Benchmarks: 7

Models. Approximately 1,200 entries across LLMs, image generators, audio models, code-specific models, embedding models, and multimodal systems. We pull both open-weights releases from Hugging Face and proprietary models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, Qwen, and others. Each model carries roughly thirty fields: downloads, likes, parameter count, context window, modality, license, plus benchmark scores and pricing where the upstream source publishes them.

Trends. We snapshot every model nightly and keep thirty days of history. That's what powers the day / week / month change deltas, the “trending” bubble size, and the per-model ranking history.
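As a rough illustration of how day / week / month deltas can fall out of nightly snapshots, here is a minimal Python sketch. It is not Databubble's actual code: the function name `change_deltas`, the `WINDOWS` mapping, the choice of relative (percentage-style) change, and the idea that "month" means the 30-day-old snapshot are all assumptions made for the example.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed window sizes in days; the real product may define these differently.
WINDOWS = {"day": 1, "week": 7, "month": 30}


def change_deltas(history: dict[date, float], today: date) -> dict[str, Optional[float]]:
    """Relative change of a metric versus the snapshot N days earlier.

    `history` maps snapshot date -> metric value (e.g. downloads).
    A delta is None when either snapshot is missing or the baseline is zero.
    """
    current = history.get(today)
    deltas: dict[str, Optional[float]] = {}
    for label, days in WINDOWS.items():
        baseline = history.get(today - timedelta(days=days))
        if current is None or baseline is None or baseline == 0:
            deltas[label] = None
        else:
            deltas[label] = (current - baseline) / baseline
    return deltas


# Hypothetical data: a metric that grew by 10 per night over the last 30 nights.
today = date(2025, 1, 31)
history = {today - timedelta(days=i): 1300.0 - 10.0 * i for i in range(31)}
deltas = change_deltas(history, today)
```

Returning `None` instead of zero for a missing baseline keeps "no data yet" distinguishable from "no change", which matters for models added less than a month ago.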

News. A signal layer on top of eight publishers' RSS feeds. Each article is matched against the model database so you can read every story that mentions Llama-3-70B, every story tagged cs.LG, or every story from arXiv in one place.

How we collect data

03 / 07

Every number on Databubble comes from a public API or RSS feed. The ingestion pipeline runs on a schedule and writes to a Supabase Postgres database. There is no manual editorial layer between source and chart.

Model data sources

01 | Hugging Face | Trending models, downloads, likes, metadata | Hourly
02 | Hugging Face Open LLM Leaderboard | MMLU-Pro, BBH, GPQA, IFEval, Math, MUSR | Daily
03 | ArtificialAnalysis | Intelligence / coding / math composite scores, pricing | Daily
04 | OpenRouter, Together | Hosted-API price-per-token (fallbacks) | Daily
05 | LMSYS Chatbot Arena | Arena ELO, vote count | Daily
06 | SWE-bench | Verified coding-task resolve rate | Weekly
07 | Semantic Scholar | Paper citation counts (via arxiv_id) | Weekly
08 | GitHub | Repository star counts | Twice-weekly
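The per-source cadences listed above could be expressed as a small schedule table. The Python sketch below is illustrative only: the source keys, the `due_sources` helper, and the reading of "twice-weekly" as an evenly spaced 3.5-day interval are assumptions, not a description of the real ingestion pipeline.

```python
from datetime import datetime, timedelta

# Hypothetical source keys mapped to the refresh intervals listed on this page.
CADENCE = {
    "huggingface": timedelta(hours=1),
    "open_llm_leaderboard": timedelta(days=1),
    "artificialanalysis": timedelta(days=1),
    "openrouter_together": timedelta(days=1),
    "chatbot_arena": timedelta(days=1),
    "swe_bench": timedelta(weeks=1),
    "semantic_scholar": timedelta(weeks=1),
    "github": timedelta(days=3.5),  # assumption: twice-weekly, evenly spaced
}


def due_sources(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Return the sources whose interval has elapsed, or that have never run."""
    return [
        name
        for name, interval in CADENCE.items()
        if name not in last_run or now - last_run[name] >= interval
    ]


# Hypothetical state: every source last ran two hours ago.
now = datetime(2025, 1, 8, 12, 0)
last_run = {name: now - timedelta(hours=2) for name in CADENCE}
due = due_sources(last_run, now)
```

With that state only the hourly source is due, so a scheduler tick can stay cheap most of the time.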

News pipeline. Eight RSS feeds polled hourly: TechCrunch, Hugging Face, arXiv (cs.AI / cs.LG / cs.CL), The Verge, VentureBeat, OpenAI Blog, Google AI Blog, MIT Technology Review. For every article we store the publisher's verbatim RSS summary — we never paraphrase, rewrite, or re-host the source content. On top of that summary we run a small enrichment pass with Llama-3.3-70B (via Groq) that extracts sentiment, topic tags, mentioned models (matched against the canonical Hugging Face id), and mentioned companies. Those extracted fields power the listing page filters and the per-model news index.
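To make the enrichment output concrete, here is a hedged Python sketch of what an enriched article record and the per-model news index might look like. The `EnrichedArticle` fields mirror the ones named above, but the class itself, the `normalize` rule, the `build_model_index` helper, and the example URLs are hypothetical; the real matching logic is not documented here and is likely more involved.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EnrichedArticle:
    url: str
    summary: str  # the publisher's verbatim RSS summary, stored unchanged
    sentiment: str
    topic_tags: list[str] = field(default_factory=list)
    mentioned_models: list[str] = field(default_factory=list)  # raw extracted names
    mentioned_companies: list[str] = field(default_factory=list)


def normalize(name: str) -> str:
    """Lowercase and drop punctuation/spaces so 'Llama 3.3 70B' ~ 'Llama-3.3-70B'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def build_model_index(
    articles: list[EnrichedArticle], canonical_ids: list[str]
) -> dict[str, list[str]]:
    """Map canonical Hugging Face id -> URLs of articles that mention the model."""
    lookup: dict[str, str] = {}
    for cid in canonical_ids:
        lookup[normalize(cid)] = cid                 # full 'org/name' form
        lookup[normalize(cid.split("/")[-1])] = cid  # bare model name
    index: dict[str, list[str]] = defaultdict(list)
    for article in articles:
        for mention in article.mentioned_models:
            cid = lookup.get(normalize(mention))
            if cid is not None and article.url not in index[cid]:
                index[cid].append(article.url)
    return dict(index)


articles = [
    EnrichedArticle(
        url="https://example.com/a",
        summary="(verbatim RSS summary)",
        sentiment="positive",
        mentioned_models=["Llama 3.3 70B Instruct", "SomeUnknownModel"],
    ),
]
index = build_model_index(articles, ["meta-llama/Llama-3.3-70B-Instruct"])
```

Mentions that do not resolve to a known id are simply dropped from the index in this sketch, which keeps the per-model pages free of guesses.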

When sources disagree. ArtificialAnalysis, OpenRouter, and Together frequently publish slightly different prices for the same model. We surface ArtificialAnalysis by default, fall back to OpenRouter, then Together, and label which source produced the number on the model detail page.
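The fallback order described above amounts to a first-available lookup that also records provenance. A minimal Python sketch, assuming hypothetical source keys and a hypothetical `resolve_price` helper (the prices in the example are made up):

```python
from typing import Optional

# Priority order stated above: ArtificialAnalysis, then OpenRouter, then Together.
PRICE_SOURCE_PRIORITY = ("artificialanalysis", "openrouter", "together")


def resolve_price(prices: dict[str, Optional[float]]) -> Optional[tuple[float, str]]:
    """Return (price, source) from the highest-priority source that has a value."""
    for source in PRICE_SOURCE_PRIORITY:
        value = prices.get(source)
        if value is not None:
            return value, source
    return None  # no source publishes a price for this model


# Made-up numbers: ArtificialAnalysis has no price, so OpenRouter wins.
resolved = resolve_price({"artificialanalysis": None, "openrouter": 0.60, "together": 0.55})
```

Returning the source name next to the value is what lets a detail page label where the number came from.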

Free vs Pro

04 / 07

The chart, the news feed, and the model database are free and don't require an account. Most visitors never need anything else. Pro exists for people who want to build on top of the data.

Free

· Full bubble chart, all metrics
· Model detail pages and 30-day history
· News feed, search, filters, per-model news
· Compare page (head-to-head)
· Leaderboards and watchlist

Pro

· CSV / JSON data export
· Public REST API (models + news)
· Extended snapshot history
· Advanced compare and chart configurations

Pro is a small monthly subscription. We don't sell user data. We don't run lead-gen funnels. The only ads on the site are the ones served on article and listing pages, and they're what keeps the free tier sustainable while the project is small.

Why we built this

05 / 07

In late 2024 the visible AI model count crossed a threshold where keeping a mental model of the field stopped being possible. Hugging Face alone hosts hundreds of thousands of repositories. The set that actually matters — models with non-trivial adoption, a published benchmark score, or a known price — is much smaller, but it lives across a dozen different sites.

Every benchmark site shows different numbers because every benchmark measures something different. Arena ELO ranks models by human pairwise preference. MMLU-Pro grades knowledge breadth. SWE-bench grades real-world coding. ArtificialAnalysis composites several into a single intelligence score. None of those are wrong, but treating any single one as “the” ranking is misleading. A model that's strong at coding can be mid-tier on Arena. A model with a great Arena score can be slow and expensive at the API.

The thesis behind Databubble is that the right answer is not picking a winner — it's showing all the dimensions on one screen, letting you switch between them, and being explicit about which source produced each number. Transparency over editorial. Data over takes. The bubble chart is the surface that makes that comparison feel cheap to do; the news feed is the surface that gives you the why.

We're also building this in public. The codebase is closed for now, but the data sources, ingestion cadence, and product roadmap are documented on this page and on the founder's X account. If a number on Databubble looks wrong, the source is one click away on the model page — check it, and if it's our import that's wrong, send a note.

Dig deeper

06 / 07

Methodology

The full breakdown of how we compute the composite databubble_score: six weighted signals, normalisation, missing-data redistribution, a worked example, and the update cadence per source.

Read methodology →
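For readers who want a feel for what "missing-data redistribution" can mean before opening the methodology page, here is one common approach sketched in Python. Everything specific in it is an assumption: the signal names are placeholders, the weights are invented, the inputs are presumed already normalised to the 0 to 1 range, and renormalising by the sum of present weights is only one of several possible redistribution schemes. The methodology page is the authoritative description.

```python
from typing import Optional

# Placeholder names and invented weights; the real six signals are documented
# on the methodology page, not here.
WEIGHTS = {
    "signal_1": 0.25,
    "signal_2": 0.20,
    "signal_3": 0.20,
    "signal_4": 0.15,
    "signal_5": 0.10,
    "signal_6": 0.10,
}


def composite_score(signals: dict[str, Optional[float]]) -> Optional[float]:
    """Weighted mean of the signals that are present.

    Dividing by the sum of the present weights spreads the weight of any
    missing signal proportionally across the remaining ones.
    """
    present = {k: v for k, v in signals.items() if v is not None and k in WEIGHTS}
    total_weight = sum(WEIGHTS[k] for k in present)
    if total_weight == 0:
        return None  # nothing to score
    return sum(WEIGHTS[k] * v for k, v in present.items()) / total_weight


# Two of the six signals are missing for this hypothetical model.
score = composite_score(
    {"signal_1": 0.9, "signal_2": 0.5, "signal_3": None,
     "signal_4": 0.7, "signal_5": None, "signal_6": 0.2}
)
```

Under this scheme a model is not penalised for a benchmark it was never run on, at the cost of its score resting on fewer signals.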

Glossary

35 plain-English definitions of the terms you'll bump into on model pages and benchmarks — MMLU-Pro, MoE, context window, Arena ELO, SWE-bench, RAG, quantization, and the rest.

Browse glossary →

Who builds this

07 / 07

Marouane Gazouzi. Independent developer based in Morocco. Spent the previous decade building data-visualisation and developer tools; Databubble started as a weekend project to scratch the “which model do I actually use” itch and grew once it became clear other people had it too.

No VC funding, no team, no growth-hacking playbook. The product moves at the pace of one person who cares about getting it right. That has trade-offs — we cover fewer benchmark surfaces than a bigger team would, and feature requests can take a while — but it also means there's no editorial agenda or paid placement to push a particular vendor.

The fastest way to reach us is the contact page or X. Bug reports, data corrections, and feature ideas are all welcome.

Contact →
@marouanegazouzi →
@DataBubbleCo →
gazouzi.com →