DataBubble
Latest
  • LoHoSearch: Benchmarking Long-Horizon Search Agents Beyond the Human Difficulty Ceiling (1h)
  • Multi-Turn Reasoning When Context Arrives in Pieces: Scalable Sharding and Memory-Augmented RL (1h)
  • SkillChain: Closing the Loop on Skill Evolution for Image-Based E-Commerce AI Assistants (1h)
  • No Hidden Prompts Needed! You Can Game AI Peer Review with Presentation-Only Revisions (1h)
  • MÖVE: A Holistic LLM Benchmark for the German Public Sector (1h)
  • SICI: A Semantic-Pragmatic Complexity Index Reveals Regime Shifts in LLM Stance Detection (1h)
  • Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization (1h)
  • SkillCAT: Contrastive Assessment and Topology-Aware Skill Self-Evolution for LLM Agents (1h)
  • Low-Latency Real-Time Audio Game Commentary System via LLM-Based Parallel Text Generation (1h)
  • S-GBT: Smooth Growth Bound Tensor for Certified Robustness Against Word Substitution Attacks in NLP (1h)
  • Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data (1h)
  • When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval (1h)
  • EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments (1h)
  • Two Wrongs, No Right: Auditing Social-Desirability Bias in LLM Annotators for Computational Social Science (1h)
  • Detecting Functional Memorization in Code Language Models (1h)
  • Multi-Bitwidth Quantization for LLMs Using Additive Codebooks (1h)
  • Magnifying What Matters: Attention-Guided Adaptive Rendering for Visual Text Comprehension (1h)
  • Trait, Not State: The Durability of Reading Identity in Social Highlighting (1h)
  • Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning (1h)
  • Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents (1h)

Model Arena

0 OF 2 SLOTS FILLED
Select Models: 2 required · up to 4
Model 1
Model 2
Select at least 2 models to compare
No Comparison Running

Select at least 2 models above and hit Run Analysis to see a head-to-head breakdown of downloads, benchmarks, pricing, and trends.

  • GPT-4o vs Claude 3.5
  • Llama 3 vs Gemma 2
  • Mistral vs Qwen