DataBubble
Tag

#open-source

20 articles tagged #open-source

arxiv · 22h ago

Evaluating Bias in Phoneme-Based Automatic Speech Recognition Systems: An Analysis of IPA Transcription Models

arXiv:2606.11639v1 Announce Type: new Abstract: The popularization of automatic speech recognition (ASR) systems has increased exploration of the demographic biases related to race, age, gender, and accent, often formed from imbalanced training data. Most of these studies focused on standard graphem

2 models · #speech recognition · #bias · #multilingual

arxiv · 6d ago · bullish

SciDER: Scientific Data-centric End-to-end Researcher

arXiv:2603.01421v3 Announce Type: replace Abstract: While large language models accelerate scientific discovery, existing agents face severe limitations in adaptability, domain generalization, and multimodal scalability, often struggling to autonomously process raw, domain-specific experimental data

2 models · #scientific discovery · #multimodal scalability · #open-source

arxiv · Jun 2

OmniEEG-Bench: A Standardized Evaluation Benchmark for EEG Foundation Models

arXiv:2606.00815v1 Announce Type: new Abstract: Electroencephalography (EEG) supports a variety of brain-computer interface (BCI) tasks ranging from brain-state monitoring to human-LLM interactions. EEG foundation models are emerging, but evaluation remains fragmented due to heterogeneous datasets a

#benchmark · #machine-learning · #neuroscience

arxiv · May 29 · bullish

Formalizing Mathematics at Scale

arXiv:2605.29955v1 Announce Type: new Abstract: We present AutoformBot, a multi-agent system for building an Autoformalized Textbook Library At Scale (Atlas) in Lean 4. AutoformBot orchestrates thousands of LLM agents, equipped with formal verification tools, dependency-aware task scheduling, and co

1 model · #autoformalization · #mathematics · #verification

arxiv · May 26

AI-Driven Adaptive Adversaries and the Erosion of Cryptographic Trust in Public Key Systems

arXiv:2605.24542v1 Announce Type: cross Abstract: This paper examines the erosion of Public Key Cryptography (PKC) security under adaptive adversarial optimisation driven by artificial intelligence. The problem addressed is the growing mismatch between algorithm-centric cryptographic security models

#open-source · #collaboration · #community

arxiv · May 19 · bullish

SAME: A Semantically-Aligned Music Autoencoder

arXiv:2605.18613v1 Announce Type: cross Abstract: Latent representations are at the heart of the majority of modern generative models. In the audio domain they are typically produced by a neural-audio-codec autoencoder. In this work we introduce SAME (Semantically-Aligned Music autoEncoder), an auto

3 models · #audio · #generative · #autoencoder

arxiv · May 15 · bullish

A Large Language Model Based Pipeline for Review of Systems Entity Recognition from Clinical Notes

arXiv:2506.11067v3 Announce Type: replace Abstract: Objective: Develop a cost-effective, large language model (LLM)-based pipeline for automatically extracting Review of Systems (ROS) entities from clinical notes. Materials and Methods: The pipeline extracts ROS section from the clinical note using

4 models · +1 · #healthcare · #language-models · #open-source

arxiv · May 14

BEHAVE: A Hybrid AI Framework for Real-Time Modeling of Collective Human Dynamics

arXiv:2605.12730v1 Announce Type: new Abstract: Existing AI systems for modeling human behavior operate at the level of individuals or detect events after they occur. As a result, they systematically fail to capture the collective dynamics that determine whether a group remains stable or transitions

#open-source · #collaboration · #community

arxiv · May 14 · bullish

Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation

arXiv:2604.10720v2 Announce Type: replace Abstract: Artificial students -- models that simulate how learners act and respond within educational systems -- are a promising tool for evaluating tutoring strategies and feedback mechanisms at scale. However, most existing approaches rely on prompting lar

1 model · #open-source · #education · #programming

arxiv · May 1 · bullish

QED: An Open-Source Multi-Agent System for Generating Mathematical Proofs on Open Problems

arXiv:2604.24021v2 Announce Type: replace Abstract: We explore a central question in AI for mathematics: can AI systems produce original, nontrivial proofs for open research problems? Despite strong benchmark performance, producing genuinely novel proofs remains an outstanding challenge for LLMs. Th

2 models · #proof-generation · #open-source · #mathematics

arxiv · Apr 30

Structural Generalization on SLOG without Hand-Written Rules

arXiv:2604.26157v1 Announce Type: cross Abstract: Structural generalization in semantic parsing requires systems to apply learned compositional rules to novel structural combinations. Existing approaches either rely on hand-written algebraic rules (AM-Parser) or fail to generalize structurally (Tran

#open-source · #collaboration · #community

arxiv · Apr 27 · bullish

Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset

arXiv:2604.22260v1 Announce Type: cross Abstract: Urban transportation systems face growing safety challenges that require scalable intelligence for emerging smart mobility infrastructures. While recent advances in foundation models and large-scale multimodal datasets have strengthened perception an

1 model · #open-source · #dataset · #computer-vision

arxiv · Apr 22

Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges

arXiv:2604.19354v1 Announce Type: new Abstract: Large Language Model (LLM) agents are increasingly proposed for autonomous cybersecurity tasks, but their capabilities in realistic offensive settings remain poorly understood. We present DeepRed, an open-source benchmark for evaluating LLM-based agent

1 model · #cybersecurity · #benchmark · #open-source

arxiv · Apr 13

A novel hybrid approach for positive-valued DAG learning

arXiv:2604.08935v1 Announce Type: cross Abstract: Causal discovery from observational data remains a fundamental challenge in machine learning and statistics, particularly when variables represent inherently positive quantities such as gene expression levels, asset prices, company revenues, or popul

#open-source · #collaboration · #community

arxiv · Apr 4 · bullish

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling

arXiv:2407.15828v2 Announce Type: replace Abstract: Spoken dialogue is essential for human-AI interactions, providing expressive capabilities beyond text. Developing effective spoken dialogue systems (SDSs) requires large-scale, high-quality, and diverse spoken dialogue corpora. However, existing da

#open-source · #dataset · #speech

arxiv · Apr 4 · bullish

Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging

arXiv:2604.01538v1 Announce Type: new Abstract: Large language models have been adopted in the medical domain for clinical documentation to reduce clinician burden. However, studies have reported that LLMs often "forget" a significant amount of instruction-following ability when fine-tuned using a t

2 models · #open-source · #clinical · #domain-adaptation

arxiv · Apr 3

A Safety-Aware Role-Orchestrated Multi-Agent LLM Framework for Behavioral Health Communication Simulation

arXiv:2604.00249v1 Announce Type: new Abstract: Single-agent large language model (LLM) systems struggle to simultaneously support diverse conversational functions and maintain safety in behavioral health communication. We propose a safety-aware, role-orchestrated multi-agent LLM framework designed

#research · #open-source · #collaboration

arxiv · Apr 3

(PAC-)Learning state machines from data streams: A generic strategy and an improved heuristic (Extended version)

arXiv:2604.02244v1 Announce Type: cross Abstract: This is an extended version of our publication Learning state machines from data streams: A generic strategy and an improved heuristic, International Conference on Grammatical Inference (ICGI) 2023, Rabat, Morocco. It has been extended with a formal

#state-machines · #machine-learning · #open-source

arxiv · Apr 3 · bullish

Open, Reliable, and Collective: A Community-Driven Framework for Tool-Using AI Agents

arXiv:2604.00137v1 Announce Type: new Abstract: Tool-integrated LLMs can retrieve, compute, and take real-world actions via external tools, but reliability remains a key bottleneck. We argue that failures stem from both tool-use accuracy (how well an agent invokes a tool) and intrinsic tool accuracy

#reliability · #benchmark · #open-source

arxiv · Apr 1 · bullish

A Multi-Agent Rhizomatic Pipeline for Non-Linear Literature Analysis

arXiv:2603.28336v2 Announce Type: replace Abstract: Systematic literature reviews in the social sciences overwhelmingly follow arborescent logics -- hierarchical keyword filtering, linear screening, and taxonomic classification -- that suppress the lateral connections, ruptures, and emergent pattern

3 models · #open-source · #literature-review · #complexity