DataBubble
Tag

#multimodal

20 articles tagged #multimodal

arxiv · 5d ago · bullish

Toward Culturally Aligned LLMs through Ontology-Guided Multi-Agent Reasoning

arXiv:2601.21700v3 Announce Type: replace-cross Abstract: Large Language Models (LLMs) increasingly support culturally sensitive decision making, yet often exhibit misalignment due to skewed pretraining data and the absence of structured value representations. Existing methods can steer outputs, but

#ontology #multimodal #cultural-sensitivity

arxiv · 5d ago · bearish

The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs?

arXiv:2504.10020v4 Announce Type: replace-cross Abstract: Contrastive decoding strategies are widely used to reduce object hallucinations in multimodal large language models (MLLMs). These methods work by constructing contrastive samples to induce hallucinations and then suppressing them in the outp

#multimodal #hallucinations #language-models

arxiv · Jun 3 · bullish

Before Fusion, Ask What to Keep: Contextual Calibration of Multimodal Signals

arXiv:2606.02679v1 Announce Type: new Abstract: Multimodal systems often benefit from combining information across language, sound, and visual streams, but this benefit is not guaranteed. A modality that is useful for one input may become distracting for another, and local feature responses within t

#multimodal #fusion #calibration

arxiv · Jun 2 · bullish

AdaCodec: A Predictive Visual Code for Video MLLMs

arXiv:2606.02569v1 Announce Type: cross Abstract: Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existing video multimodal large language models (video MLLMs) usually encode each sampled frame as an independent RGB image, causing visual tokens

AD QW · 2 models · #video #multimodal #compression

arxiv · May 25 · bullish

Scaling-Aware Adapter for Structure-Grounded LLM Reasoning

arXiv:2602.02780v3 Announce Type: replace Abstract: Large language models (LLMs) are enabling reasoning over 2D and 3D structures, yet existing methods remain modality-specific and typically compress structural inputs through sequence-based tokenization or fixed-length query connectors. Such archite

CU · 1 model · #large-language-models #multimodal #reasoning

arxiv · May 22 · bullish

DeFacto: Counterfactual Thinking with Images for Enforcing Evidence-Grounded and Faithful Reasoning

arXiv:2509.20912v4 Announce Type: replace Abstract: Recent advances in multimodal language models (MLLMs) have made thinking with images a dominant paradigm for multimodal reasoning. However, existing methods still fail to ensure evidence-answer consistency, where correct answers must be supported b

#multimodal #reasoning #counterfactual

arxiv · May 16

MediaClaw: Multimodal Intelligent-Agent Platform Technical Report

arXiv:2605.14771v1 Announce Type: new Abstract: MediaClaw is a multimodal agent platform built on the OpenClaw ecosystem. Its core design follows a three-layer architecture of unified abstraction, pluginized extension, and workflow orchestration. The system is intended to address practical deploymen

#multimodal #architecture #artificial-intelligence

arxiv · May 16 · bullish

MHSA: A Lightweight Framework for Mitigating Hallucinations via Steered Attention in LVLMs

arXiv:2605.14966v1 Announce Type: cross Abstract: Large vision-language models (LVLMs) have achieved remarkable performance across diverse multimodal tasks, yet they continue to suffer from hallucinations, generating content that is inconsistent with the visual input. Prior work DHCP (Detecting Hall

MH DH · 2 models · #hallucination #mitigation #multimodal

arxiv · May 12 · bullish

Towards Customized Multimodal Role-Play

arXiv:2605.08129v1 Announce Type: new Abstract: Unified multimodal understanding and generation models enable richer human-AI interaction. Yet jointly customizing a character's persona, dialogue style, and visual identity while maintaining output consistency across modalities remains largely unexplo

UN · 1 model · #multimodal #roleplay #character-generation

arxiv · May 8 · bullish

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

arXiv:2605.05225v1 Announce Type: cross Abstract: Mixture-of-Experts Multimodal Large Language Models (MoE MLLMs) suffer from a significant efficiency bottleneck during Expert Parallelism (EP) inference due to the straggler effect. This issue is worsened in the multimodal context, as existing token-

MI MA · 2 models · #multimodal #efficiency #inference

arxiv · May 1 · bullish

SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images

arXiv:2604.28039v1 Announce Type: new Abstract: Spectra are a prevalent yet highly information-dense form of scientific imagery, presenting substantial challenges to multimodal large language models (MLLMs) due to their unstructured and domain-specific characteristics. Here we introduce SpecVQA, a p

#multimodal #benchmark #scientific-research

arxiv · Apr 29

FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment

arXiv:2604.23786v1 Announce Type: new Abstract: In recent years, the integration of multimodal machine learning in wellbeing assessment has offered transformative potential for monitoring mental health. However, with the rapid advancement of Vision-Language Models (VLMs), their deployment in clinica

PH QW · 2 models · #fairness #explainability #multimodal

arxiv · Apr 27 · bullish

EuropeMedQA Study Protocol: A Multilingual, Multimodal Medical Examination Dataset for Language Model Evaluation

arXiv:2604.14306v2 Announce Type: replace-cross Abstract: While Large Language Models (LLMs) have demonstrated high proficiency on English-centric medical examinations, their performance often declines when faced with non-English languages and multimodal diagnostic tasks. This study protocol describ

LA · 1 model · #multilingual #medical-ai #benchmark

arxiv · Apr 24 · bullish

Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection

arXiv:2601.06498v3 Announce Type: replace Abstract: Due to the limited generalization and interpretability of deep learning classifiers, the final vetting of rare celestial object candidates still relies on expert visual inspection--a manually intensive process. In this process, astronomers leverage

SP · 1 model · #astronomy #spectroscopy #multimodal

arxiv · Apr 23

Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models

arXiv:2604.16902v2 Announce Type: replace Abstract: Native Omni-modal Large Language Models (OLLMs) have shifted from pipeline architectures to unified representation spaces. However, this native integration gives rise to a critical yet underexplored phenomenon: modality preference. To bridge this g

#research #language-models #multimodal

arxiv · Apr 21 · bullish

Multilingual Training and Evaluation Resources for Vision-Language Models

arXiv:2604.18347v1 Announce Type: new Abstract: Vision Language Models (VLMs) have achieved rapid progress in recent years. However, despite their growth, VLM development is heavily grounded in English, leading to two main limitations: (i) the lack of multilingual and multimodal datasets for trainin

PI PI CO · 3 models · #multilingual #multimodal #benchmark

arxiv · Apr 17

Knowing When Not to Answer: Evaluating Abstention in Multimodal Reasoning Systems

arXiv:2604.14799v1 Announce Type: new Abstract: Effective abstention (EA), recognizing evidence insufficiency and refraining from answering, is critical for reliable multimodal systems. Yet existing evaluation paradigms for vision-language models (VLMs) and multi-agent systems (MAS) assume answerabi

#multimodal #evaluation #abstention

arxiv · Apr 9

ValueGround: Evaluating Culture-Conditioned Visual Value Grounding in MLLMs

arXiv:2604.06484v1 Announce Type: new Abstract: Cultural values are expressed not only through language but also through visual scenes and everyday social practices. Yet existing evaluations of cultural values in language models are almost entirely text-only, making it unclear whether models can gro

#multimodal #evaluation #culture

arxiv · Apr 8 · bullish

HybridKV: Hybrid KV Cache Compression for Efficient Multimodal Large Language Model Inference

arXiv:2604.05887v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have advanced unified reasoning over text, images, and videos, but their inference is hindered by the rapid growth of key-value (KV) caches. Each visual input expands into thousands of tokens, causing caches to

QW · 1 model · #multimodal #compression #optimization

arxiv · Apr 7 · bullish

PDF Retrieval Augmented Question Answering

arXiv:2506.18027v2 Announce Type: replace Abstract: This paper presents an advancement in Question-Answering (QA) systems using a Retrieval Augmented Generation (RAG) framework to enhance information extraction from PDF files. Recognizing the richness and diversity of data within PDFs--including tex

RE LA · 2 models · #question-answering #multimodal #information-extraction