DataBubble
Tag

#computer-vision

46 articles tagged #computer-vision

arxiv · 5d ago · bullish

Next-Generation Parallel Decoder for LPDR: Architectural Optimization and Class-Balanced GAN-Augmentation

arXiv:2606.05785v1 Announce Type: cross Abstract: Real-Time License Plate Detection and Recognition (LPDR) forms the backbone of modern smart cities. Although the YOLOV5-PDLPR model substantially improved system efficiency through a parallel decoder approach, its performance is still affected by spa

YO · 1 model · #computer-vision #license-plate-recognition #real-time-processing
arxiv · Jun 2 · bullish

Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation

arXiv:2606.02552v1 Announce Type: cross Abstract: Despite advances in depth estimation, flying points remain a persistent failure mode: near object boundaries, depth estimators often predict spurious 3D points in the empty space between foreground and background surfaces. We trace this artifact to a

MD · 1 model · #depth-estimation #computer-vision #image-processing
arxiv · Jun 1 · bullish

RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

arXiv:2605.31535v1 Announce Type: cross Abstract: Self-supervised novel view synthesis (NVS) remains challenging to scale, despite the abundance of video data, largely due to the brittleness of training on realistic videos and the hard-to-predict scaling behavior of multi-network system designs. We

RA · 1 model · #computer-vision #self-supervised #transformer
arxiv · May 29 · bullish

GiPL: Generative augmented iterative Pseudo-Labeling for Cross-Domain Few-Shot Object Detection

arXiv:2605.29539v1 Announce Type: cross Abstract: Vision-language foundation models have shown promising zero-shot generalization for Cross-Domain Few-Shot Object Detection (CD-FSOD). However, they face two critical challenges in fine-tuning: insufficient support set utilization due to sparse single

GI · 1 model · #computer-vision #object-detection #few-shot-learning
arxiv · May 28

Beyond Motion Primitives: Behavioral Activity Recognition from Head-Mounted IMU

arXiv:2605.27464v1 Announce Type: cross Abstract: AR smart glasses need continuous behavioral context to offer proactive assistance, yet their most practical always-on sensor, the head-mounted Inertial Measurement Unit (IMU), detects only motion primitives such as walking or standing. We push beyond

HI · 1 model · #computer-vision #action-recognition #dataset
arxiv · May 25

Lipschitz Optimization for Formal Verification of Homographies

arXiv:2605.23203v1 Announce Type: cross Abstract: The adoption of vision neural networks in regulated industries requires formal robustness guarantees, especially in safety-critical domains such as healthcare, autonomous vehicles, and aerospace. However, current approaches are confined to incomplete

#computer-vision #safety #verification
arxiv · May 22 · bullish

Letting Trajectories Spread: Quality-Preserving Control for Diverse Flow Matching

arXiv:2510.09060v2 Announce Type: replace Abstract: Flow-based text-to-image models follow deterministic trajectories, making it costly to explore diverse modes under limited sampling budgets. Existing approaches to improving diversity often rely on retraining or degrade image fidelity. To address t

#text-to-image #diversity #computer-vision
arxiv · May 19 · bullish

Trajectory-Aware Adaptive Inference in Object Detection Models

arXiv:2605.16397v1 Announce Type: cross Abstract: The increasing integration of sensors in autonomous maritime navigation has led to large-scale multimodal datasets, raising challenges in achieving efficient real-time perception. In such systems, object detection and trajectory perception of nearby

YO · 1 model · #computer-vision #real-time #efficiency
arxiv · May 15 · bullish

R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

arXiv:2605.13838v2 Announce Type: replace-cross Abstract: Video-guided 3D animation holds immense potential for content creation, offering intuitive and precise control over dynamic assets. However, practical deployment faces a critical yet frequently overlooked hurdle: the pose misalignment dilemma

RE, VA, TR · 4 models · +1 · #animation #computer-vision #machine-learning
arxiv · May 11 · bullish

NavOne: One-Step Global Planning for Vision-Language Navigation on Top-Down Maps

arXiv:2605.06317v2 Announce Type: replace-cross Abstract: Existing Vision-Language Navigation (VLN) methods typically adopt an egocentric, step-by-step paradigm, which struggles with error accumulation and limits efficiency. While recent approaches attempt to leverage pre-built environment maps, the

NA · 1 model · #navigation #computer-vision #path-planning
arxiv · May 8 · bullish

EGA: Adapting Frozen Encoders for Vector Search with Bounded Out-of-Distribution Degradation

arXiv:2605.05674v1 Announce Type: cross Abstract: Vector search systems built on frozen vision encoders face queries from unseen classes at deployment, yet existing adapter training collapses under this shift: high-capacity adapters with global contrastive losses silently reassign unseen-class sampl

EU, OP · 2 models · #computer-vision #out-of-distribution #adapter-training
arxiv · May 8 · bullish

Intelligent CCTV for Urban Design: AI-Based Analysis of Soft Infrastructure at Intersections

arXiv:2605.05402v1 Announce Type: new Abstract: Artificial intelligence (AI) and computer vision are transforming transportation data collection. This study introduces an AI-enabled analytics framework leveraging existing CCTV infrastructure to evaluate the impact of soft interventions, such as temp

DE · 1 model · #transportation #computer-vision #safety
arxiv · May 8 · bullish

Visual Para-Thinker: Divide-and-Conquer Reasoning for Visual Comprehension

arXiv:2602.13310v2 Announce Type: replace-cross Abstract: Existing LLM test-time scaling laws emphasize the emergence of self-reflective behaviors through extended reasoning length. Nevertheless, this vertical scaling strategy often encounters plateaus in exploration as the model becomes locked into

VI · 1 model · #computer-vision #parallel-processing #multimodal-learning
arxiv · May 8 · bullish

Towards Self-Explainable Document Visual Question Answering with Chain-of-Explanation Predictions

arXiv:2605.06058v1 Announce Type: new Abstract: Document Visual Question Answering (DocVQA) requires vision-language models to reason not only about what information in a document is relevant to a question, but also where the answer is grounded on the page. Existing DocVQA models entangle question-r

CO · 1 model · #explainability #document-visual-question-answering #machine-learning
arxiv · May 5 · bullish

Anomaly-Preference Image Generation

arXiv:2605.02439v1 Announce Type: cross Abstract: Synthesizing realistic and diverse anomalous samples from limited data is vital for robust model generalization. However, existing methods struggle to reconcile fidelity and diversity, often hampered by distribution misalignment and overfitting, resp

#anomaly-detection #machine-learning #computer-vision
arxiv · May 4 · bullish

Being-H0.7: A Latent World-Action Model from Egocentric Videos

arXiv:2605.00078v1 Announce Type: cross Abstract: Visual-Language-Action models (VLAs) have advanced generalist robot control by mapping multimodal observations and language instructions directly to actions, but sparse action supervision often encourages shortcut mappings rather than representations

BE · 1 model · #robotics #computer-vision #machine-learning
arxiv · May 1

Culture-inspired Multi-modal Color Palette Generation and Colorization: A Chinese Youth Subculture Case

arXiv:2102.05231v1 Announce Type: cross Abstract: Color is an essential component of graphic design, acting not only as a visual factor but also carrying cultural implications. However, existing research on algorithmic color palette generation and colorization largely ignores the cultural aspect. In

#colorization #computer-vision #generative-models
arxiv · May 1

OR-VSKC: Resolving Visual-Semantic Knowledge Conflicts in Operating Rooms with Synthetic Data-Guided Alignment

arXiv:2506.22500v2 Announce Type: replace-cross Abstract: Automated identification of surgical safety risks is critical for improving patient outcomes; however, Multimodal Large Language Models (MLLMs) frequently suffer from Visual-Semantic Knowledge Conflicts (VS-KC), a phenomenon where models poss

#safety #medical #computer-vision
arxiv · Apr 30 · bullish

Delineating Knowledge Boundaries for Honest Large Vision-Language Models

arXiv:2604.26419v1 Announce Type: cross Abstract: Large Vision-Language Models (VLMs) have achieved remarkable multimodal performance yet remain prone to factual hallucinations, particularly in long-tail or specialized domains. Moreover, current models exhibit a weak capacity to refuse queries that

#computer-vision #artificial-intelligence #trustworthiness
arxiv · Apr 29

OAMVOS:2nd Report for 5th PVUW MOSE Track

arXiv:2604.22837v1 Announce Type: cross Abstract: SAM-based dense trackers provide strong short-term mask propagation but remain fragile under long occlusion, fast motion, viewpoint change, and distractors. The problem is especially severe for small objects, where a few incorrect memory updates can

DA, SA · 2 models · #computer-vision #object-tracking #occlusion
arxiv · Apr 27 · bullish

H-Sets: Hessian-Guided Discovery of Set-Level Feature Interactions in Image Classifiers

arXiv:2604.22045v1 Announce Type: cross Abstract: Feature attribution methods explain the predictions of deep neural networks by assigning importance scores to individual input features. However, most existing methods focus solely on marginal effects, overlooking feature interactions, where groups o

VG, RE, DE · 5 models · +2 · #computer-vision #interpretability #image-classification
arxiv · Apr 27 · bullish

Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset

arXiv:2604.22260v1 Announce Type: cross Abstract: Urban transportation systems face growing safety challenges that require scalable intelligence for emerging smart mobility infrastructures. While recent advances in foundation models and large-scale multimodal datasets have strengthened perception an

UN · 1 model · #open-source #dataset #computer-vision
arxiv · Apr 24 · bullish

FedSIR: Spectral Client Identification and Relabeling for Federated Learning with Noisy Labels

arXiv:2604.20825v1 Announce Type: new Abstract: Federated learning (FL) enables collaborative model training without sharing raw data; however, the presence of noisy labels across distributed clients can severely degrade the learning performance. In this paper, we propose FedSIR, a multi-stage frame

FE · 1 model · #federated-learning #noisy-labels #robust-training
arxiv · Apr 24 · bullish

LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image

arXiv:2604.20800v1 Announce Type: cross Abstract: Reconstructing 3D Human-Object Interaction from an RGB image is essential for perceptive systems. Yet, this remains challenging as it requires capturing the subtle physical coupling between the body and objects. While current methods rely on sparse,

LE, LE, VQ · 3 models · #computer-vision #3d-reconstruction #human-object-interaction
arxiv · Apr 24

Global Offshore Wind Infrastructure: Deployment and Operational Dynamics from Dense Sentinel-1 Time Series

arXiv:2604.20822v1 Announce Type: cross Abstract: The offshore wind energy sector is expanding rapidly, increasing the need for independent, high-temporal-resolution monitoring of infrastructure deployment and operation at global scale. While Earth Observation based offshore wind infrastructure mapp

#earth-observation #offshore-wind #machine-learning
arxiv · Apr 21 · bullish

GIST: Multimodal Knowledge Extraction and Spatial Grounding via Intelligent Semantic Topology

arXiv:2604.15495v1 Announce Type: new Abstract: Navigating complex, densely packed environments like retail stores, warehouses, and hospitals poses a significant spatial grounding challenge for humans and embodied AI. In these spaces, dense visual features quickly become stale given the quasi-static

GI · 1 model · #navigation #computer-vision #human-ai-interaction
arxiv · Apr 21

SatBLIP: Context Understanding and Feature Identification from Satellite Imagery with Vision-Language Learning

arXiv:2604.14373v2 Announce Type: replace-cross Abstract: Rural environmental risks are shaped by place-based conditions (e.g., housing quality, road access, land-surface patterns), yet standard vulnerability indices are coarse and provide limited insight into risk contexts. We propose SatBLIP, a sa

SA, BL, OP · 4 models · +1 · #computer-vision #remote-sensing #vulnerability-index
arxiv · Apr 20 · bullish

Adapting in the Dark: Efficient and Stable Test-Time Adaptation for Black-Box Models

arXiv:2604.15609v1 Announce Type: new Abstract: Test-Time Adaptation (TTA) for black-box models accessible only via APIs remains a largely unexplored challenge. Existing approaches such as post-hoc output refinement offer limited adaptive capacity, while Zeroth-Order Optimization (ZOO) enables input

BE, GO, OP · 6 models · +3 · #machine-learning #computer-vision #test-time-adaptation
arxiv · Apr 18 · bullish

Edge-preserving noise for diffusion models

arXiv:2410.01540v4 Announce Type: replace-cross Abstract: Classical diffusion models typically rely on isotropic Gaussian noise, treating all regions uniformly and overlooking structural information important for high-quality generation. We introduce an edge-preserving diffusion process that general

#diffusion #computer-vision #machine-learning
arxiv · Apr 18

The Spectrascapes Dataset: Street-view imagery beyond the visible captured using a mobile platform

arXiv:2604.13315v2 Announce Type: replace-cross Abstract: High-resolution data in spatial and temporal contexts is imperative for developing climate resilient cities. Current datasets for monitoring urban parameters are developed primarily using manual inspections, embedded-sensing, remote sensing,

#dataset #computer-vision #urban-planning
arxiv · Apr 18 · bullish

Improving Prostate Gland Segmentation Using Transformer based Architectures

arXiv:2506.14844v2 Announce Type: replace-cross Abstract: Inter reader variability and cross site domain shift challenge the automatic segmentation of prostate anatomy using T2 weighted MRI images. This study investigates whether transformer models can retain precision amid such heterogeneity. We co

UN, SW, 3D · 3 models · #medical-imaging #segmentation #transformer-models
arxiv · Apr 17 · bullish

UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding

arXiv:2604.14113v1 Announce Type: cross Abstract: GUI grounding, which localizes interface elements from screenshots given natural language queries, remains challenging for small icons and dense layouts. Test-time zoom-in methods improve localization by cropping and re-running inference at higher re

UI · 1 model · #computer-vision #localization #uncertainty-quantification
arxiv · Apr 16 · bullish

RadAgents: Multimodal Agentic Reasoning for Chest X-ray Interpretation with Radiologist-like Workflows

arXiv:2509.20490v4 Announce Type: replace-cross Abstract: Agentic systems offer a potential path to solve complex clinical tasks through collaboration among specialized agents, augmented by tool use and external knowledge bases. Nevertheless, for chest X-ray (CXR) interpretation, prevailing methods

RA · 1 model · #multiagent #medical-imaging #explainability
arxiv · Apr 14 · bullish

CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space

arXiv:2604.11539v1 Announce Type: cross Abstract: Human perception of visual similarity is inherently adaptive and subjective, depending on the users' interests and focus. However, most image retrieval systems fail to reflect this flexibility, relying on a fixed, monolithic metric that cannot incorp

VI, CL · 2 models · #computer-vision #image-retrieval #adaptive-learning
arxiv · Apr 13 · bullish

MixFlow: Mixed Source Distributions Improve Rectified Flows

arXiv:2604.09181v1 Announce Type: cross Abstract: Diffusion models and their variations, such as rectified flows, generate diverse and high-quality images, but they are still hindered by slow iterative sampling caused by the highly curved generative paths they learn. An important cause of high curva

DI, RE, MI · 3 models · #computer-vision #machine-learning #generative-models
arxiv · Apr 11 · bullish

TOOLCAD: Exploring Tool-Using Large Language Models in Text-to-CAD Generation with Reinforcement Learning

arXiv:2604.07960v1 Announce Type: cross Abstract: Computer-Aided Design (CAD) is an expert-level task that relies on long-horizon reasoning and coherent modeling actions. Large Language Models (LLMs) have shown remarkable advancements in enabling language agents to tackle real-world tasks. Notably,

LA · 1 model · #cad #language-models #autonomous-systems
arxiv · Apr 11

Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection

arXiv:2604.07831v1 Announce Type: cross Abstract: Existing red-teaming studies on GUI agents have important limitations. Adversarial perturbations typically require white-box access, which is unavailable for commercial systems, while prompt injection is increasingly mitigated by stronger safety alig

#security #adversarial #computer-vision
arxiv · Apr 10 · bullish

RQR3D: Reparametrizing the regression targets for BEV-based 3D object detection

arXiv:2505.17732v2 Announce Type: replace-cross Abstract: Accurate, fast, and reliable 3D perception is essential for autonomous driving. Recently, bird's-eye view (BEV)-based perception approaches have emerged as superior alternatives to perspective-based solutions, offering enhanced spatial unders

RQ · 1 model · #autonomous-driving #object-detection #computer-vision
arxiv · Apr 10 · bullish

Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding

arXiv:2508.20765v2 Announce Type: replace-cross Abstract: The automatic understanding of video content is advancing rapidly. Empowered by deeper neural networks and large datasets, machines are increasingly capable of understanding what is concretely visible in video frames, whether it be objects, a

#video-understanding #abstract-concepts #foundation-models
arxiv · Apr 10 · bearish

CAAP: Capture-Aware Adversarial Patch Attacks on Palmprint Recognition Models

arXiv:2604.06987v1 Announce Type: cross Abstract: Palmprint recognition is deployed in security-critical applications, including access control and palm-based payment, due to its contactless acquisition and highly discriminative ridge-and-crease textures. However, the robustness of deep palmprint re

#security #adversarial-attacks #computer-vision
arxiv · Apr 9 · bullish

Visual prompting reimagined: The power of the Activation Prompts

arXiv:2604.06440v1 Announce Type: cross Abstract: Visual prompting (VP) has emerged as a popular method to repurpose pretrained vision models for adaptation to downstream tasks. Unlike conventional model fine-tuning techniques, VP introduces a universal perturbation directly into the input data to f

#computer-vision #fine-tuning #machine-learning
arxiv · Apr 9 · bearish

Non-identifiability of Explanations from Model Behavior in Deep Networks of Image Authenticity Judgments

arXiv:2604.07254v1 Announce Type: cross Abstract: Deep neural networks can predict human judgments, but this does not imply that they rely on human-like information or reveal the cues underlying those judgments. Prior work has addressed this issue using attribution heatmaps, but their explanatory va

VG, EF, BA · 3 models · #computer-vision #machine-learning #explanability
arxiv · Apr 8 · bullish

MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control

arXiv:2604.06156v1 Announce Type: cross Abstract: MLLMs have been successfully applied to multimodal embedding tasks, yet their generative reasoning capabilities remain underutilized. Directly incorporating chain-of-thought reasoning into embedding learning introduces two fundamental challenges. Fir

MM · 1 model · #multimodal-embedding #reasoning #computer-vision
arxiv · Apr 7

ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric Videos

arXiv:2512.03666v2 Announce Type: replace-cross Abstract: A core capability towards general embodied intelligence lies in localizing task-relevant objects from an egocentric perspective, formulated as Spatio-Temporal Video Grounding (STVG). Despite recent progress, existing STVG studies remain large

#computer-vision #benchmark #embodied-intelligence
arxiv · Apr 6 · bearish

Multimodal Language Models Cannot Spot Spatial Inconsistencies

arXiv:2604.00799v2 Announce Type: replace-cross Abstract: Spatial consistency is a fundamental property of the visual world and a key requirement for models that aim to understand physical reality. Despite recent advances, multimodal large language models (MLLMs) often struggle to reason about 3D ge

#computer-vision #machine-learning #evaluation
arxiv · Apr 6 · bullish

Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding

arXiv:2604.02546v1 Announce Type: cross Abstract: Pretraining 3D encoders by aligning with Contrastive Language Image Pretraining (CLIP) has emerged as a promising direction to learn generalizable representations for 3D scene understanding. In this paper, we propose UniScene3D, a transformer-based e

OP, UN · 2 models · #computer-vision #3d-scene-understanding #transformer