arxivJun 1bullish
arXiv:2605.31535v1 Announce Type: cross Abstract: Self-supervised novel view synthesis (NVS) remains challenging to scale, despite the abundance of video data, largely due to the brittleness of training on realistic videos and the hard-to-predict scaling behavior of multi-network system designs. We
arxivMay 14bullish
arXiv:2605.12879v1 Announce Type: new Abstract: Doubly-stochastic attention has emerged as a transport-based alternative to row-softmax attention, with recent Transformer variants using it to reduce attention sinks and rank collapse while improving performance. In this family, the standard approach
arxivApr 6bullish
arXiv:2604.02546v1 Announce Type: cross Abstract: Pretraining 3D encoders by aligning with Contrastive Language Image Pretraining (CLIP) has emerged as a promising direction to learn generalizable representations for 3D scene understanding. In this paper, we propose UniScene3D, a transformer-based e