arXiv · Jul 18 · bullish

Reachability-Aware Pretraining for Efficient Target-Oriented Path Exploration in Temporal Knowledge Graph Reasoning

arXiv:2607.14886v1 Announce Type: new Abstract: Temporal Knowledge Graph (TKG) reasoning under the extrapolation setting focuses on forecasting future time-stamped events (facts) from historical data in a temporal knowledge graph. Existing approaches, reinforcement learning (RL)-based multi-hop reas

1 model · #temporal-knowledge-graph #reinforcement-learning #pretraining

arXiv · Jul 18 · bullish

Function-Aware Fill-in-the-Middle as Mid-Training for Coding Agent Foundation Models

arXiv:2607.12463v2 Announce Type: replace Abstract: Coding agents must integrate external tool returns into ongoing reasoning - a capability that standard left-to-right pretraining on code exposes only in its forward direction. We observe that the action-observation-continuation loop of a coding age

2 models · #pretraining #self-supervised #mid-training

arXiv · Jun 29

Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining

arXiv:2604.17633v2 Announce Type: replace Abstract: Large language models exhibit impressive cross-lingual capabilities. However, prior work analyzes this phenomenon through isolated factors and at sparse points during training, limiting our understanding of how cross-lingual generalization emerges-

1 model · #multilingual #pretraining #language-models

arXiv · Jun 19

Characterizing Narrative Content in Web-scale LLM Pretraining Data

arXiv:2606.19468v1 Announce Type: new Abstract: The narrative composition of web-scale LLM pretraining corpora remains largely unexplored even though narrative is a fundamental mode of human communication. We present the first fine-grained study of narrative features in Dolma, a 3-trillion-token ope

3 models · #narrative-analysis #pretraining #llm

arXiv · Jun 12

Small Experiments, Cheaper Decisions: A Case Study in Staged Promotion for Micro-Pretraining

arXiv:2606.11387v1 Announce Type: cross Abstract: Short pretraining runs can reduce experimental cost, but they can also over-promote configurations that only look strong at tiny budgets. We study an auditable staged-promotion protocol for a fixed micro-pretraining runner on two heterogeneous host b

#pretraining #optimization #machine-learning

arXiv · May 22 · bullish

Billion-Scale Graph Foundation Models

arXiv:2602.04768v2 Announce Type: replace Abstract: Graph-structured data underpins many critical applications. While foundation models have transformed language and vision via large-scale pretraining and lightweight adaptation, extending this paradigm to general, real-world graphs is challenging. I

1 model · #graph-learning #foundation-models #pretraining

arXiv · Apr 16 · bullish

Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding

arXiv:2604.13313v1 Announce Type: new Abstract: Vision-Language Models demonstrate remarkable capabilities but often struggle with compositional reasoning, exhibiting vulnerabilities regarding word order and attribute binding. This limitation arises from a scarcity of informative samples needed to d

2 models · #machine-learning #vision-language #compositional-reasoning

arXiv · Apr 6 · bullish

Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding

arXiv:2604.02546v1 Announce Type: cross Abstract: Pretraining 3D encoders by aligning with Contrastive Language Image Pretraining (CLIP) has emerged as a promising direction to learn generalizable representations for 3D scene understanding. In this paper, we propose UniScene3D, a transformer-based e

2 models · #computer-vision #3d-scene-understanding #transformer