arxiv
Published April 20, 2026 at 4:00 AM
C-Mining: Unsupervised Discovery of Seeds for Cultural Data Synthesis via Geometric Misalignment
Publisher summary · verbatim
arXiv:2604.15675v1 Announce Type: new Abstract: Achieving cultural alignment in Large Language Models (LLMs) increasingly depends on synthetic data generation. For such synthesis, the most vital initial step is seed curation; however, current methods lack quantifiable standards for selecting these s
Originally published on arxiv