arXiv
Published April 16, 2026 at 4:00 AM
Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss
Publisher summary (verbatim)
arXiv:2604.12911v1 · Announce Type: cross

Abstract: Multilingual benchmarks guide the development of frontier models. Yet multilingual evaluations reported by frontier models are structured similar to popular reasoning and knowledge benchmarks, but across many languages. We show such benchmarks, and c
Originally published on arXiv.