Lessons Without Borders? Evaluating Cultural Alignment of LLMs Using Multilingual Story Moral Generation
Abstract: Stories are key to transmitting values across cultures, but their interpretation varies across linguistic and cultural contexts. Thus, we introduce multilingual story moral generation as a novel culturally grounded evaluation task. Using a new dataset of human-written story morals collected across 14 language-culture pairs, we compare model outputs with human interpretations via semantic similarity, a human preference survey, and value categorization. We show that frontier models such as GPT-4o and Gemini generate story morals that are semantically similar to human responses and preferred by human evaluators. However, their outputs exhibit markedly less cross-linguistic variation and concentrate on a narrower set of widely shared values. These findings suggest that while contemporary models can approximate central tendencies of human moral interpretation, they struggle to reproduce the diversity that characterizes human narrative understanding. By framing narrative interpretation as an evaluative task, this work introduces a new approach to studying cultural alignment in language models beyond static benchmarks or knowledge-based tests.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.08797 [cs.CL] (or arXiv:2604.08797v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.08797 (arXiv-issued DOI via DataCite, pending registration)
Submission history: From Sophie Wu. [v1] Thu, 9 Apr 2026 22:13:24 UTC (4,131 KB)