Multilingual Cognitive Impairment Detection in the Era of Foundation Models
Abstract: We evaluate cognitive impairment (CI) classification from transcripts of speech in English, Slovene, and Korean. We compare zero-shot large language models (LLMs) used as direct classifiers under three input settings -- transcript-only, linguistic-features-only, and combined -- with supervised tabular approaches trained under a leave-one-out protocol. The tabular models operate on engineered linguistic features, transcript embeddings, and early or late fusion of both modalities. Across languages, zero-shot LLMs provide competitive no-training baselines, but supervised tabular models generally perform better, particularly when engineered linguistic features are included and combined with embeddings. Few-shot experiments focusing on embeddings indicate that the value of limited supervision is language-dependent: some languages benefit substantially from additional labelled examples, while others remain constrained without richer feature representations. Overall, the results suggest that, in small-data CI detection, structured linguistic signals and simple fusion-based classifiers remain strong and reliable.

Comments: Accepted as an oral presentation at the RAPID workshop @ LREC 2026
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2604.06758 [cs.CL] (or arXiv:2604.06758v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.06758 (arXiv-issued DOI via DataCite, pending registration)

Submission history
From: Boshko Koloski
[v1] Wed, 8 Apr 2026 07:22:43 UTC (47 KB)
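The evaluation setup the abstract describes -- a supervised tabular classifier trained under a leave-one-out protocol on an "early fusion" of engineered linguistic features and transcript embeddings -- can be sketched as follows. This is an illustrative reconstruction with synthetic data, not the authors' actual pipeline; the feature dimensions, classifier choice, and variable names are all assumptions.

```python
# Hypothetical sketch of leave-one-out evaluation with early fusion:
# concatenate engineered linguistic features with transcript embeddings,
# then classify. Data here is synthetic; in the paper the features would
# come from real transcripts in English, Slovene, or Korean.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
n = 40                                     # small-data regime typical of CI corpora
ling_feats = rng.normal(size=(n, 8))       # engineered linguistic features (assumed dim)
embeddings = rng.normal(size=(n, 32))      # transcript embeddings (assumed dim)
y = np.array([0, 1] * (n // 2))            # 0 = control, 1 = cognitive impairment

# Early fusion: concatenate both modalities into one feature matrix.
X_fused = np.concatenate([ling_feats, embeddings], axis=1)

# Leave-one-out: each sample is held out once; the rest train the model.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X_fused, y, cv=LeaveOneOut())
print(f"LOO accuracy: {scores.mean():.2f}")
```

Late fusion, by contrast, would train one classifier per modality and combine their predicted probabilities; early fusion as above keeps the pipeline to a single model over the concatenated representation.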