Advancing AI Research Assistants with Expert-Involved Learning
Authors: Tianyu Liu, Simeng Han, Hanchen Wang, Xiao Luo, Pan Lu, Biqing Zhu, Yuge Wang, Keyi Li, Jiapeng Chen, Rihao Qu, Yufeng Liu, Xinyue Cui, Aviv Yaish, Yuhang Chen, Minsheng Hao, Chuhan Li, Kexing Li, Yinsheng Lu, Xinyu Wei, Qinzhe Xing, Antonia Panescu, Mengbo Wang, Vibha Annaswamy, Alicia Sanchez, Jack Cloherty, Arman Cohan, Hua Xu, Mark Gerstein, James Zou, Hongyu Zhao

Abstract: Large language models (LLMs) and large multimodal models (LMMs) promise to accelerate biomedical discovery, yet their reliability remains unclear. We introduce ARIEL (AI Research Assistant for Expert-in-the-Loop Learning), an open-source evaluation and optimization framework that pairs a curated multimodal biomedical corpus with expert-vetted tasks to probe two capabilities: full-length article summarization and fine-grained figure interpretation. Using uniform protocols and blinded PhD-level evaluation, we find that state-of-the-art models generate fluent but incomplete summaries, whereas LMMs struggle with detailed visual reasoning. We further observe that prompt engineering and lightweight fine-tuning substantially improve textual coverage, and that a compute-scaled inference strategy enhances visual question answering. We build an ARIEL agent that integrates textual and visual cues, and we show it can propose testable mechanistic hypotheses. ARIEL delineates the current strengths and limitations of foundation models, and provides a reproducible platform for advancing trustworthy AI in biomedicine.
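The abstract does not specify how the compute-scaled inference strategy works; a common instance of this idea is self-consistency voting, where the model is sampled several times and the majority answer is kept. The sketch below illustrates that pattern with a hypothetical `sample_fn` standing in for a multimodal model call; it is an assumption for illustration, not the paper's actual method.

```python
from collections import Counter
from itertools import cycle

def compute_scaled_answer(sample_fn, question, n_samples=5):
    """Scale inference-time compute: draw several stochastic answers
    from the model and return the majority vote."""
    answers = [sample_fn(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for a multimodal model whose samples disagree occasionally.
_samples = cycle(["mitochondria", "mitochondria", "nucleus"])
def toy_model(_question):
    return next(_samples)

print(compute_scaled_answer(toy_model, "Which organelle is highlighted in panel B?"))
```

Majority voting is only one way to spend extra inference compute; best-of-N reranking with a verifier is a common alternative.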
Comments: 43 pages, 7 figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as: arXiv:2505.04638 [cs.AI] (or arXiv:2505.04638v5 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2505.04638 (arXiv-issued DOI via DataCite)

Submission history (from Tianyu Liu):
[v1] Sat, 3 May 2025 14:21:48 UTC (4,425 KB)
[v2] Wed, 8 Oct 2025 23:16:32 UTC (5,433 KB)
[v3] Wed, 10 Dec 2025 21:12:10 UTC (5,425 KB)
[v4] Tue, 3 Feb 2026 15:12:40 UTC (5,425 KB)
[v5] Mon, 6 Apr 2026 18:12:55 UTC (5,812 KB)