Evaluating LLM-Generated ACSL Annotations for Formal Verification
Abstract: Formal specifications are crucial for building verifiable and dependable software systems, yet generating accurate and verifiable specifications for real-world C programs remains challenging. This paper presents an empirical evaluation of automated ACSL annotation generation strategies for C programs, comparing a rule-based Python script, Frama-C's RTE plugin, and three large language models (DeepSeek-V3.2, GPT-5.2, and OLMo 3.1 32B Instruct). The study focuses on one-shot annotation generation, assessing how these approaches perform when directly applied to verification tasks. Using a filtered subset of the CASP benchmark, we evaluate generated annotations through Frama-C's WP plugin with multiple SMT solvers, analyzing proof success rates, solver timeouts, and internal processing time. Our results show that rule-based approaches remain more reliable for verification success, while LLM-based methods exhibit more variable performance. These findings highlight both the current limitations and the potential of LLMs as complementary tools for automated specification generation.

Comments: 12 pages. Formal Techniques for Judicious Programming (FTfJP-2026) at ECOOP. Conditionally accepted; final revision.
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
ACM classes: D.2.1; D.2.4; D.2.10; F.4.1; F.4.3
Cite as: arXiv:2602.13851 [cs.SE] (or arXiv:2602.13851v3 [cs.SE] for this version), https://doi.org/10.48550/arXiv.2602.13851

Submission history
From: Arshad Beg
[v1] Sat, 14 Feb 2026 19:18:34 UTC (8,105 KB)
[v2] Wed, 1 Apr 2026 17:15:52 UTC (8,105 KB)
[v3] Tue, 14 Apr 2026 17:18:00 UTC (2,598 KB)