Detection of Hate and Threat in Digital Forensics: A Case-Driven Multimodal Approach
Abstract: Digital forensic investigations increasingly rely on heterogeneous evidence such as images, scanned documents, and contextual reports. These artifacts may contain explicit or implicit expressions of harm, hate, threat, violence, or intimidation, yet existing automated approaches often assume clean text input or apply vision models without forensic justification. This paper presents a case-driven multimodal approach to hate and threat detection in forensic analysis. The proposed framework explicitly determines the presence and source of textual evidence, distinguishing between embedded text, associated contextual text, and image-only evidence. Based on the identified evidence configuration, the framework selectively applies text analysis, multimodal fusion, or image-only semantic reasoning using vision-language models with Vision Transformer (ViT) backbones. By conditioning inference on evidence availability, the approach mirrors forensic decision-making, improves evidentiary traceability, and avoids unjustified modality assumptions. Experimental evaluation on forensic-style image evidence demonstrates consistent and interpretable behavior across heterogeneous evidence scenarios.

Comments: 8 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2604.08609 [cs.CV] (or arXiv:2604.08609v1 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.08609
Submission history: From Ponkoj Shill [v1] Wed, 8 Apr 2026 21:50:02 UTC (1,116 KB)
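The abstract describes routing each evidence item to one of three analysis paths depending on which text sources are present. The following is a minimal sketch of that evidence-conditioned routing; the `Evidence` dataclass, the path names, and the mapping from text source to analysis path are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Evidence:
    """A forensic evidence item: an image plus any recovered text.

    The image itself is assumed present in every item; only the
    text sources vary across evidence configurations.
    """
    embedded_text: Optional[str] = None    # text found inside the image (e.g. via OCR)
    contextual_text: Optional[str] = None  # associated report, caption, or case notes

def select_analysis_path(item: Evidence) -> str:
    """Choose an analysis path from the evidence configuration.

    Assumed mapping (hypothetical, for illustration):
      - embedded text present      -> multimodal fusion of image and in-image text
      - contextual text only       -> text analysis of the accompanying report
      - no text at all             -> image-only semantic reasoning (ViT backbone)
    """
    if item.embedded_text:
        return "multimodal_fusion"
    if item.contextual_text:
        return "text_analysis"
    return "image_only"
```

Conditioning on evidence availability first, rather than always invoking every model, is what the abstract argues keeps the pipeline traceable: each inference step can be justified by the evidence configuration that triggered it.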