arXiv
Published April 24, 2026 at 4:00 AM
Fairness Evaluation and Inference Level Mitigation in LLMs
arXiv:2510.18914v4 (Announce Type: replace-cross). Abstract: Large language models often display undesirable behaviors embedded in their internal representations, undermining fairness and causing inconsistency drift, amplification of harmful content, and the propagation of unwanted patterns during extended dialog…
Originally published on arXiv.