Fairness Evaluation and Inference Level Mitigation in LLMs

Source: arxiv.org

Publisher summary (verbatim):

arXiv:2510.18914v4 (Announce Type: replace-cross)

Abstract: Large language models often display undesirable behaviors embedded in their internal representations, undermining fairness, inconsistency drift, amplification of harmful content, and the propagation of unwanted patterns during extended dialog…

Originally published on arXiv.