Propensity Inference: Environmental Contributors to LLM Behaviour

Source

arxiv.org


Publisher summary (verbatim)

arXiv:2604.21098v1 (Announce Type: new)

Abstract: Motivated by loss of control risks from misaligned AI systems, we develop and apply methods for measuring language models' propensity for unsanctioned behaviour. We contribute three methodological improvements: analysing effects of changes to environme
