DMAP: A Distribution Map for Text
Abstract: Large Language Models (LLMs) are a powerful tool for statistical text analysis, with derived sequences of next-token probability distributions offering a wealth of information. Extracting this signal typically relies on metrics such as perplexity, which do not adequately account for context: how one should interpret a given next-token probability depends on the number of reasonable choices encoded by the shape of the conditional distribution. In this work, we present DMAP, a mathematically grounded method that maps a text, via a language model, to a set of samples in the unit interval that jointly encode rank and probability information. This representation enables efficient, model-agnostic analysis and supports a range of applications. We illustrate its utility through three case studies: (i) validating generation parameters to ensure data integrity, (ii) examining the role of probability curvature in machine-generated text detection, and (iii) forensically analyzing the statistical fingerprints left in downstream models post-trained on synthetic data. Our results demonstrate that DMAP offers a unified statistical view of text that is simple to compute on consumer hardware and widely applicable, and that provides a foundation for further research into text analysis with LLMs.

Comments: ICLR 2026
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2602.11871 [cs.CL] (or arXiv:2602.11871v2 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.11871

Submission history
From: Stuart Burrell
[v1] Thu, 12 Feb 2026 12:21:24 UTC (4,298 KB)
[v2] Thu, 23 Apr 2026 17:21:56 UTC (4,368 KB)
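The abstract does not spell out the mapping itself, but one natural construction consistent with "samples in the unit interval that jointly encode rank and probability" is a randomized probability integral transform applied to each conditional next-token distribution: the probability mass of all tokens ranked above the observed token fixes where the sample falls, and a uniform jitter within the observed token's own probability band fills in the rest. The sketch below is an illustrative guess at such a mapping, not the paper's code; the scoring model (gpt2) and the function name dmap_samples are assumptions introduced here.

# Minimal sketch of a plausible DMAP-style mapping: a randomized
# probability integral transform over each next-token distribution.
# Illustrative only; the paper's exact construction may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def dmap_samples(text: str, model_name: str = "gpt2") -> torch.Tensor:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tok(text, return_tensors="pt").input_ids          # (1, T)
    with torch.no_grad():
        logits = model(ids).logits                          # (1, T, V)

    # Conditional distributions for positions 1..T-1, each predicted
    # from the preceding context, plus the tokens actually observed there.
    probs = logits[0, :-1].softmax(dim=-1)                  # (T-1, V)
    observed = ids[0, 1:]                                   # (T-1,)
    p_obs = probs.gather(1, observed[:, None]).squeeze(1)   # (T-1,)

    # Mass of all tokens strictly more probable than the observed one:
    # this term encodes the token's rank in the conditional distribution.
    mass_above = (probs * (probs > p_obs[:, None])).sum(dim=-1)

    # Uniform jitter inside the observed token's own probability band,
    # so the sample also encodes the token's probability.
    u = mass_above + torch.rand_like(p_obs) * p_obs
    return u                                                # one sample in [0, 1] per token

samples = dmap_samples("The quick brown fox jumps over the lazy dog.")
print(samples)

Under this transform, text genuinely sampled from the scoring model at temperature 1 would yield approximately Uniform(0, 1) samples, so departures from uniformity are the kind of signal the three case studies could exploit; for instance, truncation-style sampling (top-k, nucleus) would leave the upper end of the interval systematically underpopulated.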