arxiv · Jun 1 · bullish

Fixed Universal Transformers

arXiv:2605.31423v1 Announce Type: new Abstract: We introduce universal transformers: fixed transformers that can simulate any transformer in a given class via a suitable input embedding. Analogous to a universal Turing machine, the input embedding encodes a description of the target model whi…

1 model · #machine-learning #transformers #universality
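The universal-Turing-machine analogy in the abstract can be made concrete with a deliberately tiny toy. It is not a transformer and not the paper's construction. It is a fixed function whose code never changes, yet it simulates any linear map in a fixed class (2×2 matrices here, an assumption) because the target map's description is packed into its input. `encode` and `universal_apply` are hypothetical names.

```python
# The "class" of target models: linear maps from R^2 to R^2 (an assumption for this toy).
D_IN, D_OUT = 2, 2

def encode(W, x):
    """Hypothetical input embedding: flatten the target model's description W, then append its input x."""
    return [w for row in W for w in row] + list(x)

def universal_apply(embedding):
    """A fixed 'interpreter' whose code never changes: it reads the target model
    from the first D_OUT * D_IN entries and applies it to the remaining D_IN entries."""
    n_w = D_OUT * D_IN
    if len(embedding) != n_w + D_IN:
        raise ValueError("embedding must hold D_OUT*D_IN weights followed by D_IN inputs")
    w_flat, x = embedding[:n_w], embedding[n_w:]
    return [sum(w_flat[r * D_IN + c] * x[c] for c in range(D_IN)) for r in range(D_OUT)]

x = [3, 4]
W_a = [[1, 0], [0, 1]]   # target model A: identity
W_b = [[2, 1], [0, -1]]  # target model B
print(universal_apply(encode(W_a, x)))  # [3, 4]
print(universal_apply(encode(W_b, x)))  # [10, -4]
```

Switching from model A to model B changes only the input, never `universal_apply` itself, which is the property the abstract attributes to a universal transformer.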

arxiv · May 16 · bullish

Krause Synchronization Transformers

arXiv:2602.11534v3 Announce Type: replace-cross Abstract: Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. When composed across depth, this interaction pattern induces strong synchronization dynamics that favor…

3 models · #transformers #attention #efficiency
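To make the competition described in the abstract concrete: in standard softmax attention, one query's weights over all tokens are normalized to sum to 1, so raising one token's score necessarily lowers every other token's weight. This pure-Python sketch uses made-up scores and illustrates only that effect, not the paper's proposed alternative.

```python
import math

def softmax(scores):
    """Globally normalized softmax over one query's attention scores."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Attention scores of one query against four tokens (illustrative values).
scores = [1.0, 0.5, 0.2, -0.3]
before = softmax(scores)

# Boost only token 0's score; the other scores are untouched.
boosted = [scores[0] + 2.0] + scores[1:]
after = softmax(boosted)

# Both weight vectors sum to 1, so token 0's gain is paid for by every other token.
for i, (b, a) in enumerate(zip(before, after)):
    print(i, round(b, 3), round(a, 3))
```

Because the normalizing denominator grows while the other numerators stay fixed, every non-boosted weight strictly decreases.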

arxiv · Apr 30 · bullish

Exploring the Potential of Probabilistic Transformer for Time Series Modeling: A Report on the ST-PT Framework

arXiv:2604.26762v1 Announce Type: cross Abstract: The Probabilistic Transformer (PT) establishes that the Transformer's self-attention plus its feed-forward block is mathematically equivalent to Mean-Field Variational Inference (MFVI) on a Conditional Random Field (CRF). Under this equivalence the T…

2 models · #time-series #probabilistic-models #transformers
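For readers unfamiliar with the MFVI side of the equivalence the abstract cites, here is a minimal sketch of generic mean-field inference on a tiny fully connected pairwise CRF. It is not PT's derivation: the shared pairwise table, the synchronous (all-nodes-at-once) update order, and the example potentials are assumptions for illustration. Each node's update ends in a softmax over expected log-potentials; how PT maps these quantities onto attention and the feed-forward block is not reproduced here.

```python
import math

def softmax(logits):
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_field(unary, pairwise, iters=20):
    """Synchronous mean-field updates for a fully connected pairwise CRF.

    unary[i][z]: log-potential for node i taking label z.
    pairwise[z][z2]: log-potential shared by every pair of distinct nodes
                     (a simplifying assumption for this sketch).
    Returns q[i][z], the fully factorized approximation to the marginals.
    """
    n, k = len(unary), len(unary[0])
    q = [softmax(u) for u in unary]  # initialize from the unaries alone
    for _ in range(iters):
        new_q = []
        for i in range(n):
            # Unary term plus expected pairwise log-potential under the other nodes' current q.
            logits = [
                unary[i][z]
                + sum(pairwise[z][z2] * q[j][z2]
                      for j in range(n) if j != i
                      for z2 in range(k))
                for z in range(k)
            ]
            new_q.append(softmax(logits))  # q_i(z) is proportional to exp(logits[z])
        q = new_q
    return q

unary = [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # only node 0 has evidence
pairwise = [[1.0, 0.0], [0.0, 1.0]]           # attractive: rewards equal labels
q = mean_field(unary, pairwise)
for i, qi in enumerate(q):
    print(i, [round(p, 3) for p in qi])
```

With attractive couplings, node 0's evidence propagates so that nodes 1 and 2 also come to favor label 0.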

arxiv · Apr 21 · bullish

Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit

arXiv:2604.15356v1 Announce Type: cross Abstract: Recent work on KV cache quantization, culminating in TurboQuant, has approached the Shannon entropy limit for per-vector compression of transformer key-value caches. We observe that this limit applies to a strictly weaker problem than the one that ac…

1 model · #compression #quantization #transformers
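A toy illustration of why a per-vector entropy limit is not the last word for sequential data: coding each symbol independently is bounded by the marginal entropy H(X), but a coder that conditions on the previous symbol is bounded by H(X_t | X_{t-1}), which is never larger. Here scalar symbols stand in for quantized KV vectors (an assumption). This is the general information-theoretic point only, not the paper's probabilistic-language-trie method.

```python
import math
from collections import Counter

def entropy_bits(seq):
    """Empirical H(X) in bits/symbol: the limit when each symbol is coded independently."""
    n = len(seq)
    h = 0.0
    for c in Counter(seq).values():
        h -= c / n * math.log2(c / n)
    return h

def conditional_entropy_bits(seq):
    """Empirical H(X_t | X_{t-1}) in bits/symbol: the limit when the coder sees the previous symbol."""
    pairs = Counter(zip(seq, seq[1:]))
    prev = Counter(seq[:-1])
    n = len(seq) - 1
    h = 0.0
    for (a, _b), c in pairs.items():
        h -= c / n * math.log2(c / prev[a])  # joint prob times log of conditional prob
    return h

seq = [0, 1, 2, 3] * 50
print(entropy_bits(seq))              # 2.0
print(conditional_entropy_bits(seq))  # 0.0, since each symbol fully determines the next
```

The sequence looks maximally spread out symbol by symbol (2 bits each), yet it is fully predictable from context, which is the kind of gap a sequential coder can exploit.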

The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling
