arxiv · Jul 10 · bullish

Efficient Long-Horizon Learning for Learned Optimization

arXiv:2607.06772v2 Announce Type: replace Abstract: Learned optimization aims to improve upon hand-designed optimizers (e.g., Adam and Muon) by meta-learning small neural network optimizers over a distribution of tasks. While recent work has greatly advanced the architectural design and inductive bi

AD · MU · GP · 6 models · +3 #optimization #meta-learning

arxiv · Jun 10 · bullish

K-Forcing: Joint Next-K-Token Decoding via Push-Forward Language Modeling

arXiv:2606.10820v1 Announce Type: cross Abstract: Autoregressive (AR) language modeling is the dominant paradigm for text generation, yet its sequential token-by-token decoding makes inference memory-bound and inefficient. Existing acceleration approaches, such as speculative decoding and diffusion

K- · TR · 2 models #language-modeling #acceleration #inference

arxiv · May 4 · bullish

NRGPT: An Energy-based Alternative for GPT

arXiv:2512.16762v3 Announce Type: replace Abstract: Generative Pre-trained Transformer (GPT) architectures are the most popular design for language modeling. Energy-based modeling is a different paradigm that views inference as a dynamical process operating on an energy landscape. We propose a minim

GP · EN · 2 models #language-modeling #energy-based-modeling #machine-learning

arxiv · Apr 24 · bullish

Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks

arXiv:2305.01626v4 Announce Type: replace-cross Abstract: Computational models of syntax are predominantly text-based. Here we propose that the most basic first step in the evolution of syntax can be modeled directly from raw speech in a fully unsupervised way. We focus on one of the most ubiquitous

CI · FI · CN · 3 models #speech-processing #neural-networks #language-modeling

arxiv · Apr 16

Guided Transfer Learning for Discrete Diffusion Models

arXiv:2512.10877v4 Announce Type: replace Abstract: Discrete diffusion models (DMs) have achieved strong performance in language and other discrete domains, offering a compelling alternative to autoregressive modeling. Yet this performance typically depends on large training datasets, challenging th

#diffusion-models #transfer-learning #language-modeling