Published April 13, 2026 at 4:00 AM
Is More Data Worth the Cost? Dataset Scaling Laws in a Tiny Attention-Only Decoder
Publisher summary (verbatim)
arXiv:2604.09389v1 (Announce Type: cross)

Abstract: Training Transformer language models is expensive, as performance typically improves with increasing dataset size and computational budget. Although scaling laws describe this trend at large scale, their implications in controlled, smaller-scale settings …
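For context, dataset scaling laws of the kind the abstract refers to are usually fit as a power law in dataset size. A common functional form in the scaling-law literature, offered here as an illustration rather than the form this particular paper uses, is

    L(D) = L_\infty + (D_c / D)^{\alpha_D}

where L(D) is the model's test loss after training on D tokens, L_\infty is the irreducible loss, and D_c and \alpha_D are fitted constants. Under a form like this, the title's question of whether more data is worth the cost reduces to how quickly the (D_c / D)^{\alpha_D} term decays at the scales being studied.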
Originally published on arXiv.