Is More Data Worth the Cost? Dataset Scaling Laws in a Tiny Attention-Only Decoder

Source: arxiv.org

Publisher summary (verbatim):

arXiv:2604.09389v1 (Announce Type: cross)

Abstract: Training Transformer language models is expensive, as performance typically improves with increasing dataset size and computational budget. Although scaling laws describe this trend at large scale, their implications in controlled, smaller-scale sett
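
The abstract's premise, that loss falls predictably as dataset size grows, is commonly modeled as a power law. The sketch below is not the paper's method. It assumes the simplified form L(D) = a * D^(-b) with no irreducible-loss term, and the helper names fit_power_law and predict_loss are hypothetical. It estimates a and b by ordinary least squares on log L versus log D, and the sample losses are synthetic, generated from the formula itself rather than taken from any experiment.

```python
import math


def fit_power_law(dataset_sizes, losses):
    """Fit loss ~= a * D**(-b) by least squares in log-log space.

    Hypothetical helper: assumes a pure power law with no irreducible-loss term.
    Returns (a, b).
    """
    xs = [math.log(d) for d in dataset_sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the regression line log L = log a - b * log D.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a, b, dataset_size):
    """Evaluate the fitted power law at a given dataset size (hypothetical helper)."""
    return a * dataset_size ** (-b)


if __name__ == "__main__":
    # Synthetic losses drawn exactly from a = 5.0, b = 0.1 (illustrative values only).
    sizes = [1e5, 1e6, 1e7, 1e8]
    losses = [5.0 * d ** -0.1 for d in sizes]
    a, b = fit_power_law(sizes, losses)
    # Relative loss change from doubling the dataset: 2 ** (-b) under this form.
    ratio = predict_loss(a, b, 2e8) / predict_loss(a, b, 1e8)
    print(f"a={a:.4f}, b={b:.4f}, loss ratio per doubling={ratio:.4f}")
```

Under this simplified form, every doubling of D multiplies the loss by the same factor 2^(-b). With the illustrative exponent b = 0.1 that factor is about 0.933, roughly a 6.7% reduction per doubling, while the data cost doubles. Fits in practice often add a constant irreducible-loss term, which this sketch omits.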
