Is More Data Worth the Cost? Dataset Scaling Laws in a Tiny Attention-Only Decoder - Databubble