Beyond N-gram: Data-Aware X-GRAM Extraction for Efficient Embedding Parameter Scaling

Source

arxiv.orgfull article ↗

Read on arxiv

Publisher summary· verbatim

arXiv:2604.21724v2 Announce Type: replace Abstract: Large token-indexed lookup tables provide a compute-decoupled scaling path, but their practical gains are often limited by poor parameter efficiency and rapid memory growth. We attribute these limitations to Zipfian under-training of the long tail,

Discussion

No replies yet. Be first.

Beyond N-gram: Data-Aware X-GRAM Extraction for Efficient Embedding Parameter Scaling

Related coverage

Beyond N-gram: Data-Aware X-GRAM Extraction for Efficient Embedding Parameter Scaling

Related coverage