Toward Robust and Efficient ML-Based GPU Caching for Modern Inference

Source

arxiv.org

Publisher summary (verbatim)

arXiv:2509.20979v2. Abstract: In modern GPU inference, cache efficiency remains a major bottleneck, and heuristic policies such as LRU can perform far worse than the offline optimum. Existing learning-based caching systems improve hit rates mainly through predictor des
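The gap between LRU and the offline optimum mentioned in the summary can be illustrated with a small simulation. The sketch below is not the paper's system. It compares a plain LRU cache against Belady's offline rule (evict the entry whose next use is farthest in the future) on a synthetic cyclic trace. The function names `lru_hits` and `belady_hits`, the trace, and the capacity are all assumptions chosen for illustration.

```python
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count hits for a least-recently-used cache of the given capacity."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[key] = True
    return hits


def belady_hits(trace, capacity):
    """Count hits for Belady's offline rule, which needs the full trace."""
    # next_use[i] is the index of the next access to trace[i], or infinity.
    next_use = [0] * len(trace)
    upcoming = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = upcoming.get(trace[i], float("inf"))
        upcoming[trace[i]] = i

    cache = {}  # key -> index of its next access
    hits = 0
    for i, key in enumerate(trace):
        if key in cache:
            hits += 1
        elif len(cache) >= capacity:
            # Evict the cached key that is reused farthest in the future.
            victim = max(cache, key=cache.get)
            del cache[victim]
        cache[key] = next_use[i]
    return hits


if __name__ == "__main__":
    # Hypothetical workload: 4 distinct keys accessed cyclically, cache of 3.
    trace = [0, 1, 2, 3] * 5
    print("LRU hits:   ", lru_hits(trace, capacity=3))
    print("Belady hits:", belady_hits(trace, capacity=3))
```

On this cyclic trace LRU never hits, because each key is evicted just before it is requested again, while the offline rule keeps most of the working set resident. Real inference traces are far less adversarial, so the example shows only that such a gap can exist, not how large it is in practice.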
