arXiv
Published April 10, 2026 at 4:00 AM
SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization
Publisher summary · verbatim
arXiv:2604.07663v1 Announce Type: new Abstract: The AdamW optimizer, while standard for LLM pretraining, is a critical memory bottleneck, consuming optimizer states equivalent to twice the model's size. Although light-state optimizers like SinkGD attempt to address this issue, we identify the embedd
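The abstract's memory claim comes from AdamW keeping two state tensors per parameter (the first- and second-moment estimates), each the same shape as the weights, so the optimizer state alone is roughly twice the model's size. The sketch below only illustrates that bookkeeping; it is not SAGE's method (the abstract is truncated before the proposed technique is described), and the function name `adamw_step` and its defaults are illustrative assumptions.

```python
import torch

def adamw_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One AdamW update. The two state tensors ("m" and "v") each match the
    parameter's shape, which is why the optimizer state alone occupies
    roughly twice the memory of the model weights."""
    if "m" not in state:
        state["m"] = torch.zeros_like(param)  # first moment, same size as param
        state["v"] = torch.zeros_like(param)  # second moment, same size as param
        state["step"] = 0
    state["step"] += 1
    m, v, t = state["m"], state["v"], state["step"]

    # decoupled weight decay
    param.mul_(1 - lr * weight_decay)

    # exponential moving averages of the gradient and its square
    m.mul_(betas[0]).add_(grad, alpha=1 - betas[0])
    v.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])

    # bias correction and parameter update
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    param.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)

# Usage: after one step, state["m"] and state["v"] together hold
# 2x the parameter count in extra memory.
p = torch.randn(4, 4)
g = torch.randn(4, 4)
state = {}
adamw_step(p, g, state)
```

Light-state optimizers such as SinkGD aim to shrink or eliminate exactly these per-parameter state tensors; the abstract cuts off where it begins to describe the issue they identify with the embedding layers.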