[Review] Adam-mini: Use Fewer Learning Rates To Gain More
2024. 7. 13.

Paper: https://arxiv.org/abs/2406.16793
Code: https://github.com/zyushun/Adam-mini

Abstract: "We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini reduces memory by cutting down the learning rate resources in Adam (i.e., $1/\sqrt{v}$). We find that $\geq$ 90% of these learning rates in $v$ could be harmlessly removed if we (1) carefully partition the parameters into blocks following our proposed principle on Hessian structure; (2) assign a single but good learning rate to each parameter block. ..."
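The core idea is to replace Adam's per-coordinate second-moment vector $v$ with a single scalar per parameter block, so every coordinate in a block shares one learning-rate scale $1/\sqrt{v}$. Below is a minimal sketch of that idea, assuming each tensor is treated as one block; the real Adam-mini partitions parameters by Hessian structure (e.g., per attention head), and the official PyTorch implementation is in the GitHub repo above. The function name and state layout here are hypothetical, for illustration only.

```python
import torch

def adam_mini_step(param, grad, state, lr=1e-3,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-mini-style update for a single parameter block (sketch)."""
    state.setdefault("step", 0)
    # First moment: per-coordinate, exactly as in Adam.
    state.setdefault("m", torch.zeros_like(param))
    # Second moment: ONE scalar for the whole block, not one per coordinate.
    state.setdefault("v", torch.zeros((), device=param.device))

    state["step"] += 1
    t = state["step"]

    state["m"].mul_(beta1).add_(grad, alpha=1 - beta1)
    # Track the mean of g^2 over the block instead of per-coordinate g^2.
    state["v"].mul_(beta2).add_(grad.pow(2).mean(), alpha=1 - beta2)

    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)

    # All coordinates in the block share the single scale 1/(sqrt(v_hat)+eps).
    param.data.add_(m_hat / (v_hat.sqrt() + eps), alpha=-lr)
```

Since $v$ shrinks from one float per parameter to one float per block, the optimizer state drops from Adam's two full-size buffers ($m$ and $v$) to roughly one, which is where the reported 45% to 50% memory saving comes from.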