A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio

Source

arxiv.org

Publisher summary· verbatim

arXiv:2409.06624v4. Abstract: Large Language Models (LLM) often need to be Continual Pre-Trained (CPT) to obtain unfamiliar language skills or adapt to new domains. The huge training cost of CPT often asks for cautious choice of key hyper-parameters such as the mixture ra
