MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

arXiv:2605.16865v2

Abstract: Supervised fine-tuning (SFT) is widely used to inject new knowledge into language models, but it often degrades pretrained capabilities such as reasoning and general-domain performance. We argue this forgetting arises because fine-tuning targets fr
