Zhuangdi Zhu
Alignment
CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference Alignment
Pretrained knowledge memorized in LLMs raises critical concerns over safety and privacy, which has motivated LLM Unlearning as a technique for selectively removing the influences of undesirable knowledge. Existing approaches, rooted in Gradient …