Multi-concept Model Immunization through Differentiable Model Merging
Amber Yijia Zheng
Raymond A. Yeh
Purdue University
AAAI 2025

Code | Paper

Abstract

Model immunization is an emerging direction that aims to mitigate the potential risk of misuse associated with open-sourced models and advancing adaptation methods. The idea is to make the released models' weights difficult to fine-tune on certain harmful applications, hence the name immunized. Recent work on model immunization focuses on the single-concept setting. However, in real-world situations, models need to be immunized against multiple concepts. To address this gap, we propose an immunization algorithm that simultaneously learns a single “difficult initialization” for adaptation methods over a set of concepts. We achieve this by incorporating a differentiable merging layer that combines a set of model weights adapted over multiple concepts. In our experiments, we demonstrate the effectiveness of multi-concept immunization by generalizing prior work's experimental setup for re-learning and personalization adaptation to multiple concepts.

Method

MIMA is formulated as a bi-level optimization program with the objective

\[ \underbrace{\max_{\theta \in \mathcal{S}^u} \sum_{n=1}^{N} L\big(\mathbf{x}^u_{[n]}, \mathbf{c}_{[n]}; \text{Merge}(\{\theta'_{[n]}\}_{n=1}^N)\big)}_{\text{upper-level task}}, \]

\[ \text{s.t.} \quad \underbrace{\theta'_{[n]} \triangleq \arg\min_{\theta \in \mathcal{S}^l} L(\mathbf{x}^l_{[n]}, \mathbf{c}_{[n]}; \theta) \quad \forall n}_{\text{multiple lower-level tasks}}, \]

where \(N = |C|\) is the number of concepts. For the lower level, we approximate each \(\arg\min\) by unrolling gradient steps of the loss \(L\) on a copy of the weights for each concept. Next, we combine the individual weights \(\theta'_{[n]}\) via our proposed Merge layer, defined as

\[ \theta' \triangleq \text{Merge}\big(\{ \theta'_{[n]} \}_{n=1}^N \big) \triangleq \begin{cases} \varphi^\star\big(\{\theta'_{[n]}\}_{n=1}^N\big), & \text{if } \theta'_{[n]} \in \mathcal{W}, \\ \frac{1}{N} \sum_{n=1}^N \theta'_{[n]}, & \text{if } \theta'_{[n]} \notin \mathcal{W}, \end{cases} \]

i.e., weights belonging to \(\mathcal{W}\) are combined with the merge operator \(\varphi^\star\), and all remaining weights are averaged. For the upper level, we maximize the diffusion loss \(L\) with respect to the parameters \(\theta\) by backpropagating through the merged weights \(\theta'\).
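To make the Merge layer concrete, below is a minimal PyTorch sketch. The exact form of \(\varphi^\star\) is not given in this excerpt, so phi_star here is a hypothetical placeholder (a uniform average) standing in for it; W is assumed to be a set of parameter names routed to the \(\varphi^\star\) branch, with everything else averaged.

import torch

def phi_star(stacked):
    # HYPOTHETICAL stand-in for the paper's merge operator phi*;
    # a uniform average over the concept axis keeps the sketch
    # runnable and differentiable. Swap in the actual operator.
    return stacked.mean(dim=0)

def merge(adapted, W):
    """Differentiable Merge layer over per-concept weights.

    adapted: list of N dicts, each mapping a parameter name to a
             tensor (the adapted weights theta'_[n]).
    W:       set of parameter names handled by phi*; all other
             parameters are averaged uniformly.
    """
    merged = {}
    for name in adapted[0]:
        # Stack the N per-concept copies of this parameter.
        stacked = torch.stack([theta_n[name] for theta_n in adapted])
        if name in W:
            merged[name] = phi_star(stacked)    # phi* branch
        else:
            merged[name] = stacked.mean(dim=0)  # plain average
    return merged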
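Continuing the sketch above, one outer iteration of the bi-level program can be written as follows. This assumes a generic loss_fn(x, c, params) (e.g., the diffusion loss \(L\)) and approximates each lower-level \(\arg\min\) with K unrolled gradient steps; create_graph=True keeps the unrolled graph so the upper-level gradient can flow back to the shared initialization \(\theta\). The names loss_fn, K, and inner_lr are illustrative, not from the paper.

def mima_upper_loss(theta, batches, W, K=1, inner_lr=1e-2, loss_fn=None):
    """One outer evaluation of the bi-level objective.

    theta:   dict name -> tensor with requires_grad=True (shared init).
    batches: list of (x_lower, x_upper, c) tuples, one per concept.
    """
    adapted = []
    for x_l, x_u, c in batches:
        th = dict(theta)  # per-concept copy; graph is rooted at theta
        for _ in range(K):
            inner = loss_fn(x_l, c, th)  # lower-level loss L
            grads = torch.autograd.grad(
                inner, list(th.values()), create_graph=True)
            th = {name: p - inner_lr * g
                  for (name, p), g in zip(th.items(), grads)}
        adapted.append(th)

    theta_merged = merge(adapted, W)  # differentiable Merge layer
    # Upper-level objective: sum the loss on the merged weights.
    return sum(loss_fn(x_u, c, theta_merged) for _, x_u, c in batches)

Since the upper level is a maximization, a training step would ascend this value, e.g. compute loss = -mima_upper_loss(...), call loss.backward(), and update \(\theta\) with a standard optimizer.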

Citation

@inproceedings{zheng2025mima,
 title={Multi-concept Model Immunization through Differentiable Model Merging},
 author={Zheng, Amber Yijia and Yeh, Raymond A},
 booktitle={Association for the Advancement of Artificial Intelligence (AAAI)},
 year={2025}
}