Multi-concept Model Immunization through Differentiable Model Merging
Abstract
Model immunization is an emerging research direction that aims to mitigate the risk of misuse posed by open-source models and increasingly capable adaptation methods. The idea is to make a released model's weights difficult to fine-tune on certain harmful applications, hence the name “immunized”.
Recent work on model immunization focuses on the single-concept setting. In real-world scenarios, however, models need to be immunized against multiple concepts. To address this gap, we propose an immunization algorithm that simultaneously learns a single “difficult initialization” for adaptation methods over a set of concepts.
We achieve this by incorporating a differentiable merging layer that combines the model weights adapted to each of the concepts. In our experiments, we demonstrate the effectiveness of multi-concept immunization by generalizing the re-learning and personalization adaptation setups of prior work to multiple concepts.
Method
MIMA is formulated as a bi-level optimization program. The objective is defined as
\[
\underbrace{\max_{\theta \in \mathcal{S}^u} \sum_{n=1}^{N} L\left(\mathbf{x}^u_{[n]}, \mathbf{c}_{[n]}; \text{Merge}\left(\{\theta'_{[n]}\}_{n=1}^{N}\right)\right)}_{\text{upper-level task}}
\]
\[
\text{s.t.} \quad
\underbrace{\theta'_{[n]} \triangleq \arg\min_{\theta \in \mathcal{S}^l} L\left(\mathbf{x}^l_{[n]}, \mathbf{c}_{[n]}; \theta\right) \quad \forall n}_{\text{multiple lower-level tasks}},
\]
where \(N\) denotes the number of concepts.
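To make the bi-level structure concrete, here is a minimal PyTorch-style sketch of one optimization step. It assumes a functional loss \(L\) exposed as loss_fn(params, x, c), a differentiable merge callable (see the Merge layer below), and plain SGD for the unrolled lower level; these names, the hyperparameters, and the omission of the parameter subsets \(\mathcal{S}^u, \mathcal{S}^l\) are simplifications for illustration, not the authors' implementation.

import torch

def immunization_step(theta, concepts, loss_fn, merge, k_inner=5, lr_inner=1e-2):
    """One upper-level step of the bi-level program: unroll lower-level
    adaptation per concept, merge the adapted weights, then maximize the
    merged loss with respect to the shared initialization theta."""
    adapted = []
    for x_l, _, c in concepts:  # one (lower data, upper data, concept) triple per concept
        theta_n = list(theta)   # each concept adapts from the shared init
        for _ in range(k_inner):  # unrolled lower-level SGD steps
            grads = torch.autograd.grad(
                loss_fn(theta_n, x_l, c), theta_n,
                create_graph=True)  # keep the graph to backprop through theta'
            theta_n = [p - lr_inner * g for p, g in zip(theta_n, grads)]
        adapted.append(theta_n)
    theta_merged = merge(adapted)  # differentiable Merge layer (defined below)
    upper = sum(loss_fn(theta_merged, x_u, c) for _, x_u, c in concepts)
    (-upper).backward()  # negate so an optimizer step ascends the upper-level loss
    return upper

An optimizer step on \(\theta\) then raises the post-adaptation loss, i.e., it makes the shared initialization harder to fine-tune on the immunized concepts.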
For the lower level, we unroll the minimization of the loss \(L\) on a copy of the weights for each concept. Next, we combine the per-concept weights \(\theta'_{[n]}\) via our proposed Merge layer, defined in the equation below. For the upper level, we maximize the diffusion loss \(L\) with respect to the parameters \(\theta\) by backpropagating through \(\theta'\).
\[
\theta' \triangleq \text{Merge}\left(\{\theta'_{[n]}\}_{n=1}^{N}\right)
\triangleq
\begin{cases}
\varphi^\star\left(\{\theta'_{[n]}\}_{n=1}^{N}\right), & \text{if} \ \theta'_{[n]} \in \mathcal{W}, \\
\frac{1}{N} \sum_{n=1}^{N} \theta'_{[n]}, & \text{if} \ \theta'_{[n]} \notin \mathcal{W}.
\end{cases}
\]
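As one possible instantiation of \(\varphi^\star\), the sketch below implements Merge as a PyTorch module that applies a learned softmax-weighted combination to the parameters in \(\mathcal{W}\) and a uniform average elsewhere. The softmax parameterization of \(\varphi^\star\) and the index-set interface are illustrative assumptions, not necessarily the paper's choices.

import torch
import torch.nn as nn

class MergeLayer(nn.Module):
    def __init__(self, n_concepts, merge_idx):
        super().__init__()
        self.merge_idx = set(merge_idx)  # positions of parameters in W
        self.logits = nn.Parameter(torch.zeros(n_concepts))  # learnable merge weights

    def forward(self, adapted):
        # adapted: list over concepts, each a list of parameter tensors
        w = torch.softmax(self.logits, dim=0)  # convex combination phi*
        merged = []
        for i in range(len(adapted[0])):
            stack = torch.stack([theta_n[i] for theta_n in adapted])  # (N, ...)
            if i in self.merge_idx:  # theta'_[n] in W: learned merge
                merged.append((w.view(-1, *[1] * (stack.dim() - 1)) * stack).sum(0))
            else:  # otherwise: uniform average
                merged.append(stack.mean(0))
        return merged

Because the merge is differentiable, backpropagating the upper-level loss yields gradients for both the shared initialization \(\theta\) and the merge weights.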
Citation
@inproceedings{zheng2025mima,
  title={Multi-concept Model Immunization through Differentiable Model Merging},
  author={Zheng, Amber Yijia and Yeh, Raymond A},
  booktitle={Association for the Advancement of Artificial Intelligence (AAAI)},
  year={2025}
}