Multi-concept Model Immunization through Differentiable Model Merging
Abstract
Model immunization is an emerging research direction that aims to mitigate the risk of misuse posed by open-source models and increasingly capable adaptation methods. The idea is to make a released model's weights difficult to fine-tune on certain harmful applications, hence the name “immunized”.
Recent work on model immunization focuses on the single-concept setting. In real-world scenarios, however, models need to be immunized against multiple concepts. To address this gap, we propose an immunization algorithm that simultaneously learns a single “difficult initialization” for adaptation methods over a set of concepts.
We achieve this by incorporating a differentiable merging layer that combines the model weights adapted to each of the concepts. In our experiments, we demonstrate the effectiveness of multi-concept immunization by generalizing the re-learning and personalization adaptation setups of prior work to multiple concepts.
Method
MIMA is formulated as a bi-level optimization program. The objective is defined as
\[
\underbrace{\max_{\theta \in \mathcal{S}^u} \sum_{n=1}^{N} L\left(\mathbf{x}^u_{[n]}, \mathbf{c}_{[n]}; \text{Merge}\left(\{\theta'_{[n]}\}_{n=1}^{N}\right)\right)}_{\text{upper-level task}}
\]
\[
\text{s.t.} \quad
\underbrace{\theta'_{[n]} \triangleq \arg\min_{\theta \in \mathcal{S}^l} L\left(\mathbf{x}^l_{[n]}, \mathbf{c}_{[n]}; \theta\right) \quad \forall n}_{\text{multiple lower-level tasks}},
\]
where \(N\) denotes the number of concepts.
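To make the bi-level structure concrete, here is a minimal PyTorch-style sketch of one optimization step. It assumes a functional loss \(L\) exposed as loss_fn(params, x, c), a differentiable merge callable (see the Merge layer below), and plain SGD for the unrolled lower level; these names, the hyperparameters, and the omission of the parameter subsets \(\mathcal{S}^u, \mathcal{S}^l\) are simplifications for illustration, not the authors' implementation.

import torch

def immunization_step(theta, concepts, loss_fn, merge, k_inner=5, lr_inner=1e-2):
    """One upper-level step of the bi-level program: unroll lower-level
    adaptation per concept, merge the adapted weights, then maximize the
    merged loss with respect to the shared initialization theta."""
    adapted = []
    for x_l, _, c in concepts:  # one (lower data, upper data, concept) triple per concept
        theta_n = list(theta)   # each concept adapts from the shared init
        for _ in range(k_inner):  # unrolled lower-level SGD steps
            grads = torch.autograd.grad(
                loss_fn(theta_n, x_l, c), theta_n,
                create_graph=True)  # keep the graph to backprop through theta'
            theta_n = [p - lr_inner * g for p, g in zip(theta_n, grads)]
        adapted.append(theta_n)
    theta_merged = merge(adapted)  # differentiable Merge layer (defined below)
    upper = sum(loss_fn(theta_merged, x_u, c) for _, x_u, c in concepts)
    (-upper).backward()  # negate so an optimizer step ascends the upper-level loss
    return upper

An optimizer step on \(\theta\) then raises the post-adaptation loss, i.e., it makes the shared initialization harder to fine-tune on the immunized concepts.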
For the lower level, we unroll the minimization of the loss \(L\) on a copy of the weights for each concept. Next, we combine the per-concept weights \(\theta'_{[n]}\) via our proposed Merge layer, defined in the equation below. For the upper level, we maximize the diffusion loss \(L\) with respect to the parameters \(\theta\) by backpropagating through \(\theta'\).
\[
\theta' \triangleq \text{Merge}\left(\{\theta'_{[n]}\}_{n=1}^{N}\right)
\triangleq
\begin{cases}
\varphi^\star\left(\{\theta'_{[n]}\}_{n=1}^{N}\right), & \text{if} \ \theta'_{[n]} \in \mathcal{W}, \\
\frac{1}{N} \sum_{n=1}^{N} \theta'_{[n]}, & \text{if} \ \theta'_{[n]} \notin \mathcal{W}.
\end{cases}
\]
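As one possible instantiation of \(\varphi^\star\), the sketch below implements Merge as a PyTorch module that applies a learned softmax-weighted combination to the parameters in \(\mathcal{W}\) and a uniform average elsewhere. The softmax parameterization of \(\varphi^\star\) and the index-set interface are illustrative assumptions, not necessarily the paper's choices.

import torch
import torch.nn as nn

class MergeLayer(nn.Module):
    def __init__(self, n_concepts, merge_idx):
        super().__init__()
        self.merge_idx = set(merge_idx)  # positions of parameters in W
        self.logits = nn.Parameter(torch.zeros(n_concepts))  # learnable merge weights

    def forward(self, adapted):
        # adapted: list over concepts, each a list of parameter tensors
        w = torch.softmax(self.logits, dim=0)  # convex combination phi*
        merged = []
        for i in range(len(adapted[0])):
            stack = torch.stack([theta_n[i] for theta_n in adapted])  # (N, ...)
            if i in self.merge_idx:  # theta'_[n] in W: learned merge
                merged.append((w.view(-1, *[1] * (stack.dim() - 1)) * stack).sum(0))
            else:  # otherwise: uniform average
                merged.append(stack.mean(0))
        return merged

Because the merge is differentiable, backpropagating the upper-level loss yields gradients for both the shared initialization \(\theta\) and the merge weights.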
Citation
@inproceedings{zheng2025mima,
  title={Multi-concept Model Immunization through Differentiable Model Merging},
  author={Zheng, Amber Yijia and Yeh, Raymond A},
  booktitle={Association for the Advancement of Artificial Intelligence (AAAI)},
  year={2025}
}