Multi-concept Model Immunization through Differentiable Model Merging
    
    
    
        
    
    
    
    
    Abstract
        Model immunization is an emerging research direction that aims to mitigate the risk of misuse posed by open-source models and increasingly capable adaptation methods. The idea is to make a released model's weights difficult to fine-tune on certain harmful applications, hence the term immunized.
        Recent work on model immunization focuses on the single-concept setting. In real-world situations, however, models need to be immunized against multiple concepts. To address this gap, we propose an immunization algorithm that simultaneously learns a single "difficult initialization" for adaptation methods over a set of concepts.
        We achieve this by incorporating a differentiable merging layer that combines a set of model weights adapted over multiple concepts. In our experiments, we demonstrate the effectiveness of multi-concept immunization by extending prior work's experimental setups of re-learning and personalization adaptation to multiple concepts.
    
    
    
    Method
      
    
    MIMA is formulated as a bi-level optimization program. The objective is defined as
    \[
    \underbrace{\max_{\theta \in \mathcal{S}^u} \; \sum_{n=1}^{N} L\left(\mathbf{x}^u_{[n]}, \mathbf{c}_{[n]};\, \text{Merge}\left(\{\theta'_{[n]}\}_{n=1}^{N}\right)\right)}_{\text{upper-level task}}
    \]
    \[
    \text{s.t.} \quad
    \underbrace{\theta'_{[n]} \triangleq \arg\min_{\theta \in \mathcal{S}^l} L\left(\mathbf{x}^l_{[n]}, \mathbf{c}_{[n]};\, \theta\right) \quad \forall n \in \{1, \dots, N\}}_{\text{multiple lower-level tasks}},
    \]
    where \(N\) is the number of concepts, \(\mathbf{x}^u_{[n]}\) and \(\mathbf{x}^l_{[n]}\) denote the upper- and lower-level data for the \(n\)-th concept, \(\mathbf{c}_{[n]}\) its concept conditioning, and \(\mathcal{S}^u, \mathcal{S}^l\) the upper- and lower-level feasible sets.
    For the lower level, we approximate each \(\arg\min\) by unrolling gradient steps on the loss \(L\), starting from a copy of the weights for each concept.
    Next, we combine the individual adapted weights \(\theta'_{[n]}\) via our proposed
    Merge layer, defined in the equation that follows the sketch below.
    For the upper level, we maximize the diffusion loss
    \(L\) with respect to the parameters \(\theta\) by backpropagating through the \(\theta'_{[n]}\).
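    To make the unrolled bi-level optimization concrete, the following is a minimal PyTorch sketch under simplifying assumptions: the weights are a plain dict of tensors, `loss_fn` stands in for the diffusion loss \(L\), and `merge_fn` stands in for the Merge layer. The names `inner_unroll` and `mima_step`, the step counts, and the learning rates are illustrative choices of ours, not the paper's released code.

```python
import torch

def inner_unroll(theta, x, c, loss_fn, steps=3, lr=1e-2):
    """Lower level: differentiably fine-tune a copy of theta on one concept.

    create_graph=True keeps the unrolled graph so the upper level can
    later backpropagate through the adapted weights theta'_[n].
    """
    params = dict(theta)  # shallow copy; each step creates new tensors
    for _ in range(steps):
        loss = loss_fn(x, c, params)
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        params = {k: w - lr * g for (k, w), g in zip(params.items(), grads)}
    return params

def mima_step(theta, concepts, loss_fn, merge_fn, outer_lr=1e-3):
    """Upper level: one gradient-ascent step on the merged adapted weights.

    theta:    dict of leaf tensors with requires_grad=True.
    concepts: list of (x_lower, x_upper, c) tuples, one per concept.
    """
    adapted = [inner_unroll(theta, x_l, c, loss_fn) for (x_l, _, c) in concepts]
    merged = merge_fn(adapted)                       # differentiable Merge layer
    upper_loss = sum(loss_fn(x_u, c, merged) for (_, x_u, c) in concepts)
    grads = torch.autograd.grad(upper_loss, list(theta.values()))
    with torch.no_grad():
        for (k, w), g in zip(theta.items(), grads):
            w += outer_lr * g                        # ascent: make adaptation hard
    return float(upper_loss)
```

    Unrolling with `create_graph=True` is what lets the upper-level gradient flow through each concept's adapted weights and the merge, back to the single shared initialization \(\theta\).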
    \[
    \theta' \triangleq \text{Merge}\left(\{ \theta'_{[n]} \}_{n=1}^N \right)
    \triangleq
    \begin{cases}
    \varphi^\star\left(\{ \theta'_{[n]} \}_{n=1}^N\right), & \text{if } \theta'_{[n]} \in \mathcal{W}, \\
    \frac{1}{N} \sum_{n=1}^N \theta'_{[n]}, & \text{if } \theta'_{[n]} \notin \mathcal{W}.
    \end{cases}
    \]
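    Here \(\mathcal{W}\) denotes the set of weights for which a closed-form merge \(\varphi^\star\) exists (e.g., linear layers), and all remaining weights are simply averaged. Below is a sketch of one possible instantiation that keeps the layer differentiable: \(\varphi^\star\) is realized with a RegMean-style closed form \(\left(\sum_n G_n\right)^{-1} \sum_n G_n W_n\), where \(G_n = X_n^\top X_n\) is a Gram matrix of concept-\(n\) activations. Treating this as the paper's exact \(\varphi^\star\) is an assumption, as are the function and argument names.

```python
import torch

def merge_fn(adapted, grams=None):
    """Differentiable Merge layer over N adapted weight dicts.

    Weights in W (here: 2-D matrices with a supplied Gram matrix) are
    merged with a RegMean-style closed form; everything else is averaged.
    `grams[k]` is assumed to be a list of N Gram matrices for weight k.
    """
    N = len(adapted)
    merged = {}
    for k in adapted[0]:
        ws = [p[k] for p in adapted]
        if grams is not None and k in grams and ws[0].dim() == 2:
            # phi*: solve (sum_n G_n) W = sum_n G_n W_n for W, with
            # weights stored as [in_features, out_features] matrices.
            num = sum(g @ w for g, w in zip(grams[k], ws))
            den = sum(grams[k])
            merged[k] = torch.linalg.solve(den, num)
        else:
            # Weights outside W: plain average over the N concepts.
            merged[k] = sum(ws) / N
    return merged
```

    Because both branches are differentiable in the \(\theta'_{[n]}\), the upper-level gradient of the bi-level objective can pass through the merged weights.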
    
    
    
    
Citation
    @inproceedings{zheng2025mima,
      title={Multi-concept Model Immunization through Differentiable Model Merging},
      author={Zheng, Amber Yijia and Yeh, Raymond A},
      booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
      year={2025}
    }